
Parity disk not recognized after swapping


Kismet


I just completed a successful parity copy from my old disk to the new one.  Then I shut down the server and replaced the former parity disk with a new drive to do a data restore.  However, when I did that the array was looking for the old parity drive and considered the new parity drive invalid (it still recognizes the drive as a valid XFS disk with data, just not as the parity disk).  The only explanation I can see is that I have to put the old parity drive back in, start the array (assuming I don't have to do another copy), let the system do a data rebuild onto the former parity disk, then remove that disk and put in the new drive to do a new data rebuild.  Is that really the case, or is there some other reason my new parity disk wouldn't be recognized?

 

Edit:

So I went in and set everything back to how it was when I completed the parity copy, and Unraid still won't recognize my new disk as a parity disk and wants me to do another 9 hour copy all over again.

 

Second edit:

This is an edit born of frustration, because it really looks like the system completed the copy and the array, as listed in the UI, was showing the properly configured array (post copy), but Unraid didn't actually save the new configuration while leaving me with every indication that it had done so.  And now, because the configuration on the USB stick wasn't updated, I've got to redo the whole process just so I can force the UI to save the new configuration by hitting the start array button, despite no indication that would be necessary.  Hypothetically I could just rebuild the array, but if I do that I lose the parity information, and because I'm replacing a data disk that went bad, I'd lose the ability to restore it.

1 minute ago, johnnie.black said:

If you haven't rebooted yet, post your diagnostics: Tools -> Diagnostics

 

 

I have rebooted, although I can still give you the diagnostics if you'd like, just tell me what configuration of drives you'd like me to show.  I don't think they'll tell you anything though, I can read and mount all the drives just fine, and they show up in the Unraid GUI without any problem.  It's just after the reboot the array wants to see the old drive as the parity disk and not the new one.

22 minutes ago, johnnie.black said:

The diagnostics that include the syslog showing the parity swap.

 

Unless there is an archive somewhere that I can't find, I don't have those.  The diagnostics tool only gives me the syslog records back to the last startup.

 

Edit:

Since the day is over for me I'm just going to kick off a copy again and see what happens in the morning.

 

polaris-diagnostics-20171228-2353.zip

9 hours ago, johnnie.black said:

The 3TB WD which I assume is your old parity has double digit raw read errors, those are never a good sign on WD drives especially when above single digits.

 

That's why I rebooted to install a new drive instead of doing a data rebuild on the old parity drive.

 

So the second copy finished a few minutes ago.  I pulled a diagnostic package and then, because I'm a glutton for punishment and I wanted to check whether I was right, I started the array in maintenance mode, then shut down, inserted the new disk, and everything is working as expected.  The new parity disk was recognized as my array's parity disk and the system is currently doing a data rebuild on the new data disk.  Which is itself extremely frustrating: a little warning or notice that the UI is lying to you and you have to start the array to finalize the configuration would have been nice.  Even when you start the copy, the text describing what you're doing just says you "may" start the array afterwards, like it doesn't matter if you do or not.  I know all it cost me was a little bit of wear and an extra 9 hours, but the level of frustration and the amount of expletives I want to put in this post over the lack of such a simple and basic UI tweak is seriously making me consider whether I want to continue using Unraid.

 

 

polaris-diagnostics-20171229-0952.zip

7 hours ago, johnnie.black said:

Yeah, that "may" is probably not the best wording, but after the copy finishes you only get the option to start the array to begin the rebuild. This is the first time I recall someone having problems with the procedure description.

 

So now I have a whole new problem.  The data rebuild on my new drive just completed, so I stopped the array in order to power down the system and replace the next drive, since I'm replacing them all.  Except when I stopped the array, the UI showed that the array wants to see the old data drive instead of the new one that was just rebuilt, and it appears it won't let me start the array without kicking off another rebuild.

 

screenshot-192_168_88.10-2017-12-29-18-32-33.png.b0c5aef4b238de0766f3fc3815379ab9.png

polaris-diagnostics-20171229-1832.zip

 

Edit:

Glorious, just glorious.  I figured I should be able to just restart the array and do the data rebuild.  After all, the data rebuild completed successfully according to the UI and syslog, so it should be able to quickly validate that the data is there and not need to copy it all over again.  No, no check at all, it's redoing the whole write all over again.  At least when I was doing the parity swap you could say I executed the procedure from the manual incorrectly, because I didn't complete the data rebuild on the old parity drive, which is part of the process.  Here I followed everything step by step with no errors anywhere, and I'm still having to repeat the process and hope it just works the second time.  See you in another 8 hours.

7 hours ago, johnnie.black said:

That's not normal at all, you may have a problem with your flash-drive, but we'd need to see the syslog before rebooting to see what happened.

 

The diagnostics I included with the screenshot was taken after the write finished and I stopped the array, no rebooting.  After that I started the array and let it run, and here is the diagnostics from after I got back to the server and the second data rebuild was complete.  After this report (polaris-diagnostics-20171230-0719) I stopped the array and now the UI is telling me everything is normal, no unexpected disk.  If it's the USB then great, finally something I can pinpoint, but this time I did absolutely nothing different and haven't rebooted since before the first rebuild and somehow it's just working after the second rebuild.

polaris-diagnostics-20171230-0719.zip

46 minutes ago, Kismet said:

The diagnostics I included with the screenshot was taken after the write finished and I stopped the array, no rebooting

Yes, sorry about that, I must have looked at those before my coffee. Something weird is going on here and I'm not really sure why, as I've never seen this before. Right after boot, disk1 is seen as not assigned instead of empty:

 

Dec 29 09:58:38 Polaris kernel: mdcmd (1): import 0 sdc 3907018532 0 WDC_WD40EFRX-68N32N0_WD-WCC7K4YJ60YJ
Dec 29 09:58:38 Polaris kernel: md: import disk0: (sdc) WDC_WD40EFRX-68N32N0_WD-WCC7K4YJ60YJ size: 3907018532
Dec 29 09:58:38 Polaris kernel: mdcmd (2): import 1
Dec 29 09:58:38 Polaris kernel: mdcmd (3): import 2 sdf 1953514552 0 WDC_WD20EARX-00PASB0_WD-WMAZA7604175
Dec 29 09:58:38 Polaris kernel: md: import disk2: (sdf) WDC_WD20EARX-00PASB0_WD-WMAZA7604175 size: 1953514552
Dec 29 09:58:38 Polaris kernel: mdcmd (4): import 3 sdd 1953514552 0 WDC_WD20EARX-00PASB0_WD-WCAZAD791516
Dec 29 09:58:38 Polaris kernel: md: import disk3: (sdd) WDC_WD20EARX-00PASB0_WD-WCAZAD791516 size: 1953514552 

It should look something like this:


 

Quote

 

Dec 29 09:58:38 Polaris kernel: mdcmd (1): import 0 sdc 3907018532 0 WDC_WD40EFRX-68N32N0_WD-WCC7K4YJ60YJ
Dec 29 09:58:38 Polaris kernel: md: import disk0: (sdc) WDC_WD40EFRX-68N32N0_WD-WCC7K4YJ60YJ size: 3907018532
Dec 29 09:58:38 Polaris kernel: mdcmd (2): import 1
Dec 29 09:58:38 Polaris kernel: md: import_slot: 1 empty
Dec 29 09:58:38 Polaris kernel: mdcmd (3): import 2 sdf 1953514552 0 WDC_WD20EARX-00PASB0_WD-WMAZA7604175
Dec 29 09:58:38 Polaris kernel: md: import disk2: (sdf) WDC_WD20EARX-00PASB0_WD-WMAZA7604175 size: 1953514552
Dec 29 09:58:38 Polaris kernel: mdcmd (4): import 3 sdd 1953514552 0 WDC_WD20EARX-00PASB0_WD-WCAZAD791516
Dec 29 09:58:38 Polaris kernel: md: import disk3: (sdd) WDC_WD20EARX-00PASB0_WD-WCAZAD791516 size: 1953514552
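
For anyone who wants to run the same comparison against their own syslog, here is a minimal Python sketch. It rests on one assumption taken only from the two excerpts in this post, not from any Unraid documentation: an `mdcmd ... import N` line with no device should be followed by an `md: import_slot: N ...` line before the next `mdcmd` import. The function name `bare_imports_without_status` is made up for illustration.

```python
import re

# Matches "mdcmd (2): import 1" and "mdcmd (1): import 0 sdc ...".
# Group 1 is the slot number, group 2 is the device name if one is given.
MDCMD_IMPORT = re.compile(r"mdcmd \(\d+\): import (\d+)(?: (\S+))?")


def bare_imports_without_status(lines):
    """Return slots whose device-less import had no import_slot follow-up.

    Assumption (from the log comparison in this thread only): a device-less
    import should be followed by "md: import_slot: N ..." before the next
    mdcmd import line.
    """
    flagged = []
    pending = None  # slot of the latest device-less import still unanswered
    for line in lines:
        cmd = MDCMD_IMPORT.search(line)
        if cmd:
            if pending is not None:
                flagged.append(pending)
            slot, device = cmd.group(1), cmd.group(2)
            pending = int(slot) if device is None else None
        elif pending is not None and f"md: import_slot: {pending} " in line:
            pending = None
    if pending is not None:
        flagged.append(pending)
    return flagged


if __name__ == "__main__":
    observed = [
        "Dec 29 09:58:38 Polaris kernel: mdcmd (1): import 0 sdc 3907018532 0 WDC_WD40EFRX-68N32N0_WD-WCC7K4YJ60YJ",
        "Dec 29 09:58:38 Polaris kernel: md: import disk0: (sdc) WDC_WD40EFRX-68N32N0_WD-WCC7K4YJ60YJ size: 3907018532",
        "Dec 29 09:58:38 Polaris kernel: mdcmd (2): import 1",
        "Dec 29 09:58:38 Polaris kernel: mdcmd (3): import 2 sdf 1953514552 0 WDC_WD20EARX-00PASB0_WD-WMAZA7604175",
    ]
    print(bare_imports_without_status(observed))  # prints [1]
```

To check a real file, pass it the lines of the syslog from the diagnostics zip, e.g. `bare_imports_without_status(open(path))`, where `path` is wherever you extracted it. An empty list only means this one pattern did not show up, not that the import was healthy.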

 

 

When you assign the new disk1 it's also not normal:

 

Quote

Dec 29 09:59:14 Polaris kernel: mdcmd (2): import 1 sde 3907018532 0 WDC_WD40EFRX-68N32N0_WD-WCC7K2YF4RV9
Dec 29 09:59:14 Polaris kernel: md: import disk1: (sde) WDC_WD40EFRX-68N32N0_WD-WCC7K2YF4RV9 size: 3907018532
Dec 29 09:59:14 Polaris kernel: mdcmd (3): import 2 sdf 1953514552 0 WDC_WD20EARX-00PASB0_WD-WMAZA7604175
Dec 29 09:59:14 Polaris kernel: md: import disk2: (sdf) WDC_WD20EARX-00PASB0_WD-WMAZA7604175 size: 1953514552

 

It should be like this:


 

Quote

 

Dec 29 09:59:14 Polaris kernel: mdcmd (2): import 1 sde 3907018532 0 WDC_WD40EFRX-68N32N0_WD-WCC7K2YF4RV9
Dec 29 09:59:14 Polaris kernel: md: import disk1: (sde) WDC_WD40EFRX-68N32N0_WD-WCC7K2YF4RV9 size: 3907018532
Dec 29 09:59:14 Polaris kernel: md: import_slot: 1 replaced
Dec 29 09:59:14 Polaris kernel: mdcmd (3): import 2 sdf 1953514552 0 WDC_WD20EARX-00PASB0_WD-WMAZA7604175
Dec 29 09:59:14 Polaris kernel: md: import disk2: (sdf) WDC_WD20EARX-00PASB0_WD-WMAZA7604175 size: 1953514552

 

 

And when you start the array there's a mention of a rebuild, but also a parity check:

 

Dec 29 10:00:41 Polaris kernel: mdcmd (40): check correct
Dec 29 10:00:41 Polaris kernel: md: recovery thread: recon D1 ..

Only the recon should appear, and when the rebuild finishes it's complaining that the disk is wrong only because of the size, i.e., like it didn't expand the filesystem from 3 to 4TB.

 

Like I said, I've never seen this before, so I have no idea what's going on. Something wrong with the flash drive is a possibility, though I don't see any errors pointing to that. If you have a Windows desktop/laptop, try running chkdsk on it, or whatever the equivalent is on a Mac.

 

P.S.: unrelated to your issues, but there are also some ATA errors, probably cable/connection related. ATA2 and ATA4 are both 4TB disks:

 

Dec 29 11:57:32 Polaris kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x480100 action 0x6 frozen
Dec 29 11:57:32 Polaris kernel: ata4.00: irq_stat 0x08000000, interface fatal error
Dec 29 11:57:32 Polaris kernel: ata4: SError: { UnrecovData 10B8B Handshk }
Dec 29 11:57:32 Polaris kernel: ata4.00: failed command: WRITE DMA EXT
Dec 29 11:57:32 Polaris kernel: ata4.00: cmd 35/00:40:a0:35:da/00:05:57:00:00/e0 tag 8 dma 688128 out
Dec 29 11:57:32 Polaris kernel:         res 50/00:00:a0:35:da/00:00:57:00:00/e0 Emask 0x10 (ATA bus error)
Dec 29 11:57:32 Polaris kernel: ata4.00: status: { DRDY }
Dec 29 11:57:32 Polaris kernel: ata4: hard resetting link
Dec 29 11:57:32 Polaris kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 11:57:32 Polaris kernel: ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec 29 11:57:32 Polaris kernel: ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 11:57:32 Polaris kernel: ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 11:57:32 Polaris kernel: ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec 29 11:57:32 Polaris kernel: ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 11:57:32 Polaris kernel: ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 11:57:32 Polaris kernel: ata4.00: configured for UDMA/133
Dec 29 11:57:32 Polaris kernel: ata4: EH complete
Dec 29 11:57:34 Polaris kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x480100 action 0x6 frozen
Dec 29 11:57:34 Polaris kernel: ata4.00: irq_stat 0x08000000, interface fatal error
Dec 29 11:57:34 Polaris kernel: ata4: SError: { UnrecovData 10B8B Handshk }
Dec 29 11:57:34 Polaris kernel: ata4.00: failed command: WRITE DMA EXT
Dec 29 11:57:34 Polaris kernel: ata4.00: cmd 35/00:40:60:2a:e0/00:05:57:00:00/e0 tag 27 dma 688128 out
Dec 29 11:57:34 Polaris kernel:         res 50/00:00:60:2a:e0/00:00:57:00:00/e0 Emask 0x10 (ATA bus error)
Dec 29 11:57:34 Polaris kernel: ata4.00: status: { DRDY }
Dec 29 11:57:34 Polaris kernel: ata4: hard resetting link
Dec 29 11:57:35 Polaris kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 11:57:35 Polaris kernel: ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec 29 11:57:35 Polaris kernel: ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 11:57:35 Polaris kernel: ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 11:57:35 Polaris kernel: ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec 29 11:57:35 Polaris kernel: ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 11:57:35 Polaris kernel: ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 11:57:35 Polaris kernel: ata4.00: configured for UDMA/133
Dec 29 11:57:35 Polaris kernel: ata4: EH complete
Dec 29 11:57:51 Polaris kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Dec 29 11:57:51 Polaris kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Dec 29 11:57:51 Polaris kernel: ata2: SError: { UnrecovData 10B8B BadCRC }
Dec 29 11:57:51 Polaris kernel: ata2.00: failed command: READ DMA EXT
Dec 29 11:57:51 Polaris kernel: ata2.00: cmd 25/00:40:a0:4c:11/00:05:58:00:00/e0 tag 6 dma 688128 in
Dec 29 11:57:51 Polaris kernel:         res 50/00:00:a0:4c:11/00:00:58:00:00/e0 Emask 0x10 (ATA bus error)
Dec 29 11:57:51 Polaris kernel: ata2.00: status: { DRDY }
Dec 29 11:57:51 Polaris kernel: ata2: hard resetting link
Dec 29 11:57:51 Polaris kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 11:57:51 Polaris kernel: ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec 29 11:57:51 Polaris kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 11:57:51 Polaris kernel: ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 11:57:51 Polaris kernel: ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec 29 11:57:51 Polaris kernel: ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 11:57:51 Polaris kernel: ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 11:57:51 Polaris kernel: ata2.00: configured for UDMA/133
Dec 29 11:57:51 Polaris kernel: ata2: EH complete
Dec 29 12:47:35 Polaris kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x480100 action 0x6 frozen
Dec 29 12:47:35 Polaris kernel: ata4.00: irq_stat 0x08000000, interface fatal error
Dec 29 12:47:35 Polaris kernel: ata4: SError: { UnrecovData 10B8B Handshk }
Dec 29 12:47:35 Polaris kernel: ata4.00: failed command: WRITE DMA EXT
Dec 29 12:47:35 Polaris kernel: ata4.00: cmd 35/00:40:58:c2:3d/00:05:7a:00:00/e0 tag 15 dma 688128 out
Dec 29 12:47:35 Polaris kernel:         res 50/00:00:58:c2:3d/00:00:7a:00:00/e0 Emask 0x10 (ATA bus error)
Dec 29 12:47:35 Polaris kernel: ata4.00: status: { DRDY }
Dec 29 12:47:35 Polaris kernel: ata4: hard resetting link
Dec 29 12:47:35 Polaris kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 12:47:35 Polaris kernel: ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec 29 12:47:35 Polaris kernel: ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 12:47:35 Polaris kernel: ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 12:47:35 Polaris kernel: ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec 29 12:47:35 Polaris kernel: ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 12:47:35 Polaris kernel: ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 12:47:35 Polaris kernel: ata4.00: configured for UDMA/133
Dec 29 12:47:35 Polaris kernel: ata4: EH complete
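
A log like this is easier to read as a tally per port. The following is a rough Python sketch that counts the `failed command:` lines by ATA port and command; the function name `count_failed_commands` is hypothetical, and the pattern is based only on the line format shown in the excerpt above, so other kernels or drivers may log differently.

```python
import re
from collections import Counter

# Based on lines such as:
#   "... kernel: ata4.00: failed command: WRITE DMA EXT"
# Group 1 is the port (ata4), group 2 is the command that failed.
FAILED = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")


def count_failed_commands(lines):
    """Count failed ATA commands, keyed by (port, command)."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line.rstrip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Dec 29 11:57:32 Polaris kernel: ata4.00: failed command: WRITE DMA EXT",
        "Dec 29 11:57:34 Polaris kernel: ata4.00: failed command: WRITE DMA EXT",
        "Dec 29 11:57:51 Polaris kernel: ata2.00: failed command: READ DMA EXT",
        "Dec 29 11:57:51 Polaris kernel: ata2: hard resetting link",
    ]
    for (port, command), n in sorted(count_failed_commands(sample).items()):
        print(port, command, n)
```

The tally only shows which ports are affected and how often. It does not tell you whether the cable, the port, or the drive is at fault, so swapping cables and re-checking is still the practical test.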

 


Archived

This topic is now archived and is closed to further replies.
