
Moved to new hardware, data drive became disabled during parity sync



Hey all,

 

I have had Unraid running for years on my trusty HP Gen3 MicroServer, and I have now moved the drives to a newer G7 system.

 

I had to do a new configuration because the SAS card fooled Unraid into thinking I had different serial numbers. After placing the drives in their correct slots and starting the array with "parity is valid" checked, it was happy and I was able to see all the files.

 

All seemed well until I worked out that two of the four SAS channels in use were throwing CRC errors...
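
(In case it's useful context: this is roughly how I spotted the counters climbing - just a sketch, and the device name is only an example, not my actual disk.)

# UDMA CRC errors show up as SMART attribute 199; a rising count usually
# points at the cable/backplane path rather than the disk itself
smartctl -A /dev/sdf | grep -i crc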

 

Originally it was only one, so I relocated that drive to a known-good bay and it was happy. Then, in what was most likely a stupid thing to do, I let Unraid start a parity sync after starting the array, thinking it would be a good way to confirm I wouldn't get any more CRC errors...

 

It did not go well: another drive threw so many errors during the sync that Unraid has now disabled it, and the parity sync has paused. On closer inspection, this drive may actually be faulty beyond my original CRC issue.

 

So now I feel I am in a precarious position: my parity is in an unknown state, and 30% of my files are not visible (i.e. Unraid is not emulating the disabled drive). That is my biggest worry - that the missing drive is not being emulated...

 

What should my next step be? I am happy to pull the potentially faulty drive and manually recover the files from it later, but I am unsure how to go about this while keeping the rest of the array up and running.

 

I have a new 10 TB drive I was going to use to replace the parity drive once things had settled down... looks like things are just not going to plan...

 

 

unraid disabled drive.png

stuff2-diagnostics-20231107-1410.zip


I really appreciate your time with this. Running the check via the GUI gave this message...

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
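
(For context, I believe the GUI check is the read-only xfs_repair run; the command-line equivalent would look roughly like the sketch below. /dev/md3 is only an example - the actual device depends on the disk slot and Unraid version.)

# read-only check in maintenance mode, no changes written
xfs_repair -n /dev/md3

# only if the log cannot be replayed by mounting: destroy the log and repair
xfs_repair -L /dev/md3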


OK, making progress, -L completed:

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (67:3647140) is ahead of log (1:2).
Format log to cycle 70.
done

 

 

I have not brought the array out of maintenance mode yet.
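
(Once I do start it normally, my plan is to look for anything the repair moved to lost+found - something like this, with the disk number just an example.)

# check whether xfs_repair orphaned any files on the repaired disk
ls -la /mnt/disk3/lost+found/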

 

 


Thanks again for your help - you guided me through to a working solution. The parity sync completed a few hours ago and, aside from a handful of errors on one disk, all is well.

 

The disk that had a few errors is well overdue to be replaced anyway, as it has been online for over 7 years (I only realised when checking its SMART info!). Now that things have settled, I can work on replacing the parity drive with a newer, larger drive, and use the "old" parity drive to replace the very old one...
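
(For anyone curious, the power-on time is SMART attribute 9 - a quick way to check it, with the device name again only an example:)

smartctl -A /dev/sdd | grep -i power_on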

 

I really appreciate your patience and help.

 

Cheers!
