unravelit Posted November 7, 2023 (edited)

Hey all,

I have had Unraid running for years on my trusty HP gen3 micro server. I moved the drives to a newer G7 system. I had to do a new configuration because the SAS card fooled Unraid into thinking the drives had different serial numbers. After placing the drives in their correct slots and starting with "parity is valid" checked, it was happy and I was able to see all the files.

All seemed well until I worked out that 2 of the 4 SAS channels in use are throwing CRC errors... Originally it was only one, so I relocated that drive to a known good bay and it was happy.

Now, in what was most likely a stupid thing to do, I let Unraid start a parity sync after starting, as I thought it would be a good way to confirm I won't be getting any more CRC errors...

It did not go well: another drive threw so many errors during this sync that Unraid has now disabled it, and the parity sync has paused. From what I can see, this drive may actually be faulty beyond my original CRC issue.

So now I feel like I am in a precarious position, as my parity is in an unknown state and 30% of my files are not visible (i.e. Unraid is not emulating the disabled drive). That is my biggest worry: that the missing drive is not being emulated...

What should my next step be? I am happy to pull the potentially faulty drive and manually recover the files from it later, but I am unsure how to go about this while keeping the rest of the current setup running.

I have a new 10 TB drive I was going to use to replace the parity drive once things have settled down... looks like things are just not going to plan...

stuff2-diagnostics-20231107-1410.zip

JorgeB Posted November 7, 2023

Disk2 dropped offline. Reboot/power cycle the server and post new diags.

unravelit Posted November 7, 2023

Thanks for that, it sure did. It looks like it was picked up after rebooting, and the drive looks pretty sick.

stuff2-diagnostics-20231107-2114.zip

JorgeB Posted November 7, 2023

Disk looks fine. There are some UDMA CRC errors, suggesting a cable (or controller) problem. Replace the cables, do a new config and try to re-sync parity; if it fails again, replace the controller.
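
For anyone following along: the UDMA CRC count is SMART attribute 199, and its raw value is cumulative, so what matters after swapping cables is whether the number keeps climbing, not whether it is zero. Below is a minimal Python sketch that pulls that value out of `smartctl -A` style text so two readings can be compared. The `SAMPLE` text is made up for illustration (it is not taken from the diagnostics in this thread), and the function name is hypothetical.

```python
# Illustrative attribute table in the usual `smartctl -A` layout.
# These numbers are invented for the example, not from the poster's disks.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""


def udma_crc_count(smart_attributes_text):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smart_attributes_text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last one.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


if __name__ == "__main__":
    before = udma_crc_count(SAMPLE)
    # Later reading, e.g. after a parity sync on the new cables (assumed text).
    after = udma_crc_count(SAMPLE.replace(" 12\n", " 12\n"))
    if before is not None and after is not None and after > before:
        print("CRC count is still increasing: suspect cable, bay or controller")
    else:
        print("CRC count is stable since the last reading")
```

If the count stays flat across a full parity sync, the old errors are just history and the link is probably healthy again.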

unravelit Posted November 7, 2023 (edited)

I did that, and it does not look promising... Unraid now thinks the disk has no file system.

I paused the parity sync.

stuff2-diagnostics-20231107-2232.zip

JorgeB Posted November 7, 2023

The problem now is with a different disk. Cancel the parity sync, stop the array, click on disk3, change the filesystem from auto to xfs, then check the filesystem on that disk. Run it without -n.

unravelit Posted November 7, 2023 (edited)

I really appreciate your time with this. Running the check via the GUI gave this message...

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
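
The decision the xfs_repair message describes can be written down as a small rule. This is only a sketch of the logic in that error text, in Python; the function name and the returned strings are made up for illustration, and it does not run xfs_repair itself.

```python
def next_xfs_repair_step(repair_output, mount_succeeded=None):
    """Suggest the next step based on xfs_repair's own log-replay message.

    mount_succeeded: None if a mount has not been attempted yet,
    otherwise True/False for the result of the mount attempt.
    """
    needs_replay = "valuable metadata changes in a log" in repair_output
    if not needs_replay:
        return "no dirty log reported; review the rest of the repair output"
    if mount_succeeded is None:
        return "try mounting the filesystem so the log is replayed"
    if mount_succeeded:
        return "unmount, then re-run xfs_repair"
    # Last resort named by the message: -L discards the log and may lose
    # the most recent metadata changes.
    return "re-run xfs_repair with -L"


if __name__ == "__main__":
    message = (
        "ERROR: The filesystem has valuable metadata changes in a log "
        "which needs to be replayed."
    )
    # In this thread the disk was already showing as unmountable.
    print(next_xfs_repair_step(message, mount_succeeded=False))
```

Here the disk was already reported as having no file system, which is why -L was the remaining option.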

unravelit Posted November 7, 2023

OK, making progress, -L completed:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (67:3647140) is ahead of log (1:2).
Format log to cycle 70.
done

I have not brought the array out of maintenance mode yet.

JorgeB Posted November 7, 2023

Start the array in normal mode now. The disk should mount; check the contents and look for a lost+found folder.
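
The lost+found check can also be done from a script rather than by browsing. A minimal Python sketch follows; it assumes the repaired disk is mounted at `/mnt/disk3` (the usual Unraid per-disk mount point, but adjust it to your own system), and the function name is hypothetical.

```python
from pathlib import Path


def lost_and_found_entries(mount_point):
    """List what xfs_repair moved into lost+found; empty if the folder is absent."""
    folder = Path(mount_point) / "lost+found"
    if not folder.is_dir():
        return []
    return sorted(entry.name for entry in folder.iterdir())


if __name__ == "__main__":
    # Assumed mount point for disk3; change to match your array.
    entries = lost_and_found_entries("/mnt/disk3")
    if entries:
        print(f"{len(entries)} orphaned item(s) to review in lost+found")
    else:
        print("no lost+found content on this disk")
```

Anything listed there is a file or folder the repair could not reconnect to its original path, usually named after its inode number, so it needs to be identified by hand.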

unravelit Posted November 7, 2023 (edited)

OK, the files are there, and there is no lost+found folder. Looking better!

I paused the parity sync while checking for the lost+found folder. Is it OK to resume it now and let it do its thing?

itimpi Posted November 7, 2023

unravelit said: "OK, files are there, and there is no lost+found folder Looking better!"

No lost+found folder is always a good sign!

JorgeB Posted November 7, 2023 (Solution)

unravelit said: "Is it OK to resume it and now let it do it's thing?"

Yep.

unravelit Posted November 8, 2023

Thanks again for your help, you guided me through to a working solution.

The parity sync completed a few hours ago, and aside from a handful of errors on one disk, all is well. The disk that had a few errors is well overdue for replacement anyway, as it has been online for over 7 years (I only realised when checking its SMART info!).

Now that things have settled, I can work on replacing the parity drive with a newer, larger drive, and use the "old" parity drive to replace the very old drive...

I really appreciate your patience and help. Cheers!