Unraid_Noob Posted January 6, 2020

Hello, I don't know what happened, but my server crashed, and when I restarted it I noticed problems with my array:

- The USB flash drive needed repair => done
- 1 of the 2 parity disks is disabled
- 1 of my data disks (md11) does not mount, and xfs_repair reports "error cannot find log head/tail"
- 1 other data disk is disabled and its content emulated. I successfully mounted it and could check the content. I ran xfs_repair and it corrected some errors.

I attached my diagnostics file to this post. Is there any chance not to lose data? In any case, what steps should I follow to minimize damage and recover as much as possible? Thank you for your help. Sined

tower-diagnostics-20200106-1803.zip
trurl Posted January 6, 2020

You should have asked for advice before doing anything if you were unsure what to do, and it seems you were. But from your description it isn't obvious you have done anything to make things worse, though that is perhaps by accident.

Since you have dual parity, you should be able to rebuild both the disabled parity1 and the disabled disk23.

You mentioned running xfs_repair on a data disk that wasn't disabled. You should always capture exactly the command used and its results so you can post them (one way to do that is sketched below).

You also mention running xfs_repair on the disabled data disk that you somehow mounted yourself. This is a bit more complicated, and perhaps I have misunderstood you. You should never attempt to work with array disks outside of the array, or you will invalidate parity. Since the disk was being emulated and will have to be rebuilt anyway, it doesn't invalidate parity in this case. But it was also a waste of time, since it is the emulated disk that should have been repaired, if it needed repair at all. Rebuilding the disk will just put it back the way it was before the repair you did outside the array. Again, you should always capture exactly the command used and the results so you can post them. That would perhaps have clarified exactly what you did here, since I might have misunderstood.

SMART reports for both disabled disks look OK. You have too many disks for me to examine them all. Do any of your disks show SMART warnings on the Dashboard?
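For example, from the console or an SSH session you could capture a check together with everything it prints (a minimal sketch; the log file name is just an example):

```bash
# Save everything xfs_repair prints so the results can be posted later.
# /boot is the Unraid flash drive, so the log survives a reboot.
xfs_repair -n /dev/md11 2>&1 | tee /boot/xfs_repair_md11.log
```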
JorgeB Posted January 6, 2020

45 minutes ago, Unraid_Noob said: "1 of my data disks (md11) does not mount, and xfs_repair reports 'error cannot find log head/tail'"

Disk11 is failing, which is likely why xfs_repair failed, but since you already have another disabled disk it's not possible to emulate it. If you think parity is OK and in sync, we could try re-enabling it to rebuild disk11, but since the filesystem on the other emulated disk was repaired it won't be 100% in sync. Alternatively, you could use ddrescue on disk11.
trurl Posted January 6, 2020

51 minutes ago, trurl said: "SMART reports for both disabled disks look OK. You have too many disks for me to examine them all. Do any of your disks show SMART warnings on the Dashboard?"

44 minutes ago, johnnie.black said: "Disk11 is failing"

I should have thought to at least check that one also.
trurl Posted January 6, 2020

And the syslog is from after the reboot, of course, so we can't see why the disks were disabled. Connection issues would be my first guess, so those should be checked before attempting any further fixes.
trurl Posted January 6, 2020

Since you haven't visited since making that first post in the thread, I really hope you haven't been trying to fix this yourself.
JorgeB Posted January 6, 2020

Just now, trurl said: "so we can't see why the disks were disabled."

Most likely a controller crash or power issue caused errors on multiple disks. When this happens, Unraid disables as many disks as there are parity devices; which disks get disabled is the luck of the draw. Parity1 is likely still valid; parity2 will be a little different because of the emulated disk repair. If it were me, I would re-enable parity and the other disk and rebuild disk11 to a new disk, but this assumes all the other disks are OK and parity really is in sync.
Unraid_Noob (Author) Posted January 6, 2020

Hi, thank you for your replies. The actions I took were based on comments found on this forum with the exact same error message. Here are the steps I took:

```
mkdir /mnt/tmp
mount /dev/md11 /mnt/tmp
```

This returned:

```
mount: /mnt/tmp: can't read superblock on /dev/md11.
```

Then:

```
xfs_repair -n /dev/md11
```

Output:

```
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
xfs_repair: read failed: Input/output error
empty log check failed
zero_log: cannot find log head/tail (xlog_find_tail=-5)
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (1:149238) is ahead of log (0:0).
Would format log to cycle 4.
No modify flag set, skipping filesystem flush and exiting.
```

Then:

```
mount /dev/md23 /mnt/tmp
```

No errors.

```
xfs_repair -n /dev/md23
```

Output:

```
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 4
        - agno = 3
        - agno = 5
        - agno = 2
        - agno = 7
        - agno = 0
        - agno = 6
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
```

Then:

```
xfs_repair /dev/md23
```

Output:

```
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 5
        - agno = 4
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
```

Could you please explain what steps I need to take next? Thank you, Sined
trurl Posted January 6, 2020

39 minutes ago, Unraid_Noob said: "The actions I took were based on comments found on this forum with the exact same error message."

There are a lot of threads about repairing filesystems, and any particular error message isn't the full picture. I think you must have been looking at some old threads, because mounting to /mnt/tmp isn't the usual way now. You can do the repair from the webUI now, and you are less likely to get the commands wrong that way. I notice that you were working with the md device, though, so it seems you weren't actually doing the repair outside the array as I originally thought. In the case of the disabled disk, you would in fact have been repairing the emulated disk. Let's let @johnnie.black comment on this new information.
JorgeB Posted January 7, 2020

12 hours ago, Unraid_Noob said: "xfs_repair: read failed: Input/output error"

This confirms the problem with disk11. Before proceeding with the invalid slot command, I just want to make sure the actual disk23 is mounting correctly. It almost certainly is, but it doesn't hurt to confirm, so with the array stopped type:

```
mkdir /temp
mount -o ro /dev/sdq1 /temp
```

If you rebooted since the diags, check that disk23 is still sdq (a way to check from the console is sketched below). If it mounts correctly you can browse the contents, though that's not really needed; we just want to make sure it mounts. Next, unmount:

```
umount /temp
```

Report back so we can proceed with the invalid slot command, and don't forget you need a new disk to replace disk11.
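If you're unsure which /dev/sdX disk23 currently is (device letters can change across reboots), something like this should let you match the drive by serial number against what the Main page shows (a sketch, not the only way):

```bash
# List block devices with size, model, and serial so a drive can be
# matched to its array slot.
lsblk -o NAME,SIZE,MODEL,SERIAL

# Alternatively, list devices by their stable identifiers:
ls -l /dev/disk/by-id/
```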
Unraid_Noob (Author) Posted January 7, 2020

Dear Jorge, thank you for your help. I confirm I succeeded in mounting and browsing the drive. Regards
trurl Posted January 7, 2020

Might still be good to get an answer to this:

19 hours ago, trurl said: "You have too many disks for me to examine them all. Do any of your disks show SMART warnings on the Dashboard?"
Unraid_Noob (Author) Posted January 7, 2020

Hi, I have 7 out of 24 with CRC errors. Regards
trurl Posted January 7, 2020

11 minutes ago, Unraid_Noob said: "I have 7 out of 24 with CRC errors."

Those are OK; they are usually a temporary connection issue. You can acknowledge them by clicking on them on the Dashboard, and they won't warn again unless the count increases.
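If you prefer to check from the console, the CRC counter is SMART attribute 199; something like this would show it (replace sdX with the device to check):

```bash
# Print SMART attributes and pick out the UDMA CRC error counter (ID 199).
smartctl -A /dev/sdX | grep -i crc
```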
JorgeB Posted January 7, 2020

OK, so to replace disk11 we need to enable parity1. Also, since disk23 looks healthy and is mounting correctly, we might as well enable it too instead of keeping the repaired emulated disk, which might have some corruption. To do that:

- Tools -> New Config -> Retain current configuration: All -> Apply
- Assign any missing disk(s), including the new disk11
- Important: after checking the assignments, leave the browser on that page, the "Main" page.
- Open an SSH session or use the console and type (don't copy/paste directly from the forum, as it can sometimes insert extra characters):

```
mdcmd set invalidslot 11
```

- Back in the GUI, and without refreshing the page, just start the array. Do not check the "parity is already valid" box. (The GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, but the parity disks won't be overwritten as long as the procedure was done correctly.) Disk11 will start rebuilding. The disk should mount immediately, but if it's unmountable, don't format; wait for the rebuild to finish and then run a filesystem check.

Keep the old disk11 intact; most of the data on it should be recoverable with ddrescue if still needed.
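If it does come to that, a ddrescue run would look something like this (a sketch; /dev/sdX and /dev/sdY are hypothetical device names, so verify both carefully with lsblk before running, since the second device is overwritten):

```bash
# Clone the failing old disk11 (assumed /dev/sdX) to a spare disk (assumed /dev/sdY).
# The map file records progress, so the copy can be stopped and resumed.
ddrescue -f -n /dev/sdX /dev/sdY /boot/ddrescue_disk11.map   # fast first pass, skip bad areas
ddrescue -f -r3 /dev/sdX /dev/sdY /boot/ddrescue_disk11.map  # then retry bad sectors up to 3 times
```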
Unraid_Noob (Author) Posted January 7, 2020

Dear Jorge, to make sure I don't make any mistakes:

- I stop the array
- I create a new config
- On the Main tab I replace the disk in slot 11 with a new spare disk
- I check all the slot assignments
- I run the provided command in an SSH session
- I start the array and let it rebuild

The only part I wasn't sure about is assigning a new spare disk to slot 11 in place of the defective one. Could you confirm this? I will wait for your feedback before doing anything. Thank you
JorgeB Posted January 7, 2020

3 minutes ago, Unraid_Noob said: "On the Main tab I replace the disk in slot 11 with a new spare disk"

Correct. Then leave the GUI on that page, the Main page; the GUI can't be refreshed after the invalid slot command is typed and before the array start button is clicked.
Unraid_Noob (Author) Posted January 7, 2020

OK, I have started the array and the rebuild is in progress (1.5%). I will keep you posted on the progress. Thank you all for your support. Sined
JorgeB Posted January 7, 2020

Did disk11 mount correctly?
Unraid_Noob (Author) Posted January 7, 2020

It says "Unmountable: No file system", but the array is mounted and functional. I got the following notifications:

- Unraid Disk 11 error: Drive 11, drive not ready, content being reconstructed
- Unraid parity sync / Data rebuild: Parity Sync / Data rebuild started
JorgeB Posted January 7, 2020

OK, please post the current diags; just the syslog is enough for now.
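If the webUI is hard to reach, the diagnostics zip can also be generated from the console; it should land in the logs folder on the flash drive:

```bash
# Generates tower-diagnostics-<date>.zip under /boot/logs/ on the flash drive.
diagnostics
```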
Unraid_Noob (Author) Posted January 7, 2020

Here are the diags: tower-diagnostics-20200107-1650.zip
JorgeB Posted January 7, 2020

A valid filesystem is detected, which is good news, but there is some corruption. It should be fixable with xfs_repair. When the rebuild finishes, check the filesystem on disk11 with the array in maintenance mode:

```
xfs_repair -v /dev/md11
```

If it asks for -L, use it.
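In case that happens, the follow-up would look like this (-L zeroes the metadata log, so any transactions still in it are discarded, which is why it's only used when xfs_repair explicitly asks for it):

```bash
# Only if xfs_repair refuses to run and tells you to use -L:
xfs_repair -L /dev/md11
```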
Unraid_Noob (Author) Posted January 7, 2020

OK, will do. Thanks
Unraid_Noob (Author) Posted January 8, 2020

Dear Jorge, I had trouble completing the rebuild. The server froze and I had to do a hard reset. Starting the server in maintenance mode, I got the following message:

"Unraid Parity sync / Data rebuild: 08-01-2020: Parity sync / Data rebuild finished (errors). Duration unavailable (no parity-check entries logged)"

Does this mean the rebuild completed successfully and I need to continue the procedure to correct the errors, or do I need to start over from the beginning? I attached the diags just in case. Thanks again for your support.

tower-diagnostics-20200108-1924.zip