daemian Posted October 23, 2018

Good morning. I had a disk fail, which I replaced. However, sometime before the parity sync/rebuild finished there was a power outage. When I booted Unraid back up, the disk assignments were lost. Through my research it sounds like it's just critical that I get the parity disk correct; the other disks can be put in any location without negative consequence. So, I determined which disk was the parity, put it in the parity slot, and put my other disks in the disk # slots. What I am unsure of is whether I should now start the array as normal, start it with "Parity is already valid" selected, or do something else entirely. What's throwing me off is that all of the disks are currently recognized as a "New Device" (blue square). I want it to rebuild the data on the failed disk and trust the data on the others and the parity. How do I go about this without destroying everything? Thanks!
JorgeB Posted October 23, 2018

25 minutes ago, daemian said: "Through my research it sounds like it's just critical that I get the parity disk correct"

Yes in a normal situation, but during a rebuild it's not so simple. Start by posting the release you're using, as well as whether you're running single or dual parity, or just post the diagnostics.
daemian Posted October 23, 2018 Author

Sure: version 6.5.3, single parity config. Diagnostics attached. Thanks

dt-ur01-diagnostics-20181023-0850.zip
JorgeB Posted October 23, 2018

A couple of questions:

- Do you know what disk you were rebuilding? Not the old disk #, but the actual disk serial or current disk #.
- Is parity the 6TB Hitachi, or one of the currently assigned data disks?
daemian Posted October 23, 2018 Author

Quote: "Do you know what disk you were rebuilding, not the old disk #, the actual disk serial or current disk #?"

I am pretty certain it is WCC4N0334109. I say that because I put all of the drives in as data drives and started the array (with no parity). The other 3 looked fine, but that one showed "Unmountable: No file system". I presume that is because the power failure occurred before the parity sync finished.

Quote: "Is parity the 6TB Hitachi or one of the currently assigned data disks?"

The 6TB drive is the parity.

Edited October 23, 2018 by daemian
JorgeB Posted October 23, 2018

1 minute ago, daemian said: "I am pretty certain it is WCC4N0334109. [...]"

If parity is the 6TB then that's likely it. It would have been best if the data disks had been mounted read-only, but this should still work:

- Tools -> New Config -> Retain current configuration: All -> Apply
- Assign any missing disk(s), like parity
- Important: after checking the assignments, leave the browser on that page, the "Main" page.
- Open an SSH session or use the console and type (I'll assume the disk to rebuild is still disk1; if not, adjust the command):

mdcmd set invalidslot 1 29

- Back in the GUI, and without refreshing the page, just start the array. Do not check the "Parity is already valid" box. Disk1 will start rebuilding; the disk should mount immediately, but if it's unmountable don't format. Wait for the rebuild to finish and then run a filesystem check.
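As a side note, the invalidslot step above can be sketched as a tiny shell snippet. The numbers are taken from this post (slot 1 is the disk being rebuilt; 29 marks parity as valid on this Unraid version and single-parity layout); the snippet only builds and prints the command rather than running it, since the real command must be run on the Unraid console itself:

```shell
# Hypothetical helper: build the invalidslot command for the slot being rebuilt.
# On the real server you would run the printed command on the console, then
# start the array from the GUI without refreshing the page.
SLOT=1                                  # disk slot to mark invalid (the one to rebuild)
CMD="mdcmd set invalidslot ${SLOT} 29"  # 29 = treat parity as valid (single parity)
echo "$CMD"
```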
daemian Posted October 23, 2018 Author

So I just want to double check; this is what the screen looks like now: I have issued this command at the CLI. I have not refreshed or left the page. Now I am going to start the array, without "Parity is already valid" selected. Is that all correct? Thank you for your help!
JorgeB Posted October 23, 2018

14 minutes ago, daemian said: "Is that all correct?"

Yes
daemian Posted October 23, 2018 Author

Sorry to be a pest, but when I click Start it warns me "Parity disk(s) contents will be overwritten". You're sure, right?
JorgeB Posted October 23, 2018

Yes, it's normal; the GUI doesn't take the invalidslot command into account. As long as you typed the command correctly and didn't refresh the GUI, Unraid won't touch parity and will start rebuilding disk1 instead.
daemian Posted October 24, 2018 Author

OK, so the rebuild is completed. Now in the GUI disk 1 shows as "Unmountable: No file system"
itimpi Posted October 24, 2018

59 minutes ago, daemian said: "OK, so the rebuild is completed. Now in the GUI disk 1 shows as "Unmountable: No file system""

A rebuild does not fix an "unmountable" problem, as it works at the physical sector level, not the file system level. You normally need to run the file system repair tools to fix the unmountable state.
JorgeB Posted October 24, 2018

1 hour ago, daemian said: "OK, so the rebuild is completed. Now in the GUI disk 1 shows as "Unmountable: No file system""

Possibly the result of starting the disks read-write without parity before, or worse, parity is not in sync. Either way, try a filesystem check: https://wiki.unraid.net/Check_Disk_Filesystems#Drives_formatted_with_XFS or https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
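For an XFS disk1, the console version of those wiki checks is roughly the sketch below. It assumes the array is started in maintenance mode, so /dev/md1 exists but is unmounted; the snippet only prints the commands rather than running them, since the real run needs the array device:

```shell
# Sketch of an XFS filesystem check on the parity-protected disk1 device.
DEV=/dev/md1                   # md device so parity stays in sync with any repairs
echo "xfs_repair -n ${DEV}"    # dry run first: -n reports problems, writes nothing
echo "xfs_repair ${DEV}"       # if the dry run looks sane, repair for real
```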
JorgeB Posted October 24, 2018

P.S. I didn't notice at first, since I didn't check the complete syslog, but you also have problems with your cache pool. There are read and write errors on both devices, but mainly cache1:

Oct 23 08:04:35 dt-ur01 kernel: BTRFS info (device sdi1): bdev /dev/sdi1 errs: wr 166, rd 1, flush 0, corrupt 0, gen 0
Oct 23 08:04:35 dt-ur01 kernel: BTRFS info (device sdi1): bdev /dev/sdh1 errs: wr 863327568, rd 506341990, flush 65261822, corrupt 0, gen 0

These are hardware errors, and with SSDs they are usually the result of bad cables. After replacing them, run a scrub and check that all errors were corrected, though if you're using any NOCOW shares there might be some undetected corruption there.
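After replacing the cables, the scrub and the per-device error counters can be checked with something like the sketch below. The /mnt/cache mount point is an assumption (the Unraid default); the snippet only prints the commands rather than running them, since they need the live pool:

```shell
# Sketch: scrub the btrfs cache pool and review each device's error counters.
POOL=/mnt/cache                       # assumed cache pool mount point
echo "btrfs scrub start -B ${POOL}"   # -B waits for the scrub and prints a summary
echo "btrfs dev stats ${POOL}"        # per-device counters (wr/rd/flush/corrupt/gen)
```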
daemian Posted October 25, 2018 Author

Thanks for pointing out the cache drive; I will check that out when I can. For the original issue, when I try to run xfs_repair I get the following error:

root@dt-ur01:~# xfs_repair -v /dev/md1
Phase 1 - find and verify superblock...
        - block cache size set to 2290880 entries
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
ERROR: The log head and/or tail cannot be discovered. Attempt to mount the filesystem to replay the log or use the -L option to destroy the log and attempt a repair.

Do I try it with the -L option? It sounds like that may result in [more] data loss, but perhaps I don't really have any other option? Thank you again for all of your time and assistance.
JorgeB Posted October 25, 2018

9 minutes ago, daemian said: "Do I try it with the -L option?"

Yes, usually there's no data loss.
daemian Posted October 25, 2018 Author

Well, -L didn't get me any further:

root@dt-ur01:~# xfs_repair -Lv /dev/md1
Phase 1 - find and verify superblock...
        - block cache size set to 2290880 entries
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
JorgeB Posted October 25, 2018

This means the rebuilt disk has more serious corruption. Either parity wasn't valid before, or it's possibly the result of mounting the disks read-write before rebuilding. Like I mentioned, the disks should have been mounted read-only, since there will always be some filesystem housekeeping that won't be reflected in the existing parity while it isn't assigned. Btrfs will usually not survive this, reiserfs usually survives without issues, and xfs should survive most times, but other times it might not.
JorgeB Posted October 26, 2018

One thing I forgot to mention: I've seen the error above as a result of a hardware issue before, and looking at your diags I see you're using the onboard Intel controller. That's good, but it's set to IDE mode; change it to AHCI in the BIOS and try xfs_repair again.
daemian Posted October 26, 2018 Author

Thanks johnnie. I believe I have the controller running in AHCI mode now, but xfs_repair still fails the same way. How could I confirm that it is now running in AHCI?
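For reference, a common way to confirm AHCI from a Linux console (a general check, not something suggested in the thread) is to look at which kernel driver the SATA controller is bound to: in AHCI mode it is "ahci" rather than the IDE-era ata_piix. The sample output below is hypothetical:

```shell
# On the server you would run: lspci -k | grep -iA3 sata
# Here we grep a hypothetical sample of that output; the key line is the driver.
SAMPLE='00:1f.2 SATA controller: Intel Corporation 6 Series Chipset SATA AHCI Controller
	Kernel driver in use: ahci'
echo "$SAMPLE" | grep -c 'ahci'   # counts lines naming the lowercase driver
```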
JorgeB Posted October 26, 2018

Post current diags and I can check.
daemian Posted October 26, 2018 Author Share Posted October 26, 2018 dt-ur01-diagnostics-20181026-1222.zip Quote Link to comment
JorgeB Posted October 26, 2018

It's correct now. A couple more things you can try: upgrade to v6.6.2, since it has a newer xfs_repair release, and if that still fails, connect that disk to another PC. It would lose sync with parity, but it might be worth a try.
daemian Posted October 26, 2018 Author

Thanks Johnnie. I upgraded to 6.6.2 and tried xfs_repair again. Still no luck. Putting this disk in another machine is not really an option for me with this one (I am remote to the site, and there isn't much in the way of resources there). I think I may need to bite the bullet and just format the drive, conceding that the data from that drive is lost. It's probably not really that big of a deal. Obviously not ideal, but I don't think I have much other choice. Would I just format that drive and then run a parity check to be sure everything is OK?
JorgeB Posted October 27, 2018

Just formatting is enough; parity will be updated, and then the regular scheduled checks suffice.