fysmd, Posted December 1, 2018

Unraid Pro v6.6.3 - diag attached.

I noticed a drive was emulated earlier today (only after the Mrs said that Plex was misbehaving! - must check my notification settings!!)

The GUI says that the content is emulated but actually, it's not there. The user share is available, but the files which were on the faulty drive are missing.

At first, navigating to the parent directory in a shell errored:

    ian@Server:/mnt/user/TV$ cd The\ Deuce/
    ian@Server:/mnt/user/TV/The Deuce$ ls -la
    /bin/ls: reading directory '.': Input/output error
    total 0
    ian@Server:/mnt/user/TV/The Deuce$

So I stopped the array and restarted it. Now I can navigate to the directory, but not to the subdirectory which was on the failed drive (split season directories).

The faulty drive (disk17) now appears in the unassigned drives section (ST3000DM001-1CH166_W1F47CAF).

How should I proceed?

server-diagnostics-20181201-1739.zip
Squid, Posted December 1, 2018 (edited December 1, 2018 by Squid)

Run file system checks against disk 17: https://wiki.unraid.net/Check_Disk_Filesystems

You can do this either before or after rebuilding the drive.
fysmd, Posted December 1, 2018

    Phase 1 - find and verify superblock...
    bad primary superblock - bad CRC in superblock !!!

    attempting to find secondary superblock...
    .found candidate secondary superblock...
    verified secondary superblock...
    would write modified primary superblock
    Primary superblock would have been modified.
    Cannot proceed further in no_modify mode.
    Exiting now.
Squid, Posted December 1, 2018

Remove the -n from the options.
fysmd, Posted December 1, 2018 (edited December 1, 2018 by fysmd)

Thank you so much for helping so quickly!

    Phase 1 - find and verify superblock...
    bad primary superblock - bad CRC in superblock !!!

    attempting to find secondary superblock...
    .found candidate secondary superblock...
    verified secondary superblock...
    writing modified primary superblock
    sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
    resetting superblock realtime bitmap ino pointer to 97
    sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
    resetting superblock realtime summary ino pointer to 98
    Phase 2 - using internal log
            - zero log...
    ERROR: The filesystem has valuable metadata changes in a log which needs to
    be replayed. Mount the filesystem to replay the log, and unmount it before
    re-running xfs_repair. If you are unable to mount the filesystem, then use
    the -L option to destroy the log and attempt a repair.
    Note that destroying the log may cause corruption -- please attempt a mount
    of the filesystem before doing this.

So, do I just mount it in unassigned drives, or start the array?
Squid, Posted December 1, 2018

If it still comes up as unmountable after starting the array, then run it again with the -L option. Usually no data loss happens.
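For reference, the three invocations discussed so far (read-only check with -n, repair without it, and repair with -L) can be sketched as a small shell helper. This is only an illustration, not what the GUI runs internally: the function name `xfs_repair_cmd` is hypothetical, it prints the command instead of executing it, and it assumes the array is started in maintenance mode with disk 17 exposed as `/dev/md17`, as on Unraid 6.6.

```shell
#!/bin/sh
# Hypothetical helper: PRINTS (does not run) the xfs_repair command line for
# an Unraid array disk. Assumes maintenance mode and that array slot N is
# exposed as /dev/mdN (an assumption based on Unraid 6.6 device naming).
xfs_repair_cmd() {
    disk="$1"   # array slot number, e.g. 17
    mode="$2"   # check | repair | zero-log
    case "$mode" in
        check)    echo "xfs_repair -n /dev/md${disk}" ;;  # no-modify mode, report only
        repair)   echo "xfs_repair /dev/md${disk}" ;;     # write the fixes
        zero-log) echo "xfs_repair -L /dev/md${disk}" ;;  # destroy the log, then repair
        *)        echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Order followed in this thread: check first, then repair, and only fall
# back to -L if the filesystem cannot be mounted to replay its log.
xfs_repair_cmd 17 check
xfs_repair_cmd 17 repair
xfs_repair_cmd 17 zero-log
```

Printing rather than running keeps the sketch safe to paste anywhere; on a real server the printed line would be reviewed and then run by hand.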
fysmd, Posted December 1, 2018

10 minutes ago, Squid said: "If it still comes up as unmountable after starting the array, then run it again with the -L option. Usually no data loss happens."

OK, it did not mount when I restarted the array, so I ran with -L:

    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
            - scan filesystem freespace and inode maps...
            - found root inode chunk
    Phase 3 - for each AG...
            - scan and clear agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - check for inodes claiming duplicate blocks...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
    Phase 5 - rebuild AG headers and trees...
            - reset superblock...
    Phase 6 - check inode connectivity...
            - resetting contents of realtime bitmap and summary inodes
            - traversing filesystem ...
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify and correct link counts...
    Maximum metadata LSN (1:138984) is ahead of log (1:2).
    Format log to cycle 4.
    done

When I try to start the array again, disk17 says unassigned... I mean that with the array stopped, there is no disk assigned to the disk17 slot.

The drive does appear in the drop-down, but I guess if I reassign it here it'll erase / overwrite it?

Do I need to force it back into the array or something?
Squid, Posted December 1, 2018

What have you been running the commands against? /dev/md17 or /dev/sdXX
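The distinction matters because, as far as I know, a repair run against the `/dev/mdN` device goes through Unraid's md driver and keeps parity in step, while a repair run directly against the raw `/dev/sdX` partition bypasses it and leaves parity out of date. A minimal sketch of a guard that accepts only md targets is below; `is_parity_safe_target` is a hypothetical name, `/dev/sdq1` is a made-up example device, and the patterns assume Unraid 6.6 device naming.

```shell
#!/bin/sh
# Hypothetical guard: succeed only for Unraid md devices (/dev/mdN), where
# writes pass through the parity layer. Raw /dev/sdX targets are rejected.
is_parity_safe_target() {
    case "$1" in
        /dev/md[0-9]*) return 0 ;;  # array device, parity is updated
        *)             return 1 ;;  # anything else, e.g. /dev/sdq1
    esac
}

# /dev/sdq1 is an illustrative device name, not one taken from this thread.
for dev in /dev/md17 /dev/sdq1; do
    if is_parity_safe_target "$dev"; then
        echo "$dev: ok to repair, parity stays in sync"
    else
        echo "$dev: refusing, a repair here would invalidate parity"
    fi
done
```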
fysmd, Posted December 1, 2018 (edited December 1, 2018 by fysmd)

Errr. I just clicked on disk17 in the GUI; I don't see any reference to which device it's working on.
Squid, Posted December 1, 2018 (edited December 1, 2018 by Squid)

But it's showing as "Unassigned" now and not "Disabled"? Post a screenshot.

Are the files now available when starting the array?
fysmd, Posted December 1, 2018

It just says "not installed" when the array is started, and "unassigned" when it is stopped.
Squid, Posted December 1, 2018

If you stop the array and reassign the drive to slot 17, it should begin to rebuild.
fysmd, Posted December 2, 2018

Halfway through rebuilding and the contents do seem to be emulated again 😄 (I was worried there!).

Thank you SOOOOO much for the help. I think without this assistance I would have removed the drive from the array, rebuilt parity, mounted the drive externally and copied any working content back to the array.

I had a power incident at home a couple of weeks ago. While the machine stayed up, one drive stopped working completely and I suspect another had a similar issue to this one (it mounted externally and worked, then passed a preclear without issue!)

Time, I think, to invest in fresh batteries for the UPS, which isn't on at the moment!!
fysmd, Posted December 5, 2018

O-K..

I had an almost identical issue again a couple of days ago and followed the process described above (maintenance mode, disk checks etc., rebuild after remounting the drive), and the array returned to health after a parity rebuild.

Today I have the same symptoms again, but with a different drive. Diag attached, taken prior to doing anything.

I notice that two drives from my array appear in the unassigned drives section on the main screen. One of them is the drive reporting as failed, and the other claims to still be healthy in the array (screenshot below).

The array is still started but not happy at all, with lots of data missing.

Am I doing something really wrong somewhere? I've been running Unraid for a very long time without any issues, but it all seems to be going wrong right now!

Please help!

server-diagnostics-20181205-2219.zip
JorgeB, Posted December 5, 2018

You are using Marvell controllers with port multipliers. Marvell controllers by themselves are not recommended and are usually trouble; connected to a port multiplier, you are asking for trouble. Replace them with LSI HBAs and your troubles should end.
fysmd, Posted December 6, 2018

This sounds odd to me. I do not have any stand-alone port multipliers; could they be a part of the card I'm using??

Also, I have been running this config for many years now and I have only ever had similar issues when I was mixing REISERFS and XFS. I migrated all data without issue and it's been stable since then (until now!).

I have upgraded Unraid - is it possible (advisable?) to downgrade back to a version which did not exhibit these errors?

I have also taken the plunge and gone for one of these puppies:

https://www.scan.co.uk/products/24-port-broadcom-sas-9305-24i-host-bus-adaptor-internal-12gb-s-sas-pcie-30

It's on the recommended HW list, so I ought to be golden with this fella - will I??

I have another question regarding rebuilding now. I have disabled all software which might be trying to write changes to my Unraid array, but with one drive (allegedly) missing and another in a very off state, how should I proceed to get back to healthy? Obviously if both drives refuse to be recognised I can't rebuild the data from parity :(
JorgeB, Posted December 6, 2018

1 minute ago, fysmd said: "I do not have any stand-alone port multipliers, could they be a part of the card I'm using??"

The two Marvell controllers you have (they appear to be 8 ports each) have a built-in port multiplier, and while one of them seems to be behaving for now, the other one filled the log with ATA errors, timeouts, disconnects, reconnects and such.

3 minutes ago, fysmd said: "I have upgraded unraid - is it possible (advisable?) to downgrade back to a version which did not exhibit these errors?"

If they were working better then, it might be a good idea, at least until you replace them.

3 minutes ago, fysmd said: "have also taken the plunge and gone for one of these puppies: https://www.scan.co.uk/products/24-port-broadcom-sas-9305-24i-host-bus-adaptor-internal-12gb-s-sas-pcie-30"

That's a good option.

4 minutes ago, fysmd said: "I have another question regarding rebuilding now. I have disabled all software which might be trying to write changes to my unraid array but with one drive (allegedly) missing and another in a very off state, how should I proceed to get back healthy?"

Only one disk is disabled; the others just dropped offline since they are on the same controller. If you reboot they should come back online, though I would wait for the LSI before doing the rebuild.
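One rough way to see which ports are misbehaving is to count ATA error lines per port in the syslog from the diagnostics zip. The sketch below is an illustration under stated assumptions: the function name `count_ata_errors` is hypothetical, the keyword list is a guess at typical kernel messages rather than an exhaustive set, and the sample lines are made up for the demo, not taken from the diagnostics attached to this thread.

```shell
#!/bin/sh
# Hypothetical helper: count error-looking ATA lines per port in a syslog.
# $1 is the path to the syslog file. The keywords below are an assumption
# about what the kernel typically logs for a flaky SATA link.
count_ata_errors() {
    awk '
        /exception|hard resetting link|failed command|timeout/ {
            # Take the first "ataN" token on the line as the port name.
            if (match($0, /ata[0-9]+/))
                count[substr($0, RSTART, RLENGTH)]++
        }
        END { for (port in count) print count[port], port }
    ' "$1" | sort -rn   # busiest port first
}

# Demo with invented sample lines (NOT from the real diagnostics).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Dec  5 22:01:10 Server kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec  5 22:01:10 Server kernel: ata9.00: failed command: READ DMA EXT
Dec  5 22:01:11 Server kernel: ata9: hard resetting link
Dec  5 22:01:15 Server kernel: ata10: hard resetting link
Dec  5 22:01:20 Server kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
EOF
count_ata_errors "$sample"
rm -f "$sample"
```

If most of the noisy ports sit behind the same controller, that points at the controller or its cabling rather than at the individual drives.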
fysmd, Posted December 6, 2018

OK, well my new toy will be with me tomorrow, so I'll take it from there.

Can I just check the approach for when I get the new controller? After reconnecting everything, should I expect the one disabled drive to still be disabled? Is there a way to force it back into life, or is it safer (better) to just allow the array to rebuild onto the same disk again?
JorgeB, Posted December 6, 2018

21 minutes ago, fysmd said: "after reconnecting everything should I expect the one disabled drive to still be disabled?"

Yes.

22 minutes ago, fysmd said: "is there a way to force it back into life or is it safer (better) to just allow the array to rebuild onto the same disk again?"

If you're sure no data was written to that disk after it got disabled, you can do a new config and resync parity instead.