ratmice Posted June 25, 2020

Unraid 6.7.2

So I woke up this morning to an array with a disabled disk (single-parity system) and seven disks all showing millions of read errors. The array isn't mounted and is unreachable from the network. In the Shares pane only the disk shares are showing up, none of the other shares. I pulled a diagnostics report (attached below) and now wonder what the safe thing to do is. I did start a read test as indicated on the Main page, but paused it almost immediately, not knowing if it would screw things up.

Last night I rebooted the server, as I was having some trouble with the TV system as well. All seemed OK, and Plex was able to rescan the TV shows and find some new items that had been added. I watched a couple of episodes and went to bed.

My inclination is that I need to shut it down and check cabling, reseat controllers, etc., as that seems the most likely cause for millions of read errors all of a sudden, but I don't want to do anything that might compromise the system. I need to figure out if all the affected disks are on the same controller, but I have to shut the server down to get at the disks to see. Poking through the forums this morning, I see that Marvell controllers can be an issue, so I assume it's due to one of those. The trouble appears to start around 5:59 in the log, and there are indications that the controller is the issue. I am woefully deficient at understanding these logs, however. This server has been running faithfully for many years with the current hardware, just FYI.

Also, I am assuming that the system disabled that one particular disk just because it can only disable one with single parity, and it chose that one at random when the array crapped out.

Additionally, the system log tool has only one entry:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 134115360 bytes) in /usr/local/emhttp/plugins/dynamix/include/Syslog.php on line 20

Any help on how to proceed would be greatly appreciated. I am not very knowledgeable about the deep inner workings of the Unraid system, or Linux in general, but I am a quick learner. Hopefully some of you forum denizens will be able to help me out and point me in the right direction. Thanks for listening.

tower-diagnostics-20200625-1429.zip
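A note for anyone hitting the same PHP memory error in the webGUI syslog viewer: the raw log can still be read from a console or SSH session. This is a generic sketch, not from the original post; /var/log/syslog is the usual location on Unraid, adjust if your setup differs.

# Read the system log directly when the webGUI viewer runs out of memory
tail -n 200 /var/log/syslog                       # last 200 lines
grep -iE 'error|ata|sas' /var/log/syslog | less   # filter for disk/controller messages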
JorgeB Posted June 25, 2020

Looks like one of the typical SASLP problems; reboot and post new diags so we can check the SMART reports.
ratmice Posted June 25, 2020 (edited)

Here are the post-reboot diagnostics. Also, the disabled disk has a note that it is "unmountable: no file system". Not sure if that is SOP for disabled disks or not.

tower-diagnostics-20200625-1540.zip

Edited June 25, 2020 by ratmice
JorgeB Posted June 25, 2020

Disk17 is really failing. Still, with a good controller it would just get disabled instead of bringing all the other disks down with it; you should replace them with LSI controllers when possible.

35 minutes ago, ratmice said:
Also, the disabled disk has a note that it is "unmountable: no file system".

Check the filesystem on disk17, and if all is OK then replace it.
https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
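Not part of the reply above, but for reference, the same check can also be run from a terminal once the array is started in maintenance mode; this assumes disk17 is XFS and that /dev/md17 is the parity-protected device for disk17 on this Unraid version.

# Read-only filesystem check; -n reports problems without writing anything to the disk
xfs_repair -n -v /dev/md17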
ratmice Posted June 25, 2020

Thanks for the prompt reply. As always, Johnny, you are a superb asset to the forums. I am going to replace the controllers ASAP. I currently have a SuperMicro X8SIL-F motherboard, so it looks like I'm limited to x8 PCIe cards (though it seems the SASLP are x4 cards). Any recommendations for direct replacements for the SASLP controllers? It seems like:

LSI 9211-8i P20 IT Mode for ZFS FreeNAS unRAID Dell H310 6Gbps SAS HBA

might be a reasonable replacement; I'm just a bit fuzzy on the bus/lane deal. Are there more stable, proven replacements that have the SFF-8087 SAS connector so I can just plug and play?
JorgeB Posted June 25, 2020

3 minutes ago, ratmice said:
LSI 9211-8i P20 IT Mode for ZFS FreeNAS unRAID Dell H310 6Gbps SAS HBA

That's fine; they are x8, they are plug'n'play, and they use the same cables.
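As an aside (not part of the reply above), once the replacement HBA is installed it can be worth confirming that the card is detected and running IT-mode firmware before starting the array. The commands below are a sketch; sas2flash is only available if LSI's flashing utility is installed on the system.

# Confirm the HBA is visible on the PCIe bus and claimed by the mpt2sas driver
lspci -nn | grep -i -e lsi -e sas
dmesg | grep -i mpt2sas
# If LSI's sas2flash utility is present, list firmware/BIOS versions to confirm IT mode (P20)
sas2flash -list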
ratmice Posted June 25, 2020 (edited)

So, I attempted to run xfs_repair and got this output; seems like the disk is really borked. However, is there a way to attempt mounting a single disk while in maintenance mode, or is starting the array and having it choke on this disk enough to justify skipping straight to ignoring the log while repairing? I am too *nix-illiterate to know this.

root@Tower:~# xfs_repair -v /dev/md17
Phase 1 - find and verify superblock...
        - block cache size set to 349728 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 449629 tail block 449625
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

Also, 2 new controllers are on the way. If this attempt to repair disk17 is unsuccessful, would the best course of action be to shut down the array, install the new controllers, and then try to rebuild disk17 to a new drive? Or would it be OK to do that now?

Edited June 25, 2020 by ratmice
JorgeB Posted June 25, 2020

You need to use -L; usually it's mostly OK. You can still rebuild with the current controllers, which is better than leaving the array unprotected, but if you can leave the server off then wait for the new controllers.
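For reference, a sketch of the repair command being discussed, run from a terminal with the array in maintenance mode; running it against /dev/md17 rather than the raw sdX device keeps parity in sync while the repair writes.

# Destroy the XFS log and repair; only do this after a normal mount attempt has failed
xfs_repair -L -v /dev/md17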
ratmice Posted June 25, 2020

That's what I thought; I can wait a few days for the controllers to arrive. Here goes nothing. Thanks again.
ratmice Posted June 25, 2020

So, one last question. Now that the repair has run, with lots of items placed in lost+found (indicating, I think, that there was a lot of corruption), should I bother to try mounting it? Or will that screw things up when I go to rebuild the disk? My thinking here is that if it is actually mountable, then Unraid will think that its current state is what it's supposed to be and adjust parity accordingly, thus giving me a screwed-up rebuild. As it is, it appears that the emulated disk works OK.
JorgeB Posted June 26, 2020

There's no risk in trying to mount the emulated disk, and whatever is there is what will be on the rebuilt disk; parity was already updated during the filesystem check to reflect that.
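If you do mount the emulated disk to see what the repair left behind, a quick way to gauge how much ended up orphaned is to look in lost+found; this sketch assumes the array is started normally and that disk17 mounts at the standard /mnt/disk17 path.

# Files xfs_repair could not re-link are placed in lost+found with numeric names
ls -lh /mnt/disk17/lost+found | head
du -sh /mnt/disk17/lost+found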
ratmice Posted June 26, 2020

OK, thanks. I think I will just not tempt fate and leave everything alone until the new controllers get here. That should be an adventure as well. O_o