Multiple Disks with millions of read errors.

June 25, 2020

Unraid - 6.7.2

So I woke up this morning to an array that has a disabled disk (single parity system) and seven disks all with millions of read errors. The array isn't mounted, and is unreachable from the network. In the shares pane only the disk shares are showing up, none of the other shares. I pulled a diagnostic report (attached below), and now wonder what the safe thing is to do. I did start a read test as indicated on the main page, but paused it almost immediately, not knowing if it would screw things up.

Last night I rebooted the server, as I was having some trouble with the TV system, as well. All seemed OK, and Plex was able to rescan the TV shows to find some new items that had been added. Watched a couple of episodes, and went to bed.

My inclination is that I need to shut it down and check cabling, reseat controllers, etc... as that seems the most likely cause for millions of read errors all of a sudden, but I don't want to do anything that might compromise the system. I need to figure out if all the disks are on the same controller, but need to shut it down to get at the disks to see. In poking through the forums, this morning, I see that Marvell controllers can be an issue, so I assume it's due to one of those. The trouble appears to start around 5:59 in the log, and there are indications that the controller is the issue. I am woefully deficient at understanding these logs, however. This server has been running faithfully for many years, with the current HW, just FYI.
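For anyone in the same spot: the disk-to-controller mapping can usually be read from sysfs without opening the case. This is a minimal sketch, assuming the array disks show up as /dev/sd* and that the last PCI address in each device's resolved sysfs path belongs to its controller. `controller_of` is a hypothetical helper name, not an Unraid tool.

```shell
# Print the last PCI address (domain:bus:slot.func) found in a sysfs path.
# Assumption: for a block device, that address is the disk controller.
controller_of() {
  printf '%s\n' "$1" \
    | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]' \
    | tail -n 1
}

# List every sdX device next to the PCI address of its controller.
for dev in /sys/block/sd*; do
  [ -e "$dev" ] || continue   # skip if the glob matched nothing
  printf '%s\t%s\n' "$(basename "$dev")" "$(controller_of "$(readlink -f "$dev")")"
done
```

Disks that print the same address share a controller, and `lspci -s <address>` should then name the card.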

Also, I am assuming that the system disabled that one particular disk just because it can only do one with single parity, and it randomly chose it when the array crapped out.

Additionally, the system log tool has only one entry:

Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 134115360 bytes) in /usr/local/emhttp/plugins/dynamix/include/Syslog.php on line 20
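That figure is PHP's memory limit rather than anything about the disks: 134217728 bytes is exactly 128 MiB, so the web log viewer most likely gave up while loading a syslog that had grown very large. A rough sketch for reading the log from the console instead, assuming it lives at /var/log/syslog (that path is an assumption, and `to_mib` is a hypothetical helper):

```shell
# Convert a byte count to whole MiB (integer division).
to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

to_mib 134217728   # the "allowed memory size" from the error

# Assumed log location; adjust if the syslog lives elsewhere.
log=/var/log/syslog
if [ -r "$log" ]; then
  ls -lh "$log"        # how large has the log grown?
  tail -n 50 "$log"    # show only the most recent entries
fi
```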

Any help with how to proceed would be greatly appreciated. I am not very knowledgeable about the inner workings of the Unraid system, or Linux in general, but I am a quick learner. Hopefully some of you forum denizens will be able to help me out and point me in the right direction.

Thanks for listening.

tower-diagnostics-20200625-1429.zip


June 25, 2020

Community Expert

Looks like one of the typical SASLP problems. Reboot and post new diags so we can check the SMART reports.


June 25, 2020

Author

Here are the post-reboot diagnostics. Also, the disabled disk has a note that it is "unmountable: no file system". Not sure if that is SOP for disabled disks or not.

tower-diagnostics-20200625-1540.zip

Edited June 25, 2020 by ratmice


June 25, 2020

Community Expert

Disk17 is really failing. Still, with a good controller it would just get disabled instead of bringing all the other disks down with it. You should replace the SASLPs with LSI controllers when possible.

35 minutes ago, ratmice said:

Also, the disabled disk has a note that it is "unmountable: no file system".

Check filesystem on disk17 and if all OK then replace it.

https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui


June 25, 2020

Author

Thanks for the prompt reply. As always, Johnny, you are a superb asset to the forums.

I am going to replace the controllers ASAP. I currently have a SuperMicro X8SIL-F motherboard. It looks like I'm limited to x8 PCI-e cards (but it seems the SASLP are x4 cards). Any recommendations for direct replacements for the SASLP controllers? It seems like:

LSI 9211-8i P20 IT Mode for ZFS FreeNAS unRAID Dell H310 6Gbps SAS HBA

might be a reasonable replacement, I'm just a bit fuzzy on the bus/lane deal. Are there more stable, proven replacements that have the SFF-8087 SAS connector so I can just plug and play?


June 25, 2020

Community Expert

3 minutes ago, ratmice said:

LSI 9211-8i P20 IT Mode for ZFS FreeNAS unRAID Dell H310 6Gbps SAS HBA

That's fine, they are x8, and they are plug'n'play and use the same cables.


June 25, 2020

Author

Thanks again, have a great day.


June 25, 2020

Author

So, I attempted to run xfs_repair and got this output, seems like the disk is really borked. However is there a way to attempt mounting a single disk while in maintenance mode, or is starting the array and having it choke on this disk enough to just jump to ignoring the log while repairing? I am too *nix illiterate to know this.

root@Tower:~# xfs_repair -v /dev/md17
Phase 1 - find and verify superblock...
        - block cache size set to 349728 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 449629 tail block 449625
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Also, 2 new controllers are on the way. If this attempt to repair disk17 is unsuccessful, would the best course of action be to shut down the array, install the new controllers, and then try to rebuild disk17 to a new drive? Or would it be OK to do that now?

Edited June 25, 2020 by ratmice


June 25, 2020

Community Expert

You need to use -L, usually it's mostly OK.

You can still rebuild with current controllers, better than having the array unprotected, but if you can leave the server off then wait for the new controllers.
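The -L step can be sketched as below, assuming the array is started in maintenance mode and disk17 is still /dev/md17 as in the output above. `run` and `repair_disk` are hypothetical helper names, and DRY_RUN defaults to 1 so the commands are only printed until you deliberately set it to 0.

```shell
# Sketch only: with DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

repair_disk() {
  dev="$1"
  # -n: no-modify mode, only reports what a repair would change
  run xfs_repair -n "$dev"
  # -L: zero the log (pending metadata changes in it are lost), then repair
  run xfs_repair -L -v "$dev"
}

repair_disk /dev/md17
```

Printing first is a deliberate choice here, since zeroing the log cannot be undone.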


June 25, 2020

Author

That's what I thought. I can wait a few days for the controllers to arrive. Here goes nothing.

Thanks again.


June 25, 2020

Author

So, one last question. Now that the repair has run and placed lots of items in lost+found (indicating, I think, that there was a lot of corruption), should I bother to try mounting it? Or will that screw things up when I try to rebuild the disk? My thinking here is that if it is actually mountable, then Unraid will think that its current state is what it's supposed to be and adjust parity accordingly, thus giving me a screwed-up rebuild. As it is, the emulated disk appears to work OK.


June 26, 2020

Community Expert

There's no risk in trying to mount the emulated disk, and whatever is there is what will be on the rebuilt disk. Parity was already updated during the filesystem check to reflect that.


June 26, 2020

Author

OK, thanks. I think I will just not tempt fate and leave everything alone until the new controllers get here. That should be an adventure, as well. O_o
