Tons of read errors after adding a 2nd parity...



So I've been working on the server for a few weeks and finally had everything organized (I've used Unraid since the mid 2000s). I've stopped and started the array many times today for various reasons. No problems. For peace of mind, I added a second 10TB parity drive into an extra bay and everything looks good: no SMART errors, all green lights, etc. I hit start and all hell breaks loose.
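(In case it's useful to anyone following along, the SMART attributes worth watching can be pulled from the shell; this is just a generic smartctl sketch, with the device name as a placeholder rather than anything from this setup:)

smartctl -A /dev/sdX | grep -i -E 'reallocated|pending|uncorrect|crc'
# Reallocated/Pending/Offline_Uncorrectable point at the disk itself;
# a climbing UDMA_CRC_Error_Count usually points at a cable or backplane problem instead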

 

First, one of the disks (Disk 11) gets immediately disabled. Oddly, this disk got disabled the last time I did a rebuild. I did a filesystem check and repair on many of these disks prior to today. Everything cleared.
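(For reference, on Unraid the ReiserFS check is typically run against the md device with the array started in maintenance mode; a minimal sketch using disk 11 as the example, assuming the usual /dev/mdN numbering for this Unraid version:)

reiserfsck --check /dev/md11   # read-only check first
# only escalate to --fix-fixable (or --rebuild-tree as a last resort) if the check reports problems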

 

Second, Disk 1 started showing millions of read errors. I panicked and paused the rebuild, made sure the connections were snug (didn't unplug anything), and resumed the rebuild. Now Disks 1, 2, and 4 have the same amount of read errors, and for some reason disks 11 and 2 show up in the unassigned devices at the same time. When I go into the shell and try to check the disks at /mnt/disk1, disk2, disk4... some of the files show ?????? for their file permissions and can't be changed, even as root (read-only filesystem).
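(A quick way to confirm whether a disk has been dropped to read-only, and why; a minimal sketch assuming the standard /mnt/diskN mount points:)

grep ' /mnt/disk1 ' /proc/mounts                        # the mount options will show ro if the filesystem was remounted read-only
dmesg | grep -i -E 'reiserfs|ata|i/o error' | tail -50  # the kernel errors that triggered it; ?????? in ls output just means stat() is failing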

 

At this point, I've just paused the rebuild and I'm not sure what I should do next. If I hit resume, the errors stack up, but all the lights are green besides disk 11 (which was disabled). It could complete in 24 hours, but I have a feeling I'm just going to be left with junk. Technically none of the data should be touched, since it should only be writing to the 2nd parity drive? Not sure what to do next...

 

 

read-errors.png

shows-up-on-both.png

unraid-diagnostics-20201106-1916.zip

Link to comment

That last screenshot seems to imply that you have mounted as Unassigned a disk that was assigned as disk11. Did you do that on purpose? I guess it doesn't matter because it needs to be rebuilt anyway, but mounting an array disk outside the array makes it out-of-sync with parity.

 

Good news is that all assigned disks look mounted; disk11 is of course emulated, but still mounted, so your data should be OK. Disks 3 and 9 appear to have little if any data, though. Don't know if that is as expected or not.

 

Not sure what to make of that first screenshot. If all you are building is parity2, then that disk should have a lot of writes with few on any other disks, but those numbers in the write column don't seem to agree.

 

Why are all your disks still on ReiserFS?

 

All this is almost certainly hardware related, especially as it happened right after mucking about installing a new disk for parity2. Bad connections, bad power, maybe an unseated controller, etc.

 

Shut down and check all connections, power and SATA, at both ends, including splitters. Check controller card seating, etc.

 

You have a lot of disks, maybe your PSU isn't adequate or working well. What is the exact model of your power supply?

Link to comment

Thank you for your quick response.

 

  • They're all ReiserFS because that's how my original array was set up (back in 2007?), and I've just been upgrading ever since.
  • That's the thing: disk 11 wasn't in the Unassigned Devices section until I paused and restarted the rebuild. I never mounted it under Unassigned Devices. The last time disk 11 was disabled, I formatted and rebuilt it (this was a few weeks ago), then performed a filesystem check and it seemed fine.
  • I know for sure Disk 4 didn't have any filesystem errors prior to this, so I'm not sure why all the read errors.
  • I've been running this Silverstone 700 watt power supply for a while; it's kept up with 15 drives for over a decade?
  • Hmm, I've used that 640L for a while with no issues, but this might be the culprit. My gut says it's hardware as well.

I've attached an image of a directory in Disk 2. Looks like filesystem corruption...but some directories are accessible.

bash: cd: Zootopia (2016): Permission denied

 

So what would be my options at this point? Should I just let it run and hope for the best? 

 

What's kind of scary is that disk 11 got disabled, so I can't rebuild that disk AND build parity #2 at the same time. But if I were to stop the array or cancel the parity build... then parity is 100% invalid and I won't be able to rebuild disk 11, correct? Or is it not too late, and does the first parity disk have enough information to rebuild disk 11 even though parity 2 is incomplete?

 

At this point, I can't even back up the shares (permission denied, read-only). I'm wondering: if I stop the array now, will these directories suddenly become accessible, or will it be more of the same story?
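(If it helps, a read-only salvage copy of whatever is still accessible should work even in this state; a rough sketch, where /mnt/disks/backup is a hypothetical Unassigned Devices mount, not something from this setup:)

rsync -av /mnt/disk2/ /mnt/disks/backup/disk2/
# rsync keeps going past files it can't read and reports them at the end (exit code 23),
# so you end up with a copy of everything that is still readable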

 

Also, would I be able to unplug drives while the array rebuild is paused? 

 

disk2.png

Link to comment

So I resumed and the read errors kept going up for disks 1, 2, and 4. Obviously, something is seriously wrong. Any idea why this happens during rebuilds but not when the array is just started? There were no hints of any issues prior to adding parity #2.

 

Am I able to pull drives out and check which is which to see if they're part of the 640L while paused? Or do I have to stop the array? 

 

If I have to stop the array... is there a difference between stopping the array and cancelling the parity build?

to-cancel-or-not.png

Link to comment
7 minutes ago, jaylossless said:

Am I able to pull drives out and check which is which to see if they're part of the 640L while paused? Or do I have to stop the array?

Some risk with zero benefit to mucking about with the hardware under power. Just shut down and try to fix things.
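(For what it's worth, you can usually work out which disks hang off which controller from the shell before powering down, without touching the hardware; a rough sketch, assuming standard udev device paths:)

lspci | grep -i -E 'sata|raid'   # identify the 640L and the onboard controller
ls -l /dev/disk/by-path/         # each pci-.../ata-N symlink shows which controller a given sdX sits behind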

34 minutes ago, jaylossless said:

Silverstone 700 watt

Google gives me several different models from that search.

Link to comment

So I stopped the array and tried moving a few of the disks into empty physical slots (trying to rule out cables). I refreshed the page, all the drives showed up and looked good, but when I started the array, two more disks got disabled. Obviously, at this point I'll have to recreate the array. It seems like I should've rebooted after swapping the drive slots (I've noticed you can't always trust the "main" view in Unraid).

 

Not sure what the problem was. Could it have been ReiserFS? I'm rebuilding now with XFS and no issues so far; I'll just copy the data back when it's finished. But I guess, after more than a decade, it was time for a fresh array.

whole thing failed.png

new-array.png

Link to comment
6 hours ago, jaylossless said:

When I started the array, two more disks got disabled.

 

 

How are your drives powered? 

 

It's easy to overload the +12V SATA or Molex cables; usually they only have 3-4 connectors.

You can usually get away with ~6 disks per 'string' of connectors on a quality PSU (thick cables), but beyond that you may see power drops.

A 6-pin PCI-E connector is spec'd at 75W (6.25A) across 3 active 12V wires; the newer 8-pin PCI-E is 150W (12.5A), also across 3 active wires.

That works out to roughly 2-4A per active 12V wire.

Each HDD can draw 1A continuous, spiking to 2A, on a single active 12V wire, so 4 drives = 4A+, the same as the highest-capacity PCI-E wire.
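(Very rough, illustrative numbers only, using the figures above and assuming all 15 drives spin up together off 3-4 power strings:)

15 drives x ~2A spin-up  = ~30A on the +12V rail (~360W transient, before the board, CPU and controller)
spread over 3-4 strings  = ~8-10A per string at spin-up, roughly double the 4A+ that 4 drives pull steady-state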

 

The Strider looks to be relatively inexpensive, so 15 drives could easily be connected in such a way that it overloads one of the cables, especially if the PSU is 10 years old as you suggest. Capacitors in particular lose efficiency over time and so provide far less tolerance of power-demand spikes.

 

Good luck

 

 

Link to comment
8 hours ago, jaylossless said:

It seems like I should've rebooted after swapping the the drive slots

When you say "swapping", do you just mean changing the drive assignments? Or do you mean actually moving the physical disks around? I always use the word "slot" the same way Unraid syslog does, which is referring specifically to the assignments.

 

If you just mean the assignments, you can change them around all you want and nothing actually happens until you start the array, at which time the new assignments are recorded. No need to reboot for that.

Link to comment
4 hours ago, trurl said:

When you say "swapping", do you just mean changing the drive assignments? Or do you mean actually moving the physical disks around?

Sorry, swapping as in moving the drives around physically (different bays, not reassigning to different disk # slots in Unraid). I believe if I were to move drives to different slots, that would invalidate the config?

Link to comment
6 hours ago, Decto said:

How are your drives powered? ... The Strider looks to be relatively inexpensive, so 15 drives could easily be connected in such a way that it overloads one of the cables ...

That's a good point. I'll try replacing the power supply with this fresh array.

Link to comment
6 hours ago, jaylossless said:

Sorry, swapping as in moving the drives around physically

Not sure why you would have asked about rebooting in that case, since you should not be doing that at all under power. If you are mucking about under power, then that might explain why you think

19 hours ago, jaylossless said:

(I've noticed you can't always trust the "main" view in Unraid).

 

There is really nothing you can do as far as moving or replacing disks while powered up. Even if you have hot-swappable hardware.

 

If you unplug a disk with the array started, for example, it will likely get disabled and have to be rebuilt, even if you don't actually break anything.

 

And there is no point in trying to replace a disk under power either, since Unraid won't do anything with the new disk until you assign it, which can't be done with the array started. It will just disable the disk you removed.

 

So, 

On 11/7/2020 at 12:49 AM, trurl said:

Some risk with zero benefit to mucking about with the hardware under power.

Link to comment
