perfessor101 Posted August 26, 2021

Hello,

Unraid 6.9.2

An array drive is listed both in the array and as unassigned (I didn't notice until after attempting file transfers from another drive to this one via unbalance). The array drive had a read error in "both locations" and the drive is not responding now, so the array drive is disabled. I have a precleared replacement drive. I was going to replace disk 14... now I'm replacing disk 16? The plan was to move files off drive 14 before rebuilding drive 14 with the new drive, but drive 16 had 8 read errors and was then disabled.

I'd like to know the safest way to get the array back up with good parity, and whether having drives listed as both array and unassigned has happened to others recently.

Thanks for your time,
Bobby

storage-diagnostics-20210826-1033.zip
JorgeB Posted August 26, 2021

The syslog is missing a lot of time due to spam, but when this happens it means the disk dropped offline and then reconnected with a different identifier. SMART for the disk looks OK, so if the emulated disk is mounting and the contents look correct you can rebuild on top. It's a good idea to replace/swap the cables/slot first, to rule those out if it happens again with the same disk.
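One way to spot a disk that dropped offline and came back under a new identifier is to grep the syslog for device offline/reset events. A minimal sketch against a made-up syslog excerpt (the device IDs, timestamps, and messages below are illustrative placeholders, not taken from the actual diagnostics):

```shell
#!/bin/sh
# Hypothetical syslog excerpt written to a temp file; in practice you would
# point the grep at /var/log/syslog or the syslog from the diagnostics zip.
cat > /tmp/syslog-excerpt <<'EOF'
Aug 24 02:29:55 storage kernel: sd 11:0:6:0: device offline
Aug 24 02:30:14 storage nginx: worker process 2628 exited on signal 6
Aug 24 02:30:40 storage kernel: sd 11:0:6:0: Power-on or device reset occurred
EOF

# Count drop/reset events while ignoring unrelated noise.
grep -c -E 'device offline|device reset' /tmp/syslog-excerpt   # prints "2"
```

A cluster of these events around the time a disk was disabled is the usual fingerprint of a cabling or power problem rather than a failing drive.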
perfessor101 Posted August 26, 2021 (Author)

I've shut down and reseated all the drives in their hot-swap bays, and I'm rebooting now. Is there any spam I can cut down on?
JorgeB Posted August 26, 2021

Nginx was doing the log spam:

Aug 24 02:30:14 storage nginx: 2021/08/24 02:30:14 [alert] 18120#18120: worker process 2628 exited on signal 6
Aug 24 02:30:16 storage nginx: 2021/08/24 02:30:16 [alert] 18120#18120: worker process 2633 exited on signal 6
Aug 24 02:30:18 storage nginx: 2021/08/24 02:30:18 [alert] 18120#18120: worker process 2669 exited on signal 6
Aug 24 02:30:20 storage nginx: 2021/08/24 02:30:20 [alert] 18120#18120: worker process 2676 exited on signal 6

This is usually GUI related, sometimes caused by having multiple browser windows open on the GUI; see if it happens again after this reboot.
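To see what is left in the syslog once that nginx spam is filtered out, something like this works (a sketch: only the nginx lines in the sample are real quotes from this thread, the kernel line is a made-up placeholder):

```shell
#!/bin/sh
# Small sample of a spammed syslog written to a temp file; in practice you
# would run the grep against the real syslog from the diagnostics.
cat > /tmp/syslog-sample <<'EOF'
Aug 24 02:30:14 storage nginx: 2021/08/24 02:30:14 [alert] 18120#18120: worker process 2628 exited on signal 6
Aug 24 02:30:16 storage nginx: 2021/08/24 02:30:16 [alert] 18120#18120: worker process 2633 exited on signal 6
Aug 24 02:30:20 storage kernel: mdcmd (35): spindown 3
EOF

# Drop the repeated nginx "exited on signal 6" alerts to see everything else.
grep -v 'nginx: .*exited on signal 6' /tmp/syslog-sample
```

On the sample above only the kernel line survives the filter, which is the kind of line you actually care about when tracing disk events.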
perfessor101 Posted August 26, 2021 (Author)

I finally removed my AOC-SAS2LP-MV8 cards because of the warnings... that's when all the unreadable drives started. I think I'll remove the SAS 9201-16i and see if it runs stably on the Marvell chipsets after all.
perfessor101 Posted August 26, 2021 (Author)

The drive is rebuilding/emulated and is saying unmountable at 11%. Is it worth continuing?
trurl Posted August 26, 2021

Post new diagnostics.
perfessor101 Posted August 26, 2021 (Author)

Latest diagnostics. Lots of problems since switching from the AOC-SASLP-MV8 and SAS2LP-MV8 to the SAS 9201-16i in IT mode: I've lost three drives in a month. The last rebuild, two weeks ago, was "successful" as far as I know, and I'm hoping this rebuild is still OK. I just don't want to waste 18 hours to end up with an unmountable drive.

storage-diagnostics-20210826-1646.zip
JorgeB Posted August 27, 2021

6 hours ago, perfessor101 said:
i just don't want to waste 18 hours to get an unmountable drive

You should always check whether the emulated drive mounts, or whether it can be fixed, before starting a rebuild. Disk 16 is already rebuilt; I assume that went well?

As for disk 10, there's fatal filesystem corruption, so you'll need to use the old disk (assuming it's still available) after doing a new config and running a parity check. Try upgrading again only after fixing these errors, which suggest a power/cable problem:

Aug 26 14:12:21 storage kernel: sd 11:0:6:0: Power-on or device reset occurred
...
Aug 26 14:12:21 storage kernel: sd 11:0:6:0: Power-on or device reset occurred
...
Aug 26 14:12:21 storage kernel: sd 11:0:6:0: Power-on or device reset occurred
...
Aug 26 14:12:21 storage kernel: md: disk10 read error, sector=2246592
Aug 26 14:12:21 storage kernel: md: disk10 read error, sector=2246600
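Before committing to an 18-hour rebuild, one way to check the emulated disk is a read-only xfs_repair against the emulated device (for disk 10 that is /dev/md10, with the array started in maintenance mode). This is a hedged sketch, not the official Unraid procedure: the -n flag makes xfs_repair report problems without writing anything, and the guard just avoids touching a device that isn't there.

```shell
#!/bin/sh
# Sketch: write a small helper script that does a no-modify filesystem check
# of an emulated Unraid disk device, defaulting to /dev/md10 (disk 10 here).
cat > /tmp/check-emulated.sh <<'EOF'
#!/bin/sh
DEV="${1:-/dev/md10}"
if [ -e "$DEV" ]; then
    # -n = no-modify: report filesystem problems only, change nothing.
    xfs_repair -n "$DEV"
else
    echo "device $DEV not present on this machine"
fi
EOF
sh /tmp/check-emulated.sh
```

If the read-only check reports repairable corruption, fixing the emulated filesystem first (or confirming it mounts) avoids spending a full rebuild only to end up with an unmountable disk.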
perfessor101 Posted August 27, 2021 (Author)

The drive said it was emulated at the start of the rebuild, then it switched to unmountable later, not the other way around.
JorgeB Posted August 27, 2021

2 hours ago, JorgeB said:
Disk16 is already rebuilt, I assume that went well?

Did you rebuild disk 16? Looking at the syslog, it shows you did a new config. Parity won't be valid after that, and if that's true it would explain the unmountable disk 10, since its generation doesn't match the one from parity.

30 minutes ago, perfessor101 said:
drive said is emulated at start of rebuild then it switched to unmounted later

The drive is always emulated, but it was already unmountable the first time you started the array.
perfessor101 Posted August 27, 2021 (Author)

It said the disk was emulated... it was quite some time later that it said unmountable. Most of the problems I've had this month were from errors that my server wasn't reporting and that I didn't catch in time. What does the Trust Parity button do?
JorgeB Posted August 27, 2021

4 minutes ago, perfessor101 said:
said disk emulated ... was quite some time later that it said unmountable

The disk was disabled:

Aug 26 14:12:51 storage kernel: md: disk10 write error, sector=583032

You stopped the array:

Aug 26 14:16:27 storage kernel: mdcmd (38): nocheck cancel

You replaced the disk and started the array; the disk was unmountable immediately at array start:

Aug 26 14:18:25 storage root: mount: /mnt/disk10: can't read superblock on /dev/md10.

What you should do here is first start the array with the emulated disk, before replacing it; it would already have been unmountable.

7 minutes ago, perfessor101 said:
What does the trust parity button do?

What it sounds like. It should only be used after a new config, and only if parity is valid; it wasn't in your case, since there was a disabled disk before the new config.
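That timeline can be pulled out of a full (and spammy) syslog with a single grep. The three log lines in the sample below are the ones quoted in this thread; only the temp-file path is an illustrative choice:

```shell
#!/bin/sh
# The disk10 event timeline, as quoted from the syslog, written to a temp file;
# against a real syslog you would grep the actual log instead.
cat > /tmp/disk10-timeline <<'EOF'
Aug 26 14:12:51 storage kernel: md: disk10 write error, sector=583032
Aug 26 14:16:27 storage kernel: mdcmd (38): nocheck cancel
Aug 26 14:18:25 storage root: mount: /mnt/disk10: can't read superblock on /dev/md10
EOF

# Show only the lines relevant to disk 10 and array commands, in order.
grep -E 'disk10|md10|mdcmd' /tmp/disk10-timeline
```

Reading the matched lines in timestamp order gives exactly the disabled-then-stopped-then-unmountable sequence described above.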
perfessor101 Posted August 27, 2021 (Author)

I trusted parity and then added the drive.
JorgeB Posted August 27, 2021

You can't do that after a disk is disabled; parity will no longer be valid. You need to rebuild the disk (or do a new config and re-sync (or correct) parity).
trurl Posted August 28, 2021

7 hours ago, perfessor101 said:
I trusted parity then added the drive

Not sure if you made a big mistake there or not. If you add a drive to the array, Unraid will clear it.
perfessor101 Posted August 28, 2021 (Author)

Sorry, wrong terminology: I replaced drive 10 with the new drive, not added it. It was probably dead a month ago. The server rebooted after a power outage, and drive 10 was silently dead after the reboot. The monthly parity check (correcting) let me catch the next dead drive. I also lost the cache drive (corrupted superblock, it looks like).