Random Disk errors

Fransysco · October 31, 2020

I woke up to several emails reporting disk errors. Two of my disks have been taken out of the array, and all but two of my disks are showing at least one error. I swapped hardware earlier this week but ran a parity check afterwards with no errors. I haven't touched anything since. What steps should I take to recover from this without losing data?

diagnostics-20201031-1621.zip

Edit: I stopped the array and got an "array turned good" notification: "Array has 0 disks with read errors".

But the two drives are still showing disabled.

Edited November 1, 2020 by Fransysco

ChatNoir · October 31, 2020

You should attach your diagnostics in your next post (Tools/Diagnostics).

Fransysco · November 1, 2020

4 hours ago, ChatNoir said:

You should attach your diagnostics in your next post (Tools/Diagnostics).

I attached it to the original post.

JorgeB · November 1, 2020

Looks like the problem started with a power failure:

Oct 31 02:05:36 RIAAHQ apcupsd[3244]: Power failure.
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:0:0: device_block, handle(0x000a)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:1:0: device_block, handle(0x000b)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:2:0: device_block, handle(0x000c)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:3:0: device_block, handle(0x000d)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:4:0: device_block, handle(0x000e)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:5:0: device_block, handle(0x000f)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:6:0: device_block, handle(0x0010)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:7:0: device_block, handle(0x0011)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:8:0: device_block, handle(0x0012)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:9:0: device_block, handle(0x0013)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:10:0: device_block, handle(0x0014)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:11:0: device_block, handle(0x0015)
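The correlation described above (a UPS power-failure event followed two seconds later by `device_block` on every disk) can be checked mechanically. Below is a minimal Python sketch that parses the quoted syslog excerpt and lists the disks that logged `device_block` shortly after the apcupsd event. The function names, the 10-second window, and the assumed year (syslog lines omit it) are hypothetical choices for illustration, not part of Unraid or apcupsd.

```python
import re
from datetime import datetime

# Syslog excerpt copied from the diagnostics quoted above.
LOG = """\
Oct 31 02:05:36 RIAAHQ apcupsd[3244]: Power failure.
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:0:0: device_block, handle(0x000a)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:1:0: device_block, handle(0x000b)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:2:0: device_block, handle(0x000c)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:3:0: device_block, handle(0x000d)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:4:0: device_block, handle(0x000e)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:5:0: device_block, handle(0x000f)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:6:0: device_block, handle(0x0010)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:7:0: device_block, handle(0x0011)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:8:0: device_block, handle(0x0012)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:9:0: device_block, handle(0x0013)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:10:0: device_block, handle(0x0014)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:11:0: device_block, handle(0x0015)
"""

# "Mon DD HH:MM:SS host process: message"; the process name ends at the first colon.
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) ([^:]+): (.*)$")


def parse(log, year=2020):
    """Turn syslog lines into (timestamp, process, message) tuples.

    Syslog omits the year, so it is supplied here (assumed 2020).
    """
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp, _host, proc, msg = m.groups()
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((ts, proc, msg))
    return events


def disks_blocked_after_power_failure(events, window_s=10):
    """Hypothetical helper: return the first apcupsd power-failure time and the
    SCSI addresses that logged device_block within window_s seconds after it."""
    failure = next(ts for ts, proc, msg in events
                   if proc.startswith("apcupsd") and "Power failure" in msg)
    addrs = []
    for ts, proc, msg in events:
        if proc == "kernel" and "device_block" in msg:
            if 0 <= (ts - failure).total_seconds() <= window_s:
                # msg looks like "sd 2:0:0:0: device_block, handle(0x000a)"
                addrs.append(msg.split()[1].rstrip(":"))
    return failure, addrs


if __name__ == "__main__":
    failure, addrs = disks_blocked_after_power_failure(parse(LOG))
    print(f"UPS power failure at {failure:%H:%M:%S}; {len(addrs)} disks blocked:")
    for addr in addrs:
        print("  sd", addr)
```

In Linux SCSI addressing (`host:channel:target:lun`), the leading `2` in every address is the host number, so all twelve disks sit behind the same host adapter. That is consistent with them dropping out together because of a shared power or controller event rather than failing individually.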

Are all the disks on the same UPS? Or are they in some kind of separate enclosure?

Reboot to clear the errors (disabled disks will remain disabled), then start the array and post new diags.

Fransysco · November 1, 2020

4 hours ago, JorgeB said:

Are all the disks on the same UPS? Or are they in some kind of separate enclosure?

Reboot to clear the errors (disabled disks will remain disabled), then start the array and post new diags.

The night before, I did have what seemed like a strange brownout in the house, but that was more than 6 hours before any alerts came in. I never got an email alert about a power failure, and the disk-failure emails arrived 4 hours after the UPS power failure reported above. All disks are powered by the same two PSUs, which are plugged into the same UPS. Attached is the diagnostics bundle taken after a reboot.

What I've done so far:

I rebooted; both disks still showed as disabled.

I ran SMART tests on both, and they came back clean.

I removed the Parity Drive, started the array, stopped the array, added the parity drive back, and am rebuilding the parity drive.

diagnostics-20201101-0920.zip

JorgeB · November 2, 2020

19 hours ago, Fransysco said:

I removed the Parity Drive, started the array, stopped the array, added the parity drive back, and am rebuilding the parity drive.

Since the emulated disk3 is mounting correctly, you can do the same for it; you could even have done both at the same time.
