Random Disk errors



I woke up to a few emails about seemingly random disk errors. Two of my disks have been taken out of the array, and all but two of my disks are showing at least 1 error. I swapped hardware earlier this week but ran a parity check afterwards with no errors, and I have not touched anything since. What are my steps to recover from this without losing data?

 

image.thumb.png.5d2ca078ec0f97049f30a285b0f68b07.png

diagnostics-20201031-1621.zip

 

Edit: I stopped the array and got a notification of "array turned good: Array has 0 disks with read errors".

 

But the two drives are still showing disabled.

Edited by Fransysco

Looks like the problem started with a power failure:

 

Oct 31 02:05:36 RIAAHQ apcupsd[3244]: Power failure.
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:0:0: device_block, handle(0x000a)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:1:0: device_block, handle(0x000b)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:2:0: device_block, handle(0x000c)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:3:0: device_block, handle(0x000d)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:4:0: device_block, handle(0x000e)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:5:0: device_block, handle(0x000f)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:6:0: device_block, handle(0x0010)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:7:0: device_block, handle(0x0011)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:8:0: device_block, handle(0x0012)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:9:0: device_block, handle(0x0013)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:10:0: device_block, handle(0x0014)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:11:0: device_block, handle(0x0015)
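
For anyone wanting to check the same pattern in their own syslog, here is a minimal Python sketch (not part of Unraid or its diagnostics tooling) that finds the apcupsd "Power failure" line and lists the device_block events that follow it. The function name `summarize`, the three-line sample, and the year 2020 are assumptions for illustration: syslog timestamps carry no year, so it is taken from the diagnostics file name.

```python
import re
from datetime import datetime

# Classic syslog layout: "Mon DD HH:MM:SS host tag: message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s\d{2}:\d{2}:\d{2})\s(\S+)\s([^:]+):\s(.*)$")
# Kernel message emitted when a SCSI device is blocked, e.g. "sd 2:0:0:0: device_block, ..."
BLOCK_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): device_block")

# Three lines copied from the excerpt above, used only as sample input.
SAMPLE = """\
Oct 31 02:05:36 RIAAHQ apcupsd[3244]: Power failure.
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:0:0: device_block, handle(0x000a)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:1:0: device_block, handle(0x000b)
Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:2:0: device_block, handle(0x000c)
"""


def summarize(log_text, year=2020):
    """Return (power_failure_time, [(time, scsi_address), ...]).

    Only device_block events logged after the first apcupsd
    "Power failure" line are collected. The year is an assumption,
    because syslog lines do not include one.
    """
    power_failure = None
    blocked = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected layout
        stamp, _host, tag, message = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if tag.startswith("apcupsd") and "Power failure" in message:
            if power_failure is None:
                power_failure = when
        elif power_failure is not None and tag == "kernel":
            hit = BLOCK_RE.search(message)
            if hit:
                blocked.append((when, hit.group(1)))
    return power_failure, blocked


if __name__ == "__main__":
    failure, blocked = summarize(SAMPLE)
    for when, address in blocked:
        delay = (when - failure).total_seconds()
        print(f"{address} blocked {delay:.0f}s after power failure")
```

To run it against a full log, pass the contents of the syslog file from the diagnostics zip to `summarize` instead of `SAMPLE`. If every disk on the controller is blocked within seconds of the power event, that points at power or the controller rather than at the individual disks.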

 

Are all the disks on the same UPS? Or are they in some kind of separate enclosure?

 

Reboot to clear the errors (disabled disks will remain disabled), then start the array and post new diags.

4 hours ago, JorgeB said:

Are all the disks on the same UPS? Or are they in some kind of separate enclosure?

 

Reboot to clear the errors (disabled disks will remain disabled), then start the array and post new diags.

The night before, I did have what seemed like a weird brownout in the house, but it was 6+ hours prior to any alerts coming in. I never got an email alert for a power failure, and the disk failure emails arrived 4 hours after the UPS power failure reported above. The disks are all on the same two PSUs, which are plugged into the same UPS. Attached is the diagnostics bundle after a reboot.

 

What I've done so far:

I rebooted, and both disks still showed as disabled.

I ran SMART on both and they came back clean.

I removed the parity drive, started the array, stopped the array, added the parity drive back, and am now rebuilding parity.

diagnostics-20201101-0920.zip
