Fransysco Posted October 31, 2020 (edited)

I woke up to a few emails about disk errors that came out of nowhere. Two of my disks have been taken out of the array, and all but two of my disks are showing at least 1 error. I swapped hardware earlier this week, but I ran a parity check afterwards and it found no errors. I have not touched anything since. What are my steps to recover from this without losing data?

diagnostics-20201031-1621.zip

Edit: I stopped the array and got the notification "array turned good Array has 0 disks with read errors", but the two drives are still showing as disabled.

Edited November 1, 2020 by Fransysco
ChatNoir Posted October 31, 2020

You should attach your diagnostics in your next post (Tools/Diagnostics).
Fransysco Posted November 1, 2020 (Author)

4 hours ago, ChatNoir said: You should attach your diagnostics in your next post (Tools/Diagnostics).

I attached it to the original post.
JorgeB Posted November 1, 2020

Looks like the problem started with a power failure:

    Oct 31 02:05:36 RIAAHQ apcupsd[3244]: Power failure.
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:0:0: device_block, handle(0x000a)
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:1:0: device_block, handle(0x000b)
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:2:0: device_block, handle(0x000c)
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:3:0: device_block, handle(0x000d)
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:4:0: device_block, handle(0x000e)
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:5:0: device_block, handle(0x000f)
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:6:0: device_block, handle(0x0010)
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:7:0: device_block, handle(0x0011)
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:8:0: device_block, handle(0x0012)
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:9:0: device_block, handle(0x0013)
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:10:0: device_block, handle(0x0014)
    Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:11:0: device_block, handle(0x0015)

Are all the disks on the same UPS? Or are they in some kind of separate enclosure?

Reboot to clear the errors (the disabled disks will remain disabled), then start the array and post new diags.
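The pattern in the log excerpt above (an apcupsd "Power failure" followed two seconds later by device_block on twelve devices on SCSI host 2) can also be picked out of a longer syslog with a short script. What follows is only a minimal Python sketch under stated assumptions: the function names, the 10-second window, and the hard-coded year are illustrative choices, not anything Unraid or apcupsd provides, and the three sample lines are copied from the excerpt above.

```python
import re
from datetime import datetime, timedelta

# Classic syslog prefix: "Oct 31 02:05:36 HOST process[pid]: message"
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>[^:\[]+)(?:\[\d+\])?: (?P<msg>.*)$"
)


def parse(line, year=2020):
    """Return (timestamp, process, message), or None if the line does not match.

    Syslog timestamps carry no year, so the year is an assumption
    supplied by the caller.
    """
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return ts, m["proc"], m["msg"]


def blocked_after_power_failure(lines, window=timedelta(seconds=10)):
    """List SCSI addresses that logged device_block shortly after a power failure.

    The 10-second default window is an arbitrary illustrative value.
    """
    events = [e for e in map(parse, lines) if e]
    failures = [ts for ts, proc, msg in events
                if proc == "apcupsd" and msg.startswith("Power failure")]
    hits = []
    for ts, proc, msg in events:
        if proc == "kernel" and "device_block" in msg:
            if any(timedelta(0) <= ts - f <= window for f in failures):
                # The address is the second token, e.g. "2:0:0:0:"
                hits.append(msg.split()[1].rstrip(":"))
    return hits


# Three lines copied from the log excerpt in the post above.
SAMPLE = [
    "Oct 31 02:05:36 RIAAHQ apcupsd[3244]: Power failure.",
    "Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:0:0: device_block, handle(0x000a)",
    "Oct 31 02:05:38 RIAAHQ kernel: sd 2:0:1:0: device_block, handle(0x000b)",
]

if __name__ == "__main__":
    print(blocked_after_power_failure(SAMPLE))
```

To run it against a real log, pass the lines of the syslog file from the diagnostics zip instead of SAMPLE. Matching on timestamps only shows that the events were close together in time; it does not prove the power event caused the disks to drop.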
Fransysco Posted November 1, 2020 (Author)

4 hours ago, JorgeB said: Looks like the problem started with a power failure: [log snipped] Are all the disks on the same UPS? Or are they in some kind of separate enclosure? Reboot to clear the errors (disabled disks will remain disabled), then start the array and post new diags.

The night before, I did have what seemed like a weird brownout in the house, but it was 6+ hours before any alerts came in. I never got an email alert for a power failure, and the disk failure emails arrived 4 hours after the UPS power failure reported above. The disks are all on the same two PSUs, which are plugged into the same UPS. Attached is the diagnostic bundle after a reboot.

What I've done so far:
- Rebooted the array; both disks showed as disabled.
- Ran SMART on both disks; they came back clean.
- Removed the parity drive, started the array, stopped the array, added the parity drive back, and am now rebuilding the parity drive.

diagnostics-20201101-0920.zip
JorgeB Posted November 2, 2020

19 hours ago, Fransysco said: I removed the Parity Drive, started the array, stopped the array, added the parity drive back, and am rebuilding the parity drive.

Since emulated disk3 is mounting correctly, you can do the same for it. You could even have done both at the same time.