Multiple disks errored/disabled


Without creating an enormous amount of text, I'll try to explain what I've got going on as simply as possible. I recently added a new parity drive because the old one kept getting disabled. The new one was also disabled after an unclean shutdown (I suspect a power loss during a storm). I stopped the array, unassigned the drive as parity, started the array, stopped it again, and restarted with the drive assigned as parity to rebuild. The parity rebuild failed and the drive was disabled again, but a read-check of the drives continued in its place, so I let that run. About 9,000 errors were found, yet the status was still reported as healthy. I stopped the array and swapped the breakout cable for a brand new one. This time every drive threw errors until it was disabled. I saved the diagnostics between every reboot, because sometimes I couldn't even stop the array, and the errors weren't always the same, even for the same drive. The last attempt probably had the most disastrous log (the attached diagnostics). Some examples:

 

I/O error, dev sdb, sector 9208 op 0x0:(READ) flags 0x0
md: disk2 read error, sector=9144
md: disk1 read error, sector=11802528
md: recovery thread: multiple disk errors, sector=9144
md: disk0 write error, sector=5376
Buffer I/O error on dev sdg, logical block 15808704, lost async page write
BTRFS info (device loop2: state E): forced readonly
BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
BTRFS: error (device loop2: state EA) in cleanup_transaction:1958: errno=-5 IO failure
device offline error, dev sdg, sector 150072464 op 0x1:(WRITE) flags 0x104000 phys_seg 64 prio class 2

 

Common denominators.
1. All drives throwing errors are connected to Adaptec 7085 (direct motherboard attached drives are fine)
2. Breakout cable (swapped with brand new one)
3. Sata power cable (removing the 3.3v in one now to remove this variable)
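
To check which devices the errors cluster on, the log lines can be tallied per device. This is only a minimal sketch in Python: the function name `tally_errors` and the regular expressions are my own and cover just the message shapes quoted above, so other kernel messages are silently ignored. It assumes the syslog has been extracted from the diagnostics zip as plain text.

```python
import re
from collections import Counter

# Hypothetical patterns covering only the message shapes quoted in this post.
PATTERNS = [
    # "I/O error, dev sdb, sector 9208 op 0x0:(READ) ..." and
    # "device offline error, dev sdg, sector ... op 0x1:(WRITE) ..."
    re.compile(r"(?:I/O|device offline) error, dev (?P<dev>\w+), "
               r"sector \d+ op \S+:\((?P<op>\w+)\)"),
    # "md: disk2 read error, sector=9144"
    re.compile(r"md: (?P<dev>disk\d+) (?P<op>read|write) error"),
    # "Buffer I/O error on dev sdg, logical block ..."
    re.compile(r"Buffer I/O error on dev (?P<dev>\w+),"),
]


def tally_errors(lines):
    """Count matching error lines per (device, operation) pair."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                # Buffer I/O lines carry no explicit operation in this sketch.
                op = (match.groupdict().get("op") or "buffer").upper()
                counts[(match.group("dev"), op)] += 1
                break  # one match per line is enough
    return counts


# The sample lines quoted earlier in this post.
SAMPLE = """\
I/O error, dev sdb, sector 9208 op 0x0:(READ) flags 0x0
md: disk2 read error, sector=9144
md: disk1 read error, sector=11802528
md: recovery thread: multiple disk errors, sector=9144
md: disk0 write error, sector=5376
Buffer I/O error on dev sdg, logical block 15808704, lost async page write
BTRFS info (device loop2: state E): forced readonly
device offline error, dev sdg, sector 150072464 op 0x1:(WRITE) flags 0x104000 phys_seg 64 prio class 2
""".splitlines()

if __name__ == "__main__":
    for (dev, op), n in sorted(tally_errors(SAMPLE).items()):
        print(f"{dev:6} {op:6} {n}")
```

Running the same tally against the syslog from each saved diagnostics set would show whether the errors follow the controller, the cable, or individual drives. To use a real log, pass `open("syslog.txt")` (a placeholder path) instead of `SAMPLE`.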

 

All of these drives had healthy SMART statuses. I'm worried I'll have to wipe the whole thing. Hopefully I can take the data off the array and back it up. What's the next step here? Is it salvageable? Are there any possible explanations for what happened? Thanks for taking a look.

disaster.zip

  • 2 weeks later...

    On one of the days leading up to this, I found this server and my Plex box (attached to the same surge protector) both off. I thought it was due to a recent storm, but I also discovered that the electric company had put a new component in the outside electrical box, so it is possible the power went down then. I'm not sure whether this was the beginning and cause of my issues, but it is certainly something I'd like to avoid. I'm still in the market for a UPS. There's just such a large variation in prices and in what they're capable of, and I'm not very familiar with them yet. But what you're pointing out, I believe, is the disks dropping offline by themselves while the server remained on (otherwise the diagnostics wouldn't exist). I went down the list of variables, attempting to isolate a single change per set of errors, array start, or boot. Since swapping out the SATA power cable, I have not experienced those same errors.

    I have received a few that were similar, but I cannot figure out what device they were referring to. There has been a reboot since, so as of now I cannot find them, but I will continue to look. It said something along the lines of "I/O error on device md3p1." I couldn't figure out what device that could be, even using Google. That occurred about four times in a row and has since stopped. Three days ago I got about fifty of these odd "nginx: 2023/08/25 23:30:03 [crit] 11627#11627: ngx_slab_alloc() failed: no memory" messages, each followed by "Aug 25 23:30:03 AethelNas nginx: 2023/08/25 23:30:03 [error] 11627#11627: nchan: Out of shared memory while allocating message of size 8754. Increase nchan_max_reserved_memory." I do not know if these things are related at all.
