aethelnas

Members

Joined: May 13, 2023
Last visited: April 29, 2024
Rank: Noob (1/14)
Posts: 3
Reputation: Neutral (0)

Multiple disks errored/disabled

aethelnas replied to aethelnas's topic in General Support

On one of the days leading up to this, I found both this server and my Plex box (attached to the same surge protector) powered off. I assumed it was due to a recent storm, but I also discovered that the electric company had installed a new component in the outside electrical box, and the power may have gone down then. I'm not sure whether this was the beginning of my issues or their cause, but it is certainly something I'd like to avoid. I'm still in the market for a UPS; there is such a large variation in prices and capabilities, and I'm not very familiar with them yet.

What you're pointing out, I believe, is the disks dropping offline by themselves while the server remained on (otherwise the diagnostics wouldn't exist). I went down the list of variables, trying to isolate a single change per set of errors/array start/boot. Since swapping out the SATA power cable, I have not seen those same errors.

I have received a few similar errors, but I cannot figure out which device they referred to. There has been a reboot since then, so I cannot find them at the moment, but I will keep looking. The message was something along the lines of "I/O error on device md3p1." I couldn't work out which device that could be, even with Google. It occurred about four times in a row and has since stopped.

Three days ago I got about fifty of these odd messages:

nginx: 2023/08/25 23:30:03 [crit] 11627#11627: ngx_slab_alloc() failed: no memory

followed by:

Aug 25 23:30:03 AethelNas nginx: 2023/08/25 23:30:03 [error] 11627#11627: nchan: Out of shared memory while allocating message of size 8754. Increase nchan_max_reserved_memory.

I do not know whether these things are related at all.
- August 28, 2023
- 3 replies
Multiple disks errored/disabled

aethelnas posted a topic in General Support

I'll try to explain what's going on as simply as possible without writing an enormous amount of text. I recently added a new parity drive (the old one kept getting disabled). This one also became disabled after an unclean shutdown (I suspect a power loss during a storm). I stopped the array, removed the drive as parity, started the array, stopped it, and restarted with the drive assigned as parity so it would rebuild. The parity check failed and the drive was disabled again, but a read-check of the drives continued in its place, so I let it run. About 9,000 errors were found, but a healthy status was still reported.

I stopped the array and swapped the breakout cable for a brand-new one. This time every drive experienced errors until it was disabled. I saved the diagnostics between every reboot, because sometimes I couldn't even stop the array, and the errors weren't always the same, even for the same drive. The last attempt probably produced the most disastrous log (its diagnostics are attached). Some examples:

I/O error, dev sdb, sector 9208 op 0x0:(READ) flags 0x0
md: disk2 read error, sector=9144
md: disk1 read error, sector=11802528
md: recovery thread: multiple disk errors, sector=9144
md: disk0 write error, sector=5376
Buffer I/O error on dev sdg, logical block 15808704, lost async page write
BTRFS info (device loop2: state E): forced readonly
BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
BTRFS: error (device loop2: state EA) in cleanup_transaction:1958: errno=-5 IO failure
device offline error, dev sdg, sector 150072464 op 0x1:(WRITE) flags 0x104000 phys_seg 64 prio class 2

Common denominators:
1. All drives throwing errors are connected to the Adaptec 7085 (drives attached directly to the motherboard are fine).
2. Breakout cable (swapped for a brand-new one).
3. SATA power cable (I'm now removing the 3.3V line from one to eliminate this variable).

All of these drives had healthy SMART statuses. I'm worried I'll have to wipe the whole thing; hopefully I can get the data off the array and back it up. What's the next step here? Is it salvageable? Any possible explanations for what happened? Thanks for taking a look. disaster.zip
- August 12, 2023
- 3 replies
Syslog disk errors during preclear

aethelnas posted a topic in General Support

Originally, three used Seagate Exos X14 14TB HDDs were purchased for my first server build. All three precleared successfully. Two were added to the array and one became parity. Extended SMART tests were run on all three, and all were given a healthy status.

Data was first written to the cache drives and then scheduled to move to the array. As soon as data started being written to the array itself, the parity disk entered a disabled state after read/write error notifications. I stopped the array, removed the disk, started the array, stopped it again, and re-added the drive. It rebuilt parity and failed again once it was written to. After reading up on the possible causes, I moved the parity drive to another cable; it began rebuilding parity, but again, as soon as data was written to it, it was disabled.

Recently I bought two more of the same drives: one to replace my disabled parity and one to add to the array. Both precleared successfully, but this time I noticed multiple instances of this error in the syslog. I also ran extended SMART tests on both drives. The first time, the drive that produced errors in the syslog appeared to abort the test early. It said "aborted by user command," but I certainly didn't do that intentionally. I read that this can happen if the spin-down delay setting stops the test, but I have that setting disabled, and neither disk has been added to the array yet, so the setting shouldn't affect them. I restarted the extended test on the drive that aborted. In the end both were given healthy statuses, but someone can probably interpret this information better than I can.

All of the cables are only a few months old and should not be causing issues, but I plan to get backups if I need to rerun anything. That said, I'm not sure I can even reproduce the errors, since they didn't show up anywhere except the syslog during the preclear.

Unfortunately, I don't believe I have a complete .zip going back as far as the original three drives. It was all brand new to me, and I didn't realize the syslog resets on power-down. I do have diagnostics for the events I described with the two new disks. I don't understand why a disk would produce errors in the syslog, which you specifically have to go looking for, but show nothing of the sort in the preclear logs or SMART info. I downloaded the preclear logs and put them into the diagnostics folder, since I don't think they are included by default, though I could be wrong. They are in the folder labeled preclears. I need help figuring out whether this is a faulty drive that needs to be sent back for a replacement within a fairly short 30-day warranty period. Thanks for your time and help. aethelnas-diagnostics-20230723-1106.zip
- July 23, 2023
