HeliusSol

Members

Joined
October 19, 2023
Last visited
March 20


Noob

Current rank (1/14)

Posts


9
Reputation
Neutral

0

Disk Read Error Support Request

HeliusSol replied to HeliusSol's topic in General Support

@JorgeB Just wanted to follow up. I assume this is a heat issue or something else at this point. Just checking that you didn't see anything related to disk failures beyond that SAS issue? I'm not sure what I'd be looking for myself, other than anything in the Attributes tab (or in the files for those disabled disks) that looks "off"...
- April 17, 2025
- 3 replies
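On the question of what "off" would look like in the attributes: a minimal sketch, assuming the table has the usual `smartctl -A` column layout with the raw value in the tenth column. The sample text is made up for illustration (it is not from the posted diagnostics), attribute names and IDs vary by drive vendor, and `parse_smart_attributes` and `summarize` are hypothetical helper names. The idea is only to separate counters that usually point at the media (reallocated, pending, uncorrectable sectors) from the UDMA CRC counter, which usually points at the cable or controller link.

```python
# Made-up sample in the common `smartctl -A` table layout (not real diagnostics).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""

MEDIA_IDS = (5, 197, 198)  # assumption: counters that usually indicate media trouble
CRC_ID = 199               # assumption: counter that usually indicates link trouble


def parse_smart_attributes(text):
    """Return {attribute_id: raw_value} for rows whose raw value is a plain integer."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            values[int(parts[0])] = int(parts[9])
    return values


def summarize(values):
    """Classify a disk as 'media', 'link', or 'clean' from its raw counters."""
    if any(values.get(attr_id, 0) > 0 for attr_id in MEDIA_IDS):
        return "media"
    if values.get(CRC_ID, 0) > 0:
        return "link"
    return "clean"


print(summarize(parse_smart_attributes(SAMPLE)))
```

A "link" result here would be consistent with a cable or HBA problem rather than a failing disk, but it is only a rough heuristic, not a diagnosis.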
Disk Read Error Support Request

HeliusSol posted a topic in General Support

I had two disks go disabled due to read errors overnight. (Yes, I have notifications turned on, and this is the first notification I received of any issue with them.) For reference, I had a previous *issue* with most of the disks attached to my HBA SAS card a while back, but only one disk went disabled due to a write failure. The two disks that have become disabled are the only two left on the HBA SAS card at the moment (I thought ahead and tried to keep the disks that were constantly being written to directly connected to the motherboard's SATA ports). My guess is that this is directly related to the HBA SAS card (I have ordered another one and it should be here soon; I will also be working toward a custom cooling solution for the new card).

I shut down the server and moved the breakout cable to a different port on the card, and the disks have reappeared as connected for the time being. I will be running the longer SMART tests (with the array disabled) while I am at work and will report back when I have those results. From what I can tell from the *Attributes*, the disks look good; it is either a cable issue or that HBA SAS card.

I have included diagnostics from before the shutdown and cable move. I have also included another set from after the disks came back online and I ran the short SMART tests on all the SATA-connected drives (including the breakout cable ones).
- April 16, 2025
- 3 replies
Multiple Disk Read Errors across majority of my Array disk (After upgrading to 7.0.0...?

HeliusSol replied to HeliusSol's topic in General Support

I understand that. I had what I believed at the time to be 7 of the 8 data drives in my 10-drive array (dual parity) "go offline". It is quite unnerving to have 7 drives get "disconnected" (not disabled) all at the same time, hence my conclusion that it was either a temporary hardware failure or a driver/firmware issue. I don't know of any setting that would automatically stop the entire array when more than 2 drives "disappear" from it while it is active. Such a setting might have prevented the 1 drive that got disabled from being disabled by the write error in the first place.
- January 27, 2025
- 10 replies
Multiple Disk Read Errors across majority of my Array disk (After upgrading to 7.0.0...?

HeliusSol replied to HeliusSol's topic in General Support

For anyone late to the party, the final result is this: a rebuild was required and completed successfully. However, the reason all the HDDs connected to the HBA SAS card fell out of the array is still not clear. It appears the card failed. I wish there was a way to tell the system to shut down the array immediately when more than X drives "disappear". That might have prevented my need to rebuild. I think I will need to take the HBA SAS card's heatsink off, replace the thermal paste, and somehow add a fan to it, as overheating is one of two possible reasons that this happened. The other is that there is some issue between this specific card, its firmware (which I updated a little over a year ago) or drivers, and Unraid 7.0.0.
- January 27, 2025
- 10 replies
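The "stop the array when more than X drives disappear" idea could be sketched roughly as below. This is a sketch under stated assumptions, not an existing Unraid feature: the function names, the example device IDs, and the threshold are all hypothetical, and the actual "stop" action is deliberately left out because I don't know of a supported setting or command for it. The only real system detail used is the Linux `/dev/disk/by-id` directory, where the kernel lists currently visible devices.

```python
import os


def list_present(by_id_dir="/dev/disk/by-id"):
    """Return device IDs currently visible to the kernel (empty list if the path is absent)."""
    try:
        return sorted(os.listdir(by_id_dir))
    except FileNotFoundError:
        return []


def missing_disks(expected, present):
    """Return the expected device IDs that are not currently present, in the given order."""
    present_set = set(present)
    return [disk for disk in expected if disk not in present_set]


def should_stop_array(expected, present, max_missing=2):
    """True when more disks are missing than the array's parity can cover.

    max_missing=2 is an assumption matching a dual-parity array.
    """
    return len(missing_disks(expected, present)) > max_missing


# Hypothetical IDs for a 10-drive array; 7 drives drop off at once.
expected = [f"ata-EXAMPLE_DISK_{n}" for n in range(1, 11)]
present = expected[7:]

print(missing_disks(expected, present))
print(should_stop_array(expected, present))
```

In practice `present` would come from `list_present()` on a schedule, and whatever runs when `should_stop_array` returns true would need to grab diagnostics first and then stop writes.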
Multiple Disk Read Errors across majority of my Array disk (After upgrading to 7.0.0...?

HeliusSol replied to HeliusSol's topic in General Support

Notification from server: Elapsed Time 1 day, 9 hr, 30 min, 39 sec, Runtime 1 day, 9 hr, 6 min, 51 sec, Increments 3, Average Speed 184.6 MB/s.

I left Docker and the VM stuff off for at least the first half of it and set mover not to run until a week from the day the rebuild was started. So not bad, all things considered. That said, my best guess is that the HBA SAS card went offline (how? why? how to prevent it? Anything in the logs that someone sees that might explain it would be helpful) and my attempt to force disk 8 back online manually caused the problem. If this ever happens in the future, I will attempt to reboot the system to reset the HBA SAS card. If that doesn't work, I'd be down until I get a replacement (or replace the motherboard, if that is the problem at that point).
- January 14, 2025
- 10 replies
Multiple Disk Read Errors across majority of my Array disk (After upgrading to 7.0.0...?

HeliusSol replied to HeliusSol's topic in General Support

I started the rebuild of disk 8 after moving as many of the HDDs as I could back to SATA cables directly connected to the motherboard. It says around 24 hours for a complete rebuild of a 20TB disk at current speeds. I expect that, as long as nothing happens with the motherboard or the HBA SAS card, this should resolve my issue for the moment. If anyone has any idea what happened (or may have happened) when the HBA SAS card and everything connected to it went offline, please let me know. I don't understand what happened or how to prevent it going forward. I'm hoping that removing most of the drives from the card will keep it "happy" for the time being. It might be worth getting a different one as a backup. In the future, I think I will need to make sure to just grab the diagnostics and reboot the machine before attempting to spin up the disks directly...
- January 13, 2025
- 10 replies
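For anyone wanting to sanity-check a rebuild estimate like the one above, the arithmetic is just size divided by sustained speed. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant speed, which a real rebuild will not have since drives slow down toward the inner tracks. `rebuild_hours` and `required_speed_mb_s` are hypothetical helper names.

```python
def rebuild_hours(size_tb, speed_mb_s):
    """Hours needed to write size_tb terabytes at a constant speed_mb_s megabytes per second."""
    seconds = (size_tb * 1e12) / (speed_mb_s * 1e6)
    return seconds / 3600


def required_speed_mb_s(size_tb, hours):
    """Constant speed in MB/s needed to write size_tb terabytes in the given number of hours."""
    return (size_tb * 1e12) / (hours * 3600) / 1e6


# A 20 TB disk in 24 hours needs a bit over 230 MB/s sustained.
print(round(required_speed_mb_s(20, 24), 1))

# The same disk at an average of 184.6 MB/s takes about 30 hours.
print(round(rebuild_hours(20, 184.6), 1))
```

So a 24-hour estimate shown early in a rebuild tends to be optimistic, because it is based on the faster outer-track speed.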
Multiple Disk Read Errors across majority of my Array disk (After upgrading to 7.0.0...?

HeliusSol replied to HeliusSol's topic in General Support

@trurl Thank you for your reply. I understand that a rebuild is necessary. My primary concern is whether there is anything indicating drive failure or other hardware failure that would preclude me from attempting this right now. I don't know anything specific about the SMART data other than that UDMA CRC errors can be completely isolated to the SAS breakout cables in at least some cases (which I think is what is going on with mine). I don't see any bad sectors or anything like that.

Do you (or anyone else) see anything in the logs before or after the reboot that would explain what happened? My best guess is that something happened with either the PCI-E bus or the HBA SAS card itself. A power supply failure to one of those devices? It appears the card "disappeared", but nothing I see in the logs indicates what actually happened.

I'd like to start the rebuild and hope that everything recovers like it should. I would just rather try to understand what caused the problem before starting a rebuild that will definitely take over 24 hours and keep me from writing to the array for that long. (My anxiety about not knowing what caused an issue this large, after over 12 months of no significant problems with the device, has my stomach in knots.)
- January 12, 2025
- 10 replies
Multiple Disk Read Errors across majority of my Array disk (After upgrading to 7.0.0...?

HeliusSol replied to HeliusSol's topic in General Support

I've been looking at the logs posted in my OP. It looks like something might have happened with either the HBA SAS card or maybe the motherboard itself, causing a section of the PCI-E bus to go down. I think the 7 disks in question are connected to the HBA SAS card via SAS -> SATA cables. Does anyone else see any evidence of this in the logs? I'm unsure what exactly I should be looking for. (Does this look like a motherboard issue with the bus? Does it look like a problem with the HBA SAS card itself?) If this is the case, what would be recommended? I can move a grand total of 8 disks to the motherboard SATA ports, but then I'm left with 2 that can't be connected at all if I remove the HBA SAS card from the machine for now...
- January 12, 2025
- 10 replies
Multiple Disk Read Errors across majority of my Array disk (After upgrading to 7.0.0...?

HeliusSol posted a topic in General Support

I upgraded to 7.0.0 on Friday evening. This may or may not be relevant to this discussion, but I have never seen this error before in over a year of using Unraid on this specific device. The error stated:

Warning - array has errors
Array has 7 disks with read errors

I noticed that 7 of the 8 data disks in the array were spun down (which is odd, because I had turned off spindown as part of the upgrade process for what I thought were safety reasons). When I tried to spin up the disks, nothing happened at first. Then disk 8 went disabled. Notification: "Alert - Disk 8 in error state (disk dsbl) WDC_XXXXXXXXXXXXX (sde)"

Here is a diagnostics run from approximately that time: (REMOVED)

Then I turned off the Docker and VM services (to limit any writing to the disks) and rebooted the device. I then got the following notifications:

Notice - array turned good
Array has 0 disks with read errors
Notice - Disk X returned to normal operation WDC_XXXXXXXXXXXXX (sdh) [times 6]

Here is another diagnostics run from after the reboot: (REMOVED)

Did I screw something up? It appears that at a minimum I will have to rebuild disk 8 due to something causing the filesystem to get corrupted (or something along those lines). I don't want to take any additional steps before I know whether this is safe to do, or whether all the other disks might go down again while doing it.

I checked the "Attributes" for all the other disks that went into the weird read state (1, 2, 3, 4, 5, 6) and nothing looks out of the ordinary to me. (The UDMA CRC error count is low on those that have it, and that was usually from something to do with static electricity discharge against the case itself. I've learned to ground myself against something else before touching the case to prevent that.) From a quick web and forum search, it looks like this could be related to my HBA SAS controller (I thought I did a firmware update before installing it), but I have no way to confirm.

Other notes: As part of the 7.0.0 upgrade, I switched all my shares to pool -> array and then ran the mover in case my btrfs RAID1 pool went down. (I also set the Docker and VM services to disabled for speed and safety.) Afterward, I deleted and recreated my primary RAID1 pool with two NVMe disks (primary server data) and then created a new single-disk pool (a temporary disk for data between mover runs to the disk). I then reconfigured all the shares to point to a specific pool and/or the array, depending on the share's use case. I hope this all looks good as well.
- January 12, 2025
- 10 replies
