Unraid

HeliusSol

  1. @JorgeB Just wanted to follow up. I assume this is a heat issue or something else at this point. Just checking that you didn't see anything related to disk failures beyond that SAS issue? I'm not sure what I'd be looking for myself other than anything in the Attributes tab (or the files for those disabled disks) that looks "off"...
  2. I had two disks go disabled due to read errors overnight. (Yes, I have notifications turned on, and this is the first notification I received of any issue with them.) For reference, I had a previous *issue* with most of my disks attached to my HBA SAS card a while back, but only one disk went disabled due to write failure. The two disks that have become disabled are the only two left on the HBA SAS card at the moment (I thought ahead and tried to keep the disks that were getting written to constantly directly connected to the motherboard's SATA ports). My guess is that this is directly related to the HBA SAS card (I have ordered another one and it should be here soon; I will also be working toward a custom cooling solution for the new card). I shut down the server and moved the breakout cable to a different port on the card, and the disks have reappeared as connected for the time being. Here is a copy of the diagnostics with the short SMART tests run on all the SATA-connected devices. I will be running the longer tests (with the array disabled) while I am at work and will report back when I have those results. From what I can tell from the *Attributes*, the disks look good; it is either a cable issue or that HBA SAS card. I have included diagnostics from before the shutdown and cable move. I have also included another set from after the disks came back online and I ran the short SMART tests on all the SATA-connected drives (including the breakout cable ones).
  3. I understand that. I had what I believed at the time to be 7 of my 8 data drives "go offline" in the 10-drive array (dual parity). It is quite unnerving to have 7 drives get "disconnected" (not disabled) all at the same time. Thus my conclusion about either a temporary hardware failure or a driver/firmware issue. I don't know of any setting that would automatically stop the entire array when more than 2 drives "disappear" from it (while it is active). Such a setting might have prevented the 1 drive that got disabled from being disabled by the write error in the first place.
  4. For anyone late to the party, the final result is this. Rebuild was required and completed successfully. However, the reason all the HDDs connected to the HBA SAS card fell out of the array is still not clear. It appears the card failed. I wish there was a way to tell the system to shut down the array immediately when any or more than X drives "disappear". That might have prevented my need to rebuild. I think I will need to take the HBA SAS card's heatsink off, replace the thermal paste, and somehow add a fan to it, as that is one of two possible reasons that this happened. The other is that there is some issue with this specific card or its firmware (I did update them a little over a year ago)/drivers and Unraid 7.0.0.
  5. Notification from server: Elapsed Time 1 day, 9 hr, 30 min, 39 sec, Runtime 1 day, 9 hr, 6 min, 51 sec, Increments 3, Average Speed 184.6 MB/s. I left Docker and the VM stuff off for at least the first half of it and set mover not to run until a week from the day it was started. So not bad, all things considered. That all said, my best guess is that the HBA SAS card went offline (how? why? how to prevent? anything in the logs that someone sees that might explain it would be helpful) and my attempt to force disk 8 back online manually caused the problem. If this ever happens in the future, I will attempt to reboot the system to reset the HBA SAS card. If that doesn't work, I'd be down until I get a replacement (or replace the motherboard if that is the problem at that point).
  6. I started the rebuild of disk8 after moving as many of the HDDs as I could back to SATA cables directly connected to the motherboard. It says around 24 hours for a complete rebuild of a 20TB disk at current speeds. I expect that as long as nothing happens with the motherboard or the HBA SAS card that this should resolve my issue for the moment. If anyone has any idea what happened (or may have happened) when the HBA SAS card and everything connected to it went offline, please let me know. I don't understand what happened or how to prevent it going forward. Hoping that removing most of the drives from the device will keep it "happy" for the time being. Might be worth getting a different one as a backup or something. In the future, I think I will need to make sure to just grab the diagnostics and reboot the machine before attempting to spin up the disks directly...
  7. @trurl Thank you for your reply. I understand that a rebuild is necessary. My primary concern is whether there is anything indicating drive failure or other device hardware failure that would preclude me from attempting this right now. I don't know anything specific about the SMART data other than that the UDMA CRC errors can be completely isolated to the SAS breakout cables in at least some cases (which I think is what is going on with mine). I don't see any bad sectors or such. Do you (or anyone else) see anything in the logs before or after a reboot that would explain what happened? My best guess is something happened with either the PCI-E bus or the HBA SAS card itself. Power supply failure to one of those devices? It appears the card "disappeared", but nothing I see in the logs indicates what actually happened. I'd like to start the rebuild and hope that everything recovers like it should. I just would rather try to understand what caused the problem before starting a rebuild that will definitely take over 24 hours and keep me from writing to the array for that long. (My anxiety about not knowing what happened to cause this large of an issue after over 12 months of no significant problems with the device has my stomach in knots.)
  8. I've been looking at the logs posted in my OP. It looks like something might have happened with either the HBA SAS card or maybe the motherboard itself, causing a section of the PCI-E bus to go down. I think the 7 disks in question are connected to the HBA SAS card via SAS -> SATA cables. Does anyone else see any evidence of this in the logs? I'm unsure what exactly I might be looking for. (Does this look like a motherboard issue with the bus? Does it look like a problem with the HBA SAS card itself?) If this is the case, what might be recommended? I can move a grand total of 8 disks to the motherboard SATA ports, but then I'm left with 2 that can't be connected at all if I try to remove the HBA SAS card from the machine for now...
  9. I upgraded to 7.0.0 on Friday evening. This might be relevant or not to this discussion, as I have never seen this error before. (Over a year using Unraid on this specific device.) The error stated: "Warning - array has errors: Array has 7 disks with read errors". I noticed that 7 of the 8 data disks in the array were spun down (which is odd because I had turned off spindown as part of the upgrade process for what I thought were safety reasons). Upon trying to spin up the disks, at first nothing happened. Then disk 8 went disabled. Notification: "Alert - Disk 8 in error state (disk dsbl) WDC_XXXXXXXXXXXXX (sde)". Here is a diagnostics run from approximately that time: (REMOVED) Then I turned off the Docker and VM services (to limit any writing to the disks) and rebooted the device. I then got the following notifications: "Notice - array turned good: Array has 0 disks with read errors" and "Notice - Disk X returned to normal operation WDC_XXXXXXXXXXXXX (sdh)" [times 6]. Here is another "diagnostics" run from after the reboot: (REMOVED) Did I screw something up? It appears that at a minimum I will have to rebuild disk 8 due to something causing the filesystem to get corrupted (or something along those lines). I don't want to take any additional steps before I know if this is something safe to do or if all the other disks might go down again while doing that. I checked the "Attributes" for all the other disks that went into the weird read state (1, 2, 3, 4, 5, 6) and nothing looks out of the ordinary to me. (The UDMA CRC error count is low on those that have it, and that was usually from something to do with static electricity discharge against the case itself. I've learned to ground myself against something else before touching the case to prevent that.) Doing a quick web and forum search, it looks like this could be related to my HBA SAS controller (I thought I did a firmware update before installing it), but I have no way to confirm.
Other notes: As part of the 7.0.0 upgrade, I switched all my shares to pool -> array and then ran the mover in case my btrfs RAID1 pool went down. (I also set Docker and VM services to disabled for speed and safety.) Afterward, I deleted and recreated my primary RAID1 pool with two NVMe disks (primary server data) and then created a new single-disk pool (a temporary disk for data between mover runs to the array). I then reconfigured all the shares to point to a specific pool and/or the array, depending on the share's use case. I hope this all looks good as well.
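
Several of the posts above mention checking the SMART "Attributes" for anything that looks "off" without knowing quite what to look for. As a rough illustration only, here is a minimal Python sketch that parses the attribute table in the layout printed by `smartctl -A` for ATA drives and flags non-zero raw values on a short watch list. The watch list, the function names, and the sample values are assumptions made up for this example; they are not taken from the diagnostics in this thread. A non-zero UDMA CRC count generally points at the link (cable or controller) rather than the disk itself, which fits the breakout-cable theory in the posts.

```python
# Hypothetical watch list: attributes people commonly check first.
# This selection is an assumption, not an Unraid or smartmontools default.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def parse_attributes(text):
    """Map attribute name -> integer raw value from a `smartctl -A` style table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # the header row ("ID# ATTRIBUTE_NAME ...") is skipped by this check.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            attrs[fields[1]] = int(fields[9])
        except ValueError:
            # Raw values that do not start with a plain integer are ignored.
            continue
    return attrs


def flag_attributes(attrs):
    """Return only the watched attributes whose raw value is above zero."""
    return {name: raw for name, raw in attrs.items() if name in WATCHED and raw > 0}


# Made-up sample in the usual column layout; not real data from this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   118   105   000    Old_age   Always       -       32
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
"""

if __name__ == "__main__":
    print(flag_attributes(parse_attributes(SAMPLE)))
```

To try it on a real disk, the text passed to `parse_attributes` could come from saving the output of `smartctl -A` for that device to a file. Treat any flagged attribute as a prompt to look closer, not as proof of a failing disk.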
