Help! Multiple HDDs getting lots of read errors all at the same time

In the last week or so I started getting Unraid notifications that multiple disks (around 4-6) were having read errors. When I logged into the dashboard, the Main tab showed error counts for those disks. I have 13 disks (1 parity), and the error counts ranged from 1 to ~1200. When I restart the server, all error counts go back to zero and I get an Unraid notification saying the read errors have returned to normal. Since that first time it has happened 2-3 more times; the most recent time, the system locked up and I had to hold the power button to restart it.

I didn't take a screenshot or pic of the dashboard when it had all the errors. I will do that next time.

 

Some info about my setup:

- All HDDs are SAS and were bought used from eBay in batches (so HDD brands/models cluster together)

- All HDDs go through my Supermicro backplane (I believe it is a BPN-SAS3-826EL1)

- Backplane is connected to the mobo via an HBA card (Adaptec ASR-7805)

 

I find it highly unlikely that HDDs across multiple vendors and generations would start failing at the exact same moment, so I suspect the backplane or the HBA card instead. Does that seem reasonable? How can I verify this?

 

Attached my diagnostics zip, the monitor log before I had to force shutdown, and my hardware setup described above.

 

Thanks in advance!

IMG_4596.jpeg

IMG_4597.jpeg

IMG_4598.jpeg

IMG_4599.jpeg

image.thumb.png.5107d7d5452c063655537956ecacbf00.png

maroon-diagnostics-20240905-0943.zip

Edited by tone
add info

  • Community Expert

Diags are from after a reboot, so they don't show the problem. Enable the syslog server in case the server crashes again, and post the resulting log when it next happens.

  • Author

Thanks for responding. I have enabled the syslog server and will post again when it happens.

 

Appreciate the help!

  • Community Expert

Sep  6 03:18:17 maroon kernel: aacraid 0000:01:00.0: IOP reset failed
Sep  6 03:18:17 maroon kernel: aacraid 0000:01:00.0: ARC Reset attempt failed

 

These are controller issues. Make sure the controller is well seated and sufficiently cooled; you can also try a different PCIe slot.
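
For anyone reading along, here is a minimal Python sketch for tallying aacraid messages in a saved syslog, to see how often the controller reports failures and on which PCI address. The regex is modeled only on the two lines quoted above, and `summarize_aacraid` and the `sample` list are hypothetical names for illustration, not part of Unraid or the driver.

```python
import re
from collections import Counter

# Matches kernel lines from the aacraid driver, for example:
# "Sep  6 03:18:17 maroon kernel: aacraid 0000:01:00.0: IOP reset failed"
AACRAID_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"aacraid\s+(?P<pci>[0-9a-fA-F:.]+):\s+(?P<msg>.*)$"
)


def summarize_aacraid(lines):
    """Count aacraid kernel messages per (PCI address, message) pair."""
    counts = Counter()
    for line in lines:
        match = AACRAID_LINE.match(line.strip())
        if match:
            counts[(match.group("pci"), match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    # Sample input: the two lines from this thread plus one unrelated line.
    sample = [
        "Sep  6 03:18:17 maroon kernel: aacraid 0000:01:00.0: IOP reset failed",
        "Sep  6 03:18:17 maroon kernel: aacraid 0000:01:00.0: ARC Reset attempt failed",
        "Sep  6 03:18:18 maroon kernel: some unrelated message",
    ]
    for (pci, msg), count in summarize_aacraid(sample).items():
        print(f"{pci}  x{count}  {msg}")
```

On a real server you would pass the lines of the saved syslog file instead of `sample`. The live log is usually at /var/log/syslog on Unraid, but treat that path as an assumption for your setup.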

  • 3 weeks later...
  • Author

Ok, I have done a few things since the last post:

  1. I put a dedicated fan on the HBA's heatsink (blowing toward it).
  2. I still got errors, so I replaced the HBA with another one (from eBay).
  3. I am still getting errors, so now I think it's either the SAS cables, my backplane, or my HDDs?

 

Another symptom I am experiencing is that the server has locked up with the CPU showing 100%, stuck in iowait.
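
A rough way to check how much of that load is really iowait (time spent waiting on disk I/O rather than computing) is to sample /proc/stat twice and compare the counters. This is only a sketch: `parse_cpu_line` and `iowait_percent` are hypothetical helper names, and the one-second sampling interval is an arbitrary choice.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so only
    # the first eight fields are summed for the total.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


if __name__ == "__main__":
    with open("/proc/stat") as handle:
        first = parse_cpu_line(handle.read())
    time.sleep(1.0)  # sampling interval, chosen arbitrarily
    with open("/proc/stat") as handle:
        second = parse_cpu_line(handle.read())
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```

A high iowait share with little user or system time would point at stalled storage I/O rather than a busy process, which fits a controller or disk dropping out.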
 

Also, shutdown doesn’t seem to work; it gets stuck at “Forcing shutdown…”

 

I attached my latest diagnostics in case the errors are different. Otherwise I will replace the cables and then, if needed, the backplane :(

 

 

maroon-diagnostics-20240929-0959.zip

IMG_4729.jpeg

IMG_4724.jpeg

Edited by tone
Added pics

  • Community Expert

The log is completely spammed with controller-related crashes, so I cannot see the start of the problem. Reboot to clear the logs, and post new diags as soon as you see errors in the log.
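
To catch the start of the problem in a saved log, something like the following Python sketch can locate the first matching line and keep the few lines that came just before it. The function name `first_error_with_context` is hypothetical, and the default keyword `aacraid` is taken from the driver name in the earlier log excerpt; it will also match harmless driver messages at boot, so narrow it as needed.

```python
from collections import deque


def first_error_with_context(lines, keywords=("aacraid",), context=5):
    """Find the first line containing any keyword (case-insensitive).

    Returns (line_number, preceding_lines, matching_line), or None if
    nothing matches. Line numbers start at 1.
    """
    lowered = [keyword.lower() for keyword in keywords]
    recent = deque(maxlen=context)  # rolling window of earlier lines
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(keyword in text.lower() for keyword in lowered):
            return number, list(recent), text
        recent.append(text)
    return None


if __name__ == "__main__":
    # Made-up sample log for illustration only.
    sample = [
        "Sep  6 03:18:10 maroon kernel: example line one",
        "Sep  6 03:18:12 maroon kernel: example line two",
        "Sep  6 03:18:17 maroon kernel: aacraid 0000:01:00.0: IOP reset failed",
        "Sep  6 03:18:17 maroon kernel: aacraid 0000:01:00.0: ARC Reset attempt failed",
    ]
    found = first_error_with_context(sample)
    if found is not None:
        number, before, line = found
        print(f"first match at line {number}: {line}")
        for earlier in before:
            print(f"  before: {earlier}")
```

The lines just before the first match are often the useful ones, since they show what the system was doing when the controller first misbehaved.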

  • 1 month later...
  • Author

Ok update here. 

 

I turned off the ZFS backups (uninstalled Sanoid and the ZFS Plugin and disabled the user scripts) and then had no errors for over a month.

I also increased the fans so there is better cooling in the case and on the HBA.

Anyway, I had an error last night, but this time on only one disk (Disk 6, which was the ZFS backup target):

image.thumb.png.c704dca7deae978c7a7d364aa162924f.png

 

Log has new errors too:

image.thumb.png.50f9251f020965e1eeb078f8557f7083.png

 

I am not able to download diagnostics because the download freezes/hangs:

image.thumb.png.2675b2a3ed51f74d57cbd3fb8dda0f2c.png

 

Here is the disk log for Disk 6 (sde):

image.thumb.png.77f5fa3f55aec699ecf0b4b62d1f4634.png

image.thumb.png.a3e93ef0cf8c4db79391151f23a96db1.png

 

LMK if this should be a new post/topic altogether. Any idea what I should do? TIA!

  • Community Expert

Looks like the disk dropped offline. I would disable spin down and see if it still happens.
