unraid v6.12.11 odd errors - LBA w/drive errors

  • Author
15 minutes ago, Vr2Io said:

What about the results of the tests I suggested earlier? There has been no feedback.

At this point we can't confirm whether the problem comes from hardware; it could be software-related, so we need a different test or setup to verify.

Roger that, I didn't even consider that it could be a software issue.  Do you want me to proceed with the stress tests you mentioned in your previous post?

I stopped the array earlier and moved the data connection for drive 12 from channel 7 of the LBA card to one of the ports on the motherboard (non-LBA).  It's currently rebuilding the drive with no errors.

52 minutes ago, prongATO said:

Roger that, I didn't even consider that it could be a software issue.  Do you want me to proceed with the stress tests you mentioned in your previous post?

Yes. Please also try booting in safe mode and running a parity check to see whether it makes any difference.

image.png.251906e492201393991e9244d59d307c.png

52 minutes ago, prongATO said:

I stopped the array earlier and moved the data connection for drive 12 from channel 7 of the LBA card to one of the ports on the motherboard (non-LBA).  It's currently rebuilding the drive with no errors.

This is fine too.

  • Author
17 hours ago, Vr2Io said:

Yes. Please also try booting in safe mode and running a parity check to see whether it makes any difference.

image.png.251906e492201393991e9244d59d307c.png

This is fine too.

The disk rebuilt with no issues.  I rebooted into safe mode with GUI support and am now running a parity check.  Here are the diagnostics from that reboot.  I performed both short and long SMART tests on disk 12, just to be safe (they both came back clean).

mrpunrfsx1-diagnostics-20240905-1312.zip

Edited by prongATO
Grammar
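
For readers who want to repeat the short and long SMART tests mentioned above from a script rather than the GUI, here is a minimal sketch. It only assembles standard `smartctl` invocations (`-t short`, `-t long`, `-l selftest`, `-H`). The device path `/dev/sdX` is a placeholder, not the actual device name of disk 12, and the helper names are hypothetical.

```python
import subprocess


def smart_commands(device):
    """Build smartctl command lines for one disk.

    `device` is a placeholder path such as /dev/sdX (an assumption;
    look up the real device name for the disk before running anything).
    """
    return {
        "short_test": ["smartctl", "-t", "short", device],      # start short self-test
        "long_test": ["smartctl", "-t", "long", device],        # start extended self-test
        "selftest_log": ["smartctl", "-l", "selftest", device], # show self-test results
        "health": ["smartctl", "-H", device],                   # overall health verdict
    }


def run(cmd):
    """Run one command and return its text output (requires smartctl and root)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for name, cmd in smart_commands("/dev/sdX").items():
        print(name, "->", " ".join(cmd))
```

The self-tests run in the background on the drive, so the self-test log has to be read again after the test has had time to finish.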

  • Author
On 9/4/2024 at 8:06 PM, Vr2Io said:

Yes. Please also try booting in safe mode and running a parity check to see whether it makes any difference.

image.png.251906e492201393991e9244d59d307c.png

This is fine too.

Parity check completed with no errors in safe mode. Diagnostics attached.

mrpunrfsx1-diagnostics-20240906-1349.zip

  • Solution
52 minutes ago, prongATO said:

Parity check completed with no errors in safe mode. Diagnostics attached.

mrpunrfsx1-diagnostics-20240906-1349.zip

To be more certain, you could run one more parity check. If the result is clean again, we can assume the problem is caused by a plugin and then work out which one. I would suggest uninstalling disklocation first.

Edited by Vr2Io

  • Author
4 hours ago, Vr2Io said:

To be more certain, you could run one more parity check. If the result is clean again, we can assume the problem is caused by a plugin and then work out which one. I would suggest uninstalling disklocation first.

I have another test to verify it's a software issue.  After it completed a rebuild and a full parity check with no issues, I figured I could go ahead and shrink the array (taking out disk 16).  After rebooting into non-safe mode, I removed disk 16 and started to rebuild the parity drive.  After about 20%, disk 5 reported 5 errors.  I stopped rebuilding parity and rebooted into safe mode, and I'm now attempting to rebuild the parity drive in safe mode.  If this works with no errors, then we will know, for sure, that it's a software issue causing the errors.  That would be the BEST solution, as far as I'm concerned.  I'll still do the hardware thing we've spoken privately about, though; it's too great a price to pass up.  THANK YOU for even thinking it could be a software issue, since I've been chasing hardware for the last 3 months.

Disk 5 is connected to the on-board LBA card, port 3.

I'm a dope, I completely forgot to grab the diagnostics before I rebooted into safe mode.

Edited by prongATO

  • Author
8 hours ago, Vr2Io said:

To be more certain, you could run one more parity check. If the result is clean again, we can assume the problem is caused by a plugin and then work out which one. I would suggest uninstalling disklocation first.

I think you've finally solved it.  It's been over 4 hours in safe mode, rebuilding parity and zero errors.  Now, I wonder what plugin is causing all this mess.

Edited by prongATO

  • Author

I think we can tentatively conclude that a plug-in is what is causing the disk errors. In safe mode, the parity drive was rebuilt with no issues (while shrinking the array by taking out disk 16).

Is there a known process to determine which plug-in is causing the issues?

mrpunrfsx1-diagnostics-20240907-1204.zip

As mentioned, you may try uninstalling the disk location plugin first.
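
One general way to answer the "which plug-in?" question is bisection: enable half of the suspects, run the check, and keep whichever half still shows errors. The sketch below is not an Unraid feature, only an illustration of the idea. The plugin names are hypothetical placeholders, and `errors_occur` stands in for "boot with only these plugins and run a parity check". It assumes exactly one culprit and a repeatable result; with intermittent errors like the ones in this thread, each round would need more than one check before a half can be trusted as clean.

```python
def find_culprit(plugins, errors_occur):
    """Narrow a list of suspect plugins down to one by halving.

    plugins: list of plugin names (hypothetical in this sketch).
    errors_occur: callable that takes the list of enabled plugins and
        returns True if disk errors showed up with that set enabled.
    Assumes exactly one culprit and a repeatable test.
    """
    if not plugins:
        raise ValueError("need at least one plugin to test")
    candidates = list(plugins)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if errors_occur(first_half):
            candidates = first_half        # culprit is in the enabled half
        else:
            candidates = candidates[mid:]  # culprit is in the disabled half
    return candidates[0]


if __name__ == "__main__":
    # Simulated run: eight placeholder plugins, one of which "causes" errors.
    suspects = [f"plugin_{c}" for c in "abcdefgh"]
    rounds = []

    def simulated_check(enabled):
        rounds.append(list(enabled))
        return "plugin_f" in enabled

    print(find_culprit(suspects, simulated_check), "found in", len(rounds), "rounds")
```

With a parity check taking many hours per round, halving keeps the number of rounds roughly logarithmic in the number of plugins instead of testing them one by one.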

  • Author
5 minutes ago, Vr2Io said:

As mentioned, you may try uninstalling the disk location plugin first.

So, I am guessing any plug-in that directly interacts with disks?

Just now, prongATO said:

So, I am guessing any plug-in that directly interacts with disks?

Yes

  • Author
1 hour ago, Vr2Io said:

Yes

I removed all the plugins I either no longer use or thought might be the genesis of the issue.  I also removed disk location.  I paid special attention to the plugins that haven't been updated in a while and removed them.  I now only have the following plugins installed:

 

image.thumb.png.b3d3d4f0470b88040581b2d3e9dff785.png

  • Author

Well, I left the server alone today and decided to erase and preclear the 3TB drive before I remove it from the server.  The diagnostics are spammed with errors, but no "disk errors" per se.  This started spamming the log:

### [PREVIOUS LINE REPEATED 1 TIMES] ###
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: configured for UDMA/133 (device error ignored)
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3: EH complete
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: irq_stat 0x40000001
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: failed command: READ DMA
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 9 dma 4096 in
Sep  8 00:49:51 MRPUNRFSX1 kernel:         res 61/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: status: { DRDY DF ERR }
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: error: { ABRT }
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: failed to enable AA (error_mask=0x1)

mrpunrfsx1-diagnostics-20240908-0031.zip

Edited by prongATO
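
When a log is spammed like this, it can help to count the failures per ATA port before reading line by line. The following is a minimal sketch that assumes the syslog line layout shown in the excerpt above; the function name and the sample list are illustrative, and the sample lines are copied from that excerpt.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: ata3.00: failed command: READ DMA"
# Group 1 is the port (ata3), group 2 is the message text.
LINE_RE = re.compile(r"kernel:\s+(ata\d+)(?:\.\d+)?:\s+(.*)$")


def summarize_ata_errors(lines):
    """Count failed commands and error flags per ATA port."""
    failed = Counter()  # (port, command) -> count
    flags = Counter()   # (port, flag) -> count
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        port, msg = match.group(1), match.group(2)
        if msg.startswith("failed command:"):
            failed[(port, msg.split(":", 1)[1].strip())] += 1
        elif msg.startswith("error:"):
            # e.g. "error: { ABRT }" -> ["ABRT"]
            for flag in re.findall(r"[A-Z]+", msg.split(":", 1)[1]):
                flags[(port, flag)] += 1
    return failed, flags


if __name__ == "__main__":
    sample = [
        "Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0",
        "Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: failed command: READ DMA",
        "Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: status: { DRDY DF ERR }",
        "Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: error: { ABRT }",
    ]
    failed, flags = summarize_ata_errors(sample)
    print(dict(failed))
    print(dict(flags))
```

Grouping by port makes it easier to see whether the errors all belong to one link (here `ata3`) or are spread across several.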

20 hours ago, prongATO said:

Well, I left the server alone today and decided to erase and preclear the 3TB drive before I remove it from the server.  The diagnostics are spammed with errors, but no "disk errors" per se.  This started spamming the log:

### [PREVIOUS LINE REPEATED 1 TIMES] ###
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: configured for UDMA/133 (device error ignored)
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3: EH complete
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: irq_stat 0x40000001
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: failed command: READ DMA
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 9 dma 4096 in
Sep  8 00:49:51 MRPUNRFSX1 kernel:         res 61/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: status: { DRDY DF ERR }
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: error: { ABRT }
Sep  8 00:49:51 MRPUNRFSX1 kernel: ata3.00: failed to enable AA (error_mask=0x1)

mrpunrfsx1-diagnostics-20240908-0031.zip

The errors come from the onboard SATA controller; please try replacing the SATA cable first. It's strange that there is no SMART data for this disk (sdd), while the other disks (sdb, sdc, sde) on the same controller have SMART data.

** Supermicro X10SL7-F is based on the Intel C222 chipset, we also see four SATA II 3.0 Gbps ports and two SATA III 6.0 Gbps ports. **

I also did a quick check online: the C222 has 6 ports and only two of them are 6 Gbps, which may also cause some compatibility issues if a 6 Gbps disk is connected to a 3 Gbps port.

 

image.png.9ad0512e1b50e89b80e8d96b97240f6e.png

Edited by Vr2Io

  • Author
On 9/8/2024 at 8:51 PM, Vr2Io said:

The errors come from the onboard SATA controller; please try replacing the SATA cable first. It's strange that there is no SMART data for this disk (sdd), while the other disks (sdb, sdc, sde) on the same controller have SMART data.

** Supermicro X10SL7-F is based on the Intel C222 chipset, we also see four SATA II 3.0 Gbps ports and two SATA III 6.0 Gbps ports. **

I also did a quick check online: the C222 has 6 ports and only two of them are 6 Gbps, which may also cause some compatibility issues if a 6 Gbps disk is connected to a 3 Gbps port.

 

image.png.9ad0512e1b50e89b80e8d96b97240f6e.png

As we spoke in private messages, I have the new motherboard/processor and RAM coming sometime this week.  The drive that was referenced with all the errors was the 3TB drive I removed from the array and system.  I've seen no errors booted into normal mode since I removed about 14 plugins.  I think you were spot-on when you postulated that it may be a software issue, rather than a hardware issue.  

 

Again, I've experienced 0 disk errors since removing the plugins.  Next week, I will be installing the new motherboard/processor/RAM when everything arrives.  I will also complete the modifications to the remaining three 5-in-3 cages. (completely cleaning every cage, replacing the fans with Noctua NF-R8 redux-1800 and making some minor modifications to fit the 25mm depth of the fans).  

 

I am 99% sure your idea was right: it was, in fact, a software issue causing the intermittent HD errors that plagued me for months.  I will mark this as solved.  Thank you SO much for your help.  When I get all the new hardware installed and configured, I'll post a final diagnostic package to see if there's anything else you think I need to address.

 

Here is the latest diagnostics package.

mrpunrfsx1-diagnostics-20240910-1637.zip

1 hour ago, prongATO said:

As we spoke in private messages, I have the new motherboard/processor and RAM coming sometime this week.  The drive that was referenced with all the errors was the 3TB drive I removed from the array and system.  I've seen no errors booted into normal mode since I removed about 14 plugins.  I think you were spot-on when you postulated that it may be a software issue, rather than a hardware issue.  

 

Again, I've experienced 0 disk errors since removing the plugins.  Next week, I will be installing the new motherboard/processor/RAM when everything arrives.  I will also complete the modifications to the remaining three 5-in-3 cages. (completely cleaning every cage, replacing the fans with Noctua NF-R8 redux-1800 and making some minor modifications to fit the 25mm depth of the fans).  

 

I am 99% sure your idea was right: it was, in fact, a software issue causing the intermittent HD errors that plagued me for months.  I will mark this as solved.  Thank you SO much for your help.  When I get all the new hardware installed and configured, I'll post a final diagnostic package to see if there's anything else you think I need to address.

 

Here is the latest diagnostics package.

mrpunrfsx1-diagnostics-20240910-1637.zip

That looks like a positive update. I agree you should focus more on the new platform and shouldn't spend too much on the current one.

Wishing you all the best with your new platform!

Recently I haven't changed much on my Unraid servers; they basically work well, so I put less effort into them.

IoT and automation stuff is my current project. My 2nd Nest thermostat has arrived; I like those gadgets.

 

IMG_20240911_070116.thumb.jpg.0d0d12b95ae95669079682b2c97dcad9.jpg

 

17 minutes ago, Vr2Io said:

IoT and automation stuff is my current project. My 2nd Nest thermostat has arrived; I like those gadgets.

I recommend testing all your gadgets for functionality without internet connectivity. Obviously the cloud based services won't function, but it would suck to set something up and not be able to manually control it if needed.

 

Lighting, climate control and other critical stuff shouldn't need internet for basic functionality, if they do, you will find yourself in a bind at the worst possible time.

 

If I can't make a gadget work the way I need on a VLAN with no internet, it's not going in my house.

15 minutes ago, JonathanM said:

I recommend testing all your gadgets for functionality without internet connectivity. Obviously the cloud based services won't function, but it would suck to set something up and not be able to manually control it if needed.

 

Lighting, climate control and other critical stuff shouldn't need internet for basic functionality, if they do, you will find yourself in a bind at the worst possible time.

 

If I can't make a gadget work the way I need on a VLAN with no internet, it's not going in my house.

Yes, we always need to consider security, so I have put them in a different VLAN and, most importantly, the application server (Unraid with Home Assistant) doesn't hold any critical private data.

The reason for the 2nd Nest is that the old Nest doesn't support Matter (which means it must be cloud-based), so I realised I should move to Matter for future-proofing; it is also an entry point to some new technology.

Edited by Vr2Io

  • 2 weeks later...
  • Author

I just wanted to add, I think this was a multi-faceted issue.  I think there was an add-on that contributed to the intermittent write/read errors to disks.  I ended up changing/upgrading every single component in my system.  I changed the motherboard, processor (E3 to E5 Xeon), RAM (DDR3 to DDR4), drive SATA cables, LBA breakout cables, power supply (660W Seasonic to 850W Seasonic), power supply cables, PS cabling configuration and LBA card.  Even after changing all that, I ended up with errors again on one disk and it couldn't be rebuilt, even in safe mode (which had worked before).  I narrowed down the lion's share of the issues to one 5 in 3 hot swap cage (an old iStarUSA cage), so I just ordered a new 5 in 3 hot swap cage to replace it.  I am OCD so I'll eventually change the other 3 cages (I have space for a total of 20 3.5" spinning HDD) but at 130.00 a pop, it's not cheap.

I will say that I think the Silverstone 5 in 3 I chose is the nicest HDD cage I've used and I like that it's tray-less.  This is the one I ended up going with: https://www.amazon.com/dp/B07ZWK1337?ref=ppx_yo2ov_dt_b_fed_asin_title

Edited by prongATO
