Disk Read errors

tower-diagnostics-20190205-1216.zip

Hello,

 

I'm having some very frustrating problems with disks dropping from the array with read errors. The errors are specific to a set of brand new drives I have purchased. Each drive was precleared before entering the array, with no SMART errors.

 

I purchased 10 Toshiba 8 TB drives to begin replacing some aging disks. As soon as I started adding the disks to the array I had problems: initially the XFS filesystem became corrupt and could not be repaired. I removed the offending drives and used UFS Explorer to recover all the data successfully. I have now backed up all my data on separate drives and have started a completely new array.

 

Through trial and error while restoring my files from backups, it seems that as soon as one of the drives fills up to 4.03 TB I start getting disk read errors. I have managed to get one drive to fill past the 4.03 TB mark, but only by transferring directly to the disk share rather than the user share (again found through trial and error). Obviously this isn't the intended use case, but I am trying to understand what the root cause is.

 

One of the disks I am preclearing is also being limited to 1.5 Gbps, with these errors showing:

Feb 4 03:15:36 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 4 03:15:36 Tower kernel: ata3.00: ATA-10: TOSHIBA MG05ACA800E, Z6LGK004FXJD, GX0R, max UDMA/100
Feb 4 03:15:36 Tower kernel: ata3.00: 15628053168 sectors, multi 16: LBA48 NCQ (depth 32), AA
Feb 4 03:15:36 Tower kernel: ata3.00: configured for UDMA/100
Feb 4 03:15:36 Tower kernel: sd 3:0:0:0: [sdk] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
Feb 4 03:15:36 Tower kernel: sd 3:0:0:0: [sdk] 4096-byte physical blocks
Feb 4 03:15:36 Tower kernel: sd 3:0:0:0: [sdk] Write Protect is off
Feb 4 03:15:36 Tower kernel: sd 3:0:0:0: [sdk] Mode Sense: 00 3a 00 00
Feb 4 03:15:36 Tower kernel: sd 3:0:0:0: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 4 03:15:36 Tower kernel: ata3.00: exception Emask 0x50 SAct 0x20000 SErr 0xb0802 action 0xe frozen
Feb 4 03:15:36 Tower kernel: ata3.00: irq_stat 0x00400000, PHY RDY changed
Feb 4 03:15:36 Tower kernel: ata3: SError: { RecovComm HostInt PHYRdyChg PHYInt 10B8B }
Feb 4 03:15:36 Tower kernel: ata3.00: failed command: READ FPDMA QUEUED
Feb 4 03:15:36 Tower kernel: ata3.00: cmd 60/08:88:00:00:00/00:00:00:00:00/40 tag 17 ncq dma 4096 in
Feb 4 03:15:36 Tower kernel: ata3.00: status: { DRDY }
Feb 4 03:15:36 Tower kernel: ata3: hard resetting link
Feb 4 03:15:42 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 4 03:15:42 Tower kernel: ata3.00: configured for UDMA/100
Feb 4 03:15:42 Tower kernel: ata3: EH complete
Feb 4 03:15:42 Tower kernel: sdk: sdk1
Feb 4 03:15:42 Tower kernel: sd 3:0:0:0: [sdk] Attached SCSI disk
Feb 4 03:15:44 Tower kernel: ata3: SATA link down (SStatus 0 SControl 300)
Feb 4 03:15:56 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 4 03:15:56 Tower kernel: ata3.00: configured for UDMA/100
Feb 4 03:16:03 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 4 03:16:03 Tower kernel: ata3.00: configured for UDMA/100
Feb 4 03:16:04 Tower kernel: ata3: limiting SATA link speed to 3.0 Gbps
Feb 4 03:16:10 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Feb 4 03:16:10 Tower kernel: ata3.00: configured for UDMA/100
Feb 4 03:16:17 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Feb 4 03:16:17 Tower kernel: ata3.00: configured for UDMA/100
Feb 4 03:16:24 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Feb 4 03:16:24 Tower kernel: ata3.00: configured for UDMA/100
Feb 4 03:16:31 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Feb 4 03:16:31 Tower kernel: ata3.00: configured for UDMA/100
Feb 4 03:16:32 Tower kernel: ata3: limiting SATA link speed to 1.5 Gbps
Feb 4 03:16:39 Tower kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 4 03:16:39 Tower kernel: ata3.00: configured for UDMA/100
Feb 4 03:16:47 Tower kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 4 03:16:47 Tower kernel: ata3.00: configured for UDMA/100
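
For anyone reading along, the downshift in the log above (6.0 Gbps, then 3.0, then 1.5 on ata3) can be pulled out of a syslog with a few lines of Python. This is only a rough sketch: the function names and the `SAMPLE` list are made up for illustration, the sample lines are copied from the excerpt above, and the regular expressions assume the kernel message wording seen in this thread.

```python
import re

# Matches e.g. "ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"
LINK_UP = re.compile(r"(ata\d+): SATA link up ([\d.]+) Gbps")
# Matches e.g. "ata3: limiting SATA link speed to 3.0 Gbps"
LIMIT = re.compile(r"(ata\d+): limiting SATA link speed to ([\d.]+) Gbps")

# A few lines copied from the excerpt above (illustrative subset only).
SAMPLE = [
    "Feb 4 03:15:36 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "Feb 4 03:16:04 Tower kernel: ata3: limiting SATA link speed to 3.0 Gbps",
    "Feb 4 03:16:10 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)",
    "Feb 4 03:16:32 Tower kernel: ata3: limiting SATA link speed to 1.5 Gbps",
    "Feb 4 03:16:39 Tower kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)",
]


def link_speed_changes(lines):
    """Return {port: [speed, ...]} with consecutive repeats collapsed."""
    history = {}
    for line in lines:
        match = LINK_UP.search(line)
        if match is None:
            continue
        port, speed = match.group(1), float(match.group(2))
        speeds = history.setdefault(port, [])
        # Only record a speed when it differs from the previous one.
        if not speeds or speeds[-1] != speed:
            speeds.append(speed)
    return history


def downshifted_ports(lines):
    """Return the ports for which the kernel lowered the link speed limit."""
    ports = set()
    for line in lines:
        match = LIMIT.search(line)
        if match is not None:
            ports.add(match.group(1))
    return ports


if __name__ == "__main__":
    print(link_speed_changes(SAMPLE))
    print(downshifted_ports(SAMPLE))
```

Running it against a full syslog from the diagnostics zip (one line per list entry) should make it easier to see which ports are being throttled and whether they map to the new drives.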

 

Again this disk is brand new and passes SMART scans.

 

I did read there were some Linux kernel bugs with certain drives and SATA controllers, but I have tried both my motherboard's onboard SATA ports and my LSI 9260-8i, with the same results. My motherboard is a Gigabyte X399 board with a Threadripper 1900X. I have also seated and re-seated the cables and tried different cables.

 

I have attached full diagnostics below. Hoping someone is going to point out where I have done something incredibly stupid and help resolve the issue for me because at this point I am ready to give up.

 

 

Edited by GoudaK

  • Community Expert

Have you checked the power cables and connections from the disks all the way back to the power supply?

  • Author

Thank you for your reply.

I have checked that, and have also unplugged and replugged them a number of times.

 

At one point I had 2 rows of 4 drives. One row had 2 Toshiba drives, 1 Seagate and 1 HGST: the Toshiba drives were still causing issues, while the other 2 drives have not had any issues at all. The other row has 4 Toshiba drives. I am using Molex to 4x SATA breakout cables, each running off a separate rail of the PSU. The PSU is a 1200 W Silverstone Gold PSU.

  • Author

I should also note that as soon as a drive failed I precleared it a second time, taking care not to physically move it, and the resulting preclear and post-read were successful.

  • Community Expert

Disable spin down for disk4. Unraid can't currently spin down SAS disks, and it's spamming your log with related errors, which makes the log much harder to analyze and means some time is missing from it. After that, reboot and work normally until you get some errors, then please post new diags.

  • Author

Thanks Johnnie, I realised too late it was causing issues.

I've disabled it and will reboot and report back.

  • Author

So I have rebooted and now keep getting SATA link resets...

 

Feb 6 12:10:54 Tower kernel: ata6: SError: { RecovComm HostInt PHYRdyChg PHYInt }
Feb 6 12:10:54 Tower kernel: ata6.00: failed command: READ DMA EXT
Feb 6 12:10:54 Tower kernel: ata6.00: cmd 25/00:f8:08:04:1b/00:03:00:00:00/e0 tag 19 dma 520192 in
Feb 6 12:10:54 Tower kernel: res 50/00:00:07:04:1b/00:00:00:00:00/e0 Emask 0x50 (ATA bus error)
Feb 6 12:10:54 Tower kernel: ata6.00: status: { DRDY }
Feb 6 12:10:54 Tower kernel: ata6: hard resetting link
Feb 6 12:10:55 Tower kernel: ata6: SATA link down (SStatus 0 SControl 320)
Feb 6 12:10:57 Tower kernel: ata6: hard resetting link
Feb 6 12:10:57 Tower kernel: ata6: SATA link down (SStatus 0 SControl 320)
Feb 6 12:10:59 Tower kernel: ata6: hard resetting link
Feb 6 12:11:04 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
Feb 6 12:11:07 Tower kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Feb 6 12:11:07 Tower kernel: ata6.00: configured for UDMA/100
Feb 6 12:11:07 Tower kernel: ata6: EH complete
Feb 6 12:11:07 Tower kernel: ata7.00: exception Emask 0x50 SAct 0x0 SErr 0x30802 action 0xe frozen
Feb 6 12:11:07 Tower kernel: ata7.00: irq_stat 0x00400000, PHY RDY changed
Feb 6 12:11:07 Tower kernel: ata7: SError: { RecovComm HostInt PHYRdyChg PHYInt }
Feb 6 12:11:07 Tower kernel: ata7.00: failed command: READ DMA EXT
Feb 6 12:11:07 Tower kernel: ata7.00: cmd 25/00:08:08:30:1b/00:04:00:00:00/e0 tag 6 dma 528384 in
Feb 6 12:11:07 Tower kernel: res 50/00:00:07:30:1b/00:00:00:00:00/e0 Emask 0x50 (ATA bus error)
Feb 6 12:11:07 Tower kernel: ata7.00: status: { DRDY }
Feb 6 12:11:07 Tower kernel: ata7: hard resetting link
Feb 6 12:11:08 Tower kernel: ata7: SATA link down (SStatus 0 SControl 310)
Feb 6 12:11:10 Tower kernel: ata7: hard resetting link
Feb 6 12:11:10 Tower kernel: ata7: SATA link down (SStatus 0 SControl 310)
Feb 6 12:11:11 Tower kernel: ata7: hard resetting link
Feb 6 12:11:20 Tower kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 6 12:11:20 Tower kernel: ata7.00: configured for UDMA/33
Feb 6 12:11:20 Tower kernel: ata7: EH complete
Feb 6 12:11:20 Tower kernel: ata7.00: exception Emask 0x50 SAct 0x0 SErr 0x30802 action 0xe frozen
Feb 6 12:11:20 Tower kernel: ata7.00: irq_stat 0x00400000, PHY RDY changed
Feb 6 12:11:20 Tower kernel: ata7: SError: { RecovComm HostInt PHYRdyChg PHYInt }
Feb 6 12:11:20 Tower kernel: ata7.00: failed command: READ DMA EXT
Feb 6 12:11:20 Tower kernel: ata7.00: cmd 25/00:f8:08:d8:1b/00:03:00:00:00/e0 tag 8 dma 520192 in
Feb 6 12:11:20 Tower kernel: res 50/00:00:07:d8:1b/00:00:00:00:00/e0 Emask 0x50 (ATA bus error)
Feb 6 12:11:20 Tower kernel: ata7.00: status: { DRDY }
Feb 6 12:11:20 Tower kernel: ata7: hard resetting link
Feb 6 12:11:21 Tower kernel: ata7: SATA link down (SStatus 0 SControl 310)
Feb 6 12:11:23 Tower kernel: ata7: hard resetting link
Feb 6 12:11:23 Tower kernel: ata7: SATA link down (SStatus 0 SControl 310)
Feb 6 12:11:24 Tower kernel: ata7: hard resetting link
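
As a side note on reading these lines, the hex `SErr` value and the names in the `SError: { ... }` braces are the same information. Below is a small sketch that decodes the hex value into those names. The bit table is partial and reflects my understanding of the SATA SError register layout and the short names the Linux kernel prints, so treat it as an assumption to verify; `decode_serror` is a made-up helper name.

```python
# Partial map of SError register bit positions to the short names
# printed by the kernel (assumed layout, not a complete table).
SERROR_BITS = {
    1: "RecovComm",   # communication recovered
    11: "HostInt",    # host internal error
    16: "PHYRdyChg",  # PHY ready state changed
    17: "PHYInt",     # PHY internal error
    19: "10B8B",      # 10b-to-8b decode error
    20: "Dispar",     # disparity error
    21: "BadCRC",     # CRC error
    22: "Handshk",    # handshake error
}


def decode_serror(value):
    """Return the names of the bits set in an SError value, lowest bit first."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            # Bits missing from the partial table are reported by position.
            names.append(SERROR_BITS.get(bit, "bit%d" % bit))
    return names


if __name__ == "__main__":
    # Values taken from the log excerpts in this thread.
    print(decode_serror(0x30802))
    print(decode_serror(0xB0802))
```

Every flag in the values from this thread is a link or PHY level condition, and the kernel itself labels these failures "ATA bus error". That seems consistent with the replies' focus on cabling, power and the controller, although it does not rule out the drives.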

  • Author

I have again reseated cables with no effect.

  • Community Expert

Grab those diagnostics, then update the LSI firmware to the latest, 20.00.07.00, since the one you're using has known issues. Connect both disks to the LSI controller (if you don't know which ones, post the diags) and work for a little while. If there are errors, post both diagnostics.

  • Author

I'll grab full diagnostics when I get home this arvo.

I should have noted that all disks are currently on the motherboard to rule out the LSI card. Should I update firmware and then move the 2 offending disks back to the LSI card?

  • Community Expert
6 hours ago, GoudaK said:

should I update firmware and then move the 2 offending disks back to the LSI card?

Yes, to see if the problems stay with the disks.

  • Author

Well, for some reason I came home to find the error logs had stopped and the parity build humming along nicely... I thought I had stopped it but clearly didn't...

I will wait for the parity rebuild to complete (5 hours left), then update the firmware and go from there.

 

Thanks for the help so far.

 

