[SOLVED] Lots of ATA errors in logs

syst1k · March 19, 2021

Hi,

I'm a n00b trying to set up Unraid on a couple of disks I have (8TB WD & 4TB Seagate). I connected my hard disks and then started the parity build. Note that this is a completely fresh install. When I check dmesg, I see a ton of log entries related to SATA.

Mar 18 21:41:21 nas kernel: ata5.00: failed command: READ FPDMA QUEUED
Mar 18 21:41:21 nas kernel: ata5.00: cmd 60/40:b8:f8:3b:e8/05:00:13:00:00/40 tag 23 ncq dma 688128 in
Mar 18 21:41:21 nas kernel:         res 40/00:d8:38:41:e8/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
Mar 18 21:41:21 nas kernel: ata5.00: status: { DRDY }
Mar 18 21:41:21 nas kernel: ata5: hard resetting link
Mar 18 21:41:31 nas kernel: ata5: softreset failed (1st FIS failed)
Mar 18 21:41:31 nas kernel: ata5: hard resetting link
Mar 18 21:41:32 nas kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Does anyone know what it means? I have attached the system diagnostics as well.
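As a rough aid for reading the "cmd" lines above: for NCQ commands such as READ FPDMA QUEUED (0x60), libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. The sector count sits in the two feature bytes and the tag in the upper bits of nsect. The sketch below only illustrates that layout; decode_ncq_cmd is a made-up helper name, not an existing tool, and it assumes 512-byte logical sectors.

```shell
# Hypothetical helper: decode the "cmd" taskfile string of an NCQ command
# as printed by libata. Assumes 512-byte logical sectors.
decode_ncq_cmd() {
    old_ifs=$IFS
    IFS='/:'
    # Split "60/40:b8:f8:3b:e8/05:00:13:00:00/40" into 12 hex fields:
    # $1 command  $2 feature  $3 nsect  $4 lbal  $5 lbam  $6 lbah
    # $7 hob_feature  $8 hob_nsect  $9 hob_lbal  ${10} hob_lbam  ${11} hob_lbah  ${12} device
    set -- $1
    IFS=$old_ifs

    sectors=$(( 0x$7$2 ))                 # count = hob_feature:feature
    tag=$(( 0x$3 >> 3 ))                  # NCQ tag lives in nsect bits 7..3
    lba=$(( 0x${11}${10}$9$6$5$4 ))       # 48-bit LBA, high bytes first
    echo "tag=$tag sectors=$sectors bytes=$(( sectors * 512 )) lba=$lba"
}

decode_ncq_cmd "60/40:b8:f8:3b:e8/05:00:13:00:00/40"
```

For the first failed command in the log this gives tag 23 and 688128 bytes, which agrees with the "tag 23 ncq dma 688128 in" part of the same line, so the command is an ordinary large read that failed on the link rather than anything unusual about the request itself.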

I was having a lot of trouble getting the setup up and running and have had to replace 2 drives already, so I'm not sure if it's the drive, the cabling, the controller, or just the drivers.

I've replaced cables, drives and even the controller already.

As an update - I got a notification that my 4TB data drive has tons of read errors. This is a brand new drive that I just got today. I'm really not sure if the drive is bad, or if it's something else in my setup.

oceans-1125-diagnostics-20210318-2141.zip

Edited March 19, 2021 by syst1k
Adding more information.

Vr2Io · March 19, 2021

The add-on JMB585 (in the M.2 slot?), which connects the two disks, has link errors.

The WD is disabled and the Seagate is generating masses of errors.

syst1k · March 19, 2021

2 minutes ago, Vr2Io said:

The add-on JMB585 (in the M.2 slot?), which connects the two disks, has link errors.

Yes! I replaced that entire controller. I am using this: https://www.amazon.com/Internal-Non-Raid-Adapter-Desktop-Support/dp/B07T3RMFFT/

I am also using this with an Intel NUC7i3, if that matters. Where do you see this error? In dmesg?

Vr2Io · March 19, 2021

2 minutes ago, syst1k said:

In dmesg?

In syslog.

I have a JMB585 M.2 card too, but I have never tried or used it, or even inserted it in a motherboard.

Please try disabling NCQ in "Disk Settings", then reboot.
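One way to check after the reboot whether queuing is really off is to read each disk's queue depth from sysfs: a depth of 1 means commands are issued one at a time, while NCQ-capable disks with queuing active typically show a larger value (the logs in this thread report "depth 32"). This is only a sketch; ncq_status and its optional sysfs-root argument are names invented for the example, and it assumes the disks show up as sd* block devices.

```shell
# Hypothetical helper: list the queue depth of every sd* disk.
# $1 is the sysfs root and defaults to /sys (the argument exists only
# so the function can be tried against a copy of the tree).
ncq_status() {
    root=${1:-/sys}
    for f in "$root"/block/sd*/device/queue_depth; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        dev=${f#"$root"/block/}          # sdc/device/queue_depth
        dev=${dev%%/*}                   # sdc
        depth=$(cat "$f")
        if [ "$depth" -gt 1 ]; then
            state="NCQ on"
        else
            state="NCQ off"
        fi
        echo "$dev depth=$depth ($state)"
    done
}
```

Running ncq_status with no argument on the server prints one line per disk.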

Could other JMB585 users check whether the messages below look any different on their systems?

Mar 18 21:15:18 Oceans-1125 kernel: ahci 0000:3c:00.0: SSS flag set, parallel bus scan disabled
Mar 18 21:15:18 Oceans-1125 kernel: ahci 0000:3c:00.0: AHCI 0001.0301 32 slots 5 ports 6 Gbps 0x1f impl SATA mode
Mar 18 21:15:18 Oceans-1125 kernel: ahci 0000:3c:00.0: flags: 64bit ncq sntf stag pm led clo pmp fbs pio slum part ccc apst boh

Mar 18 21:15:18 Oceans-1125 kernel: ahci 0000:00:17.0: flags: 64bit ncq pm led clo only pio slum part deso sadm sds apst
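To make the comparison of two "flags:" lines less error-prone, the capability lists can be diffed mechanically. The following is a small sketch; flags_only_in_first is a made-up helper name, and the two variables simply hold the flag lists copied from the lines above (3c:00.0 is the JMB585, 00:17.0 the onboard controller).

```shell
# Hypothetical helper: print the flags present in list $1 but not in list $2.
flags_only_in_first() {
    out=""
    for f in $1; do
        case " $2 " in
            *" $f "*) ;;                 # flag is also in the second list
            *) out="$out $f" ;;          # flag is unique to the first list
        esac
    done
    echo "${out# }"
}

jmb="64bit ncq sntf stag pm led clo pmp fbs pio slum part ccc apst boh"
onboard="64bit ncq pm led clo only pio slum part deso sadm sds apst"

echo "only on JMB585:  $(flags_only_in_first "$jmb" "$onboard")"
echo "only on onboard: $(flags_only_in_first "$onboard" "$jmb")"
```

The same function can be pointed at flag lines from two different JMB585 systems, which is the comparison being asked for here.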

syst1k · March 19, 2021

I disabled NCQ, and rebooted. Here's the updated diagnostic.

I still see those errors both in dmesg and syslog

oceans-1125-diagnostics-20210318-2236.zip

Vr2Io · March 19, 2021

Please also check whether ASPM is turned on in the BIOS; if it is, disable it.

Beyond that, we need advice from other users.

Edited March 19, 2021 by Vr2Io

syst1k · March 19, 2021

9 minutes ago, Vr2Io said:

Please also check whether ASPM is turned on in the BIOS; if it is, disable it.

Yes, it was enabled. I have disabled it and rebooted. I don't think it helped; I can still see the errors.

oceans-1125-diagnostics-20210318-2253.zip

syst1k · March 19, 2021

If it's the controller that's bad, is there a recommended M.2 -> SATA controller I should be using, especially one that works with Unraid?

Vr2Io · March 19, 2021

The choice is limited. Senior user @JorgeB and others have solid experience running those controllers without problems.

If you are willing and have time, could you try that M.2 controller in another computer that has an M.2 slot?

Edited March 19, 2021 by Vr2Io

syst1k · March 19, 2021

10 minutes ago, Vr2Io said:

The choice is limited. Senior user @JorgeB and others have solid experience running those controllers without problems.

If you are willing and have time, could you try that M.2 controller in another computer that has an M.2 slot?

I found another discussion about this:

Trying to figure out where to update the powertop rule. Maybe it's related?

Vr2Io · March 19, 2021

3 minutes ago, syst1k said:

I found another discussion about this:

Trying to figure out where to update the powertop rule. Maybe it's related?

Maybe. The motherboard BIOS usually has an on/off control for this feature, but I am not sure whether that setting also controls the add-on card. Waiting for mgutt's advice.

Edited March 19, 2021 by Vr2Io

syst1k · March 19, 2021

FWIW, I installed openmediavault on a flash drive and booted the same configuration up. I don't see any link errors in dmesg, on boot at least. I haven't tried the RAID setup; I only wiped the two drives, and that seemed to succeed. I'm wondering if it's a card + Unraid issue instead?

Vr2Io · March 19, 2021

The case is similar to Marvell controllers: even with the same chip model, some work solidly while others just throw errors (link drops).

A long time ago I used an add-on Marvell controller and got link drops (on Windows too). Since then I have switched to an LSI HBA and have had no issues. Of course, not every user is trouble-free.

Edited March 19, 2021 by Vr2Io

JorgeB · March 19, 2021

JMB585 controllers usually work fine; I have several myself. It could be a problem with that specific controller.

syst1k · March 19, 2021

8 minutes ago, JorgeB said:

JMB585 controllers usually work fine; I have several myself. It could be a problem with that specific controller.

You mean the card? I got a replacement and installed it today. I was having the same issue with the one I had previously, hence got the replacement.

mgutt · March 19, 2021

4 hours ago, syst1k said:

Trying to figure out where to update the powertop rule. Maybe it's related?

Are you using powertop or are you using a rule like this?

echo 'med_power_with_dipm' > /sys/class/scsi_host/host10/link_power_management_policy

Since I stopped using this rule, the problem has disappeared. Another user in this forum uses the same M.2 adapter from IOCrest without problems.
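For anyone who did apply such a rule and wants to back it out without rebooting, a sketch along these lines writes one policy to every SATA host. set_lpm_policy is a made-up helper name, the list of accepted values covers only the common ones (newer kernels accept a few more), and the second argument exists only so the function can be tried against a copy of the sysfs tree.

```shell
# Hypothetical helper: write one link power management policy to all hosts.
# $1 = policy name, $2 = sysfs root (defaults to /sys).
set_lpm_policy() {
    policy=$1
    root=${2:-/sys}
    case $policy in
        min_power|medium_power|med_power_with_dipm|max_performance) ;;
        *) echo "unknown policy: $policy" >&2; return 1 ;;
    esac
    for f in "$root"/class/scsi_host/host*/link_power_management_policy; do
        [ -w "$f" ] || continue          # not every host exposes this file
        echo "$policy" > "$f"
    done
}
```

Calling set_lpm_policy max_performance as root puts every host back to full link power.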

But it seems your problem is different as I never had ATA bus errors.

Maybe it's worth trying again with a PCIe x16 to M.2 adapter (~15 €) to check whether the same errors occur in a regular PCIe slot.

I compared your logs:

Mar 18 21:15:18 Oceans-1125 kernel: pci 0000:3c:00.0: [197b:0585] type 00 class 0x010601
Mar 18 21:15:18 Oceans-1125 kernel: pci 0000:3c:00.0: reg 0x10: [io  0xe200-0xe27f]
Mar 18 21:15:18 Oceans-1125 kernel: pci 0000:3c:00.0: reg 0x14: [io  0xe180-0xe1ff]
Mar 18 21:15:18 Oceans-1125 kernel: pci 0000:3c:00.0: reg 0x18: [io  0xe100-0xe17f]
Mar 18 21:15:18 Oceans-1125 kernel: pci 0000:3c:00.0: reg 0x1c: [io  0xe080-0xe0ff]
Mar 18 21:15:18 Oceans-1125 kernel: pci 0000:3c:00.0: reg 0x20: [io  0xe000-0xe07f]
Mar 18 21:15:18 Oceans-1125 kernel: pci 0000:3c:00.0: reg 0x24: [mem 0xdc010000-0xdc011fff]
Mar 18 21:15:18 Oceans-1125 kernel: pci 0000:3c:00.0: reg 0x30: [mem 0xdc000000-0xdc00ffff pref]
Mar 18 21:15:18 Oceans-1125 kernel: pci 0000:3c:00.0: PME# supported from D3hot
...
Mar 18 21:15:18 Oceans-1125 kernel: pci 0000:3c:00.0: Adding to iommu group 13
...
Mar 18 21:15:18 Oceans-1125 kernel: ahci 0000:3c:00.0: SSS flag set, parallel bus scan disabled
Mar 18 21:15:18 Oceans-1125 kernel: ahci 0000:3c:00.0: AHCI 0001.0301 32 slots 5 ports 6 Gbps 0x1f impl SATA mode
Mar 18 21:15:18 Oceans-1125 kernel: ahci 0000:3c:00.0: flags: 64bit ncq sntf stag pm led clo pmp fbs pio slum part ccc apst boh

to mine:

Mar 18 10:37:14 Thoth kernel: pci 0000:05:00.0: [197b:0585] type 00 class 0x010601
Mar 18 10:37:14 Thoth kernel: pci 0000:05:00.0: reg 0x10: [io  0x3200-0x327f]
Mar 18 10:37:14 Thoth kernel: pci 0000:05:00.0: reg 0x14: [io  0x3180-0x31ff]
Mar 18 10:37:14 Thoth kernel: pci 0000:05:00.0: reg 0x18: [io  0x3100-0x317f]
Mar 18 10:37:14 Thoth kernel: pci 0000:05:00.0: reg 0x1c: [io  0x3080-0x30ff]
Mar 18 10:37:14 Thoth kernel: pci 0000:05:00.0: reg 0x20: [io  0x3000-0x307f]
Mar 18 10:37:14 Thoth kernel: pci 0000:05:00.0: reg 0x24: [mem 0xa1810000-0xa1811fff]
Mar 18 10:37:14 Thoth kernel: pci 0000:05:00.0: reg 0x30: [mem 0xa1800000-0xa180ffff pref]
Mar 18 10:37:14 Thoth kernel: pci 0000:05:00.0: PME# supported from D3hot
...
Mar 18 10:37:14 Thoth kernel: iommu: Adding device 0000:05:00.0 to group 14
...
Mar 18 10:37:14 Thoth kernel: ahci 0000:05:00.0: SSS flag set, parallel bus scan disabled
Mar 18 10:37:14 Thoth kernel: ahci 0000:05:00.0: AHCI 0001.0301 32 slots 5 ports 6 Gbps 0x1f impl SATA mode
Mar 18 10:37:14 Thoth kernel: ahci 0000:05:00.0: flags: 64bit ncq sntf stag pm led clo pmp fbs pio slum part ccc apst boh 
Mar 18 10:37:14 Thoth kernel: ahci 0000:05:00.0: both AHCI_HFLAG_MULTI_MSI flag set and custom irq handler implemented

I don't know the reason for the last line, which is missing from your log. I'm still on Unraid 6.8.3. Maybe worth a test?!

And compare your first disk on the m.2 adapter:

Mar 18 21:15:18 Oceans-1125 kernel: ata4: SATA max UDMA/133 abar m8192@0xdc010000 port 0xdc010200 irq 130
...
Mar 18 21:15:18 Oceans-1125 kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 18 21:15:18 Oceans-1125 kernel: ata4.00: ATA-9: WDC WD8003FFBX-68B9AN0, VGKA1DBG, 83.00A83, max UDMA/133
Mar 18 21:15:18 Oceans-1125 kernel: ata4.00: 15628053168 sectors, multi 0: LBA48 NCQ (depth 32), AA
Mar 18 21:15:18 Oceans-1125 kernel: ata4.00: configured for UDMA/133
Mar 18 21:15:18 Oceans-1125 kernel: scsi 4:0:0:0: Direct-Access     ATA      WDC WD8003FFBX-6 0A83 PQ: 0 ANSI: 5
Mar 18 21:15:18 Oceans-1125 kernel: sd 4:0:0:0: [sdc] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
Mar 18 21:15:18 Oceans-1125 kernel: sd 4:0:0:0: Attached scsi generic sg2 type 0
Mar 18 21:15:18 Oceans-1125 kernel: sd 4:0:0:0: [sdc] 4096-byte physical blocks
Mar 18 21:15:18 Oceans-1125 kernel: sd 4:0:0:0: [sdc] Write Protect is off
Mar 18 21:15:18 Oceans-1125 kernel: sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Mar 18 21:15:18 Oceans-1125 kernel: sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 18 21:15:18 Oceans-1125 kernel: sdc: sdc1 sdc2
Mar 18 21:15:18 Oceans-1125 kernel: sd 4:0:0:0: [sdc] Attached SCSI disk

with mine:

Mar 18 10:37:14 Thoth kernel: ata11: SATA max UDMA/133 abar m8192@0xa1810000 port 0xa1810200 irq 136
...
Mar 18 10:37:14 Thoth kernel: ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 18 10:37:14 Thoth kernel: ata11.00: ATA-11: WDC  WDS100T2B0A-00SM50, 1905AB802119, 401000WD, max UDMA/133
Mar 18 10:37:14 Thoth kernel: ata11.00: 1953525168 sectors, multi 1: LBA48 NCQ (depth 32), AA
Mar 18 10:37:14 Thoth kernel: ata11.00: configured for UDMA/133
Mar 18 10:37:14 Thoth kernel: scsi 11:0:0:0: Direct-Access     ATA      WDC  WDS100T2B0A 00WD PQ: 0 ANSI: 5
Mar 18 10:37:14 Thoth kernel: sd 11:0:0:0: [sdj] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Mar 18 10:37:14 Thoth kernel: sd 11:0:0:0: [sdj] Write Protect is off
Mar 18 10:37:14 Thoth kernel: sd 11:0:0:0: [sdj] Mode Sense: 00 3a 00 00
Mar 18 10:37:14 Thoth kernel: sd 11:0:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 18 10:37:14 Thoth kernel: sd 11:0:0:0: Attached scsi generic sg9 type 0
Mar 18 10:37:14 Thoth kernel: sdj: sdj1
Mar 18 10:37:14 Thoth kernel: sd 11:0:0:0: [sdj] Attached SCSI disk

Looks good to me.
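The SStatus value in each "SATA link up" line is the raw SATA status register printed in hex, so the negotiated speed can be read straight out of it: bits 0-3 are DET (device detection), bits 4-7 are SPD (link speed generation) and bits 8-11 are IPM (power state). A minimal sketch, where decode_sstatus is a made-up helper name and not an existing tool:

```shell
# Hypothetical helper: decode the SStatus value from a "SATA link up" line.
# $1 is the value as printed by the kernel (hex without 0x), e.g. 133.
decode_sstatus() {
    v=$(( 0x$1 ))
    det=$(( v & 0xf ))                   # 3 = device present, PHY link established
    spd=$(( (v >> 4) & 0xf ))            # negotiated SATA generation
    ipm=$(( (v >> 8) & 0xf ))            # 1 = interface in active state
    case $spd in
        1) speed="1.5 Gbps" ;;
        2) speed="3.0 Gbps" ;;
        3) speed="6.0 Gbps" ;;
        *) speed="unknown" ;;
    esac
    echo "det=$det spd=$spd ($speed) ipm=$ipm"
}

decode_sstatus 123    # ata4 on Oceans-1125
decode_sstatus 133    # ata11 on Thoth
```

With the values from this thread, 123 decodes to 3.0 Gbps and 133 to 6.0 Gbps, and the 113 logged after the hard reset in the first post decodes to 1.5 Gbps, i.e. the link came back at the lowest speed.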

But something is strange. All your errors happen only with ata5 (sdd):

Mar 18 21:15:18 Oceans-1125 kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 18 21:15:18 Oceans-1125 kernel: ata5.00: ATA-10: ST4000VN008-2DR166,             ZGY8XT7A, SC60, max UDMA/133
Mar 18 21:15:18 Oceans-1125 kernel: ata5.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
Mar 18 21:15:18 Oceans-1125 kernel: ata5.00: configured for UDMA/133
...
Mar 18 21:17:53 Oceans-1125 kernel: ata5.00: exception Emask 0x10 SAct 0xffffffff SErr 0x9b0000 action 0xe frozen
Mar 18 21:17:53 Oceans-1125 kernel: ata5.00: irq_stat 0x00400000, PHY RDY changed
Mar 18 21:17:53 Oceans-1125 kernel: ata5: SError: { PHYRdyChg PHYInt 10B8B Dispar LinkSeq }
Mar 18 21:17:53 Oceans-1125 kernel: ata5.00: failed command: READ FPDMA QUEUED
Mar 18 21:17:53 Oceans-1125 kernel: ata5.00: cmd 60/88:00:f0:cc:7f/00:00:00:00:00/40 tag 0 ncq dma 69632 in
Mar 18 21:17:53 Oceans-1125 kernel:         res 40/00:80:e8:e3:7f/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Mar 18 21:17:53 Oceans-1125 kernel: ata5.00: status: { DRDY }
Mar 18 21:17:53 Oceans-1125 kernel: ata5.00: failed command: READ FPDMA QUEUED
Mar 18 21:17:53 Oceans-1125 kernel: ata5.00: cmd 60/10:08:d0:d9:7f/01:00:00:00:00/40 tag 1 ncq dma 139264 in
Mar 18 21:17:53 Oceans-1125 kernel:         res 40/00:80:e8:e3:7f/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Mar 18 21:17:53 Oceans-1125 kernel: ata5.00: status: { DRDY }
Mar 18 21:17:53 Oceans-1125 kernel: ata5.00: failed command: READ FPDMA QUEUED
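The SErr value in the exception line is a bit mask, and the names in the "SError: { ... }" line are just its set bits spelled out. The sketch below decodes only the diagnostic bits (16 to 26) using the names libata prints; decode_serror is a made-up helper name for illustration.

```shell
# Hypothetical helper: spell out the diagnostic bits (16..26) of an SError value.
# $1 is the value with its 0x prefix, e.g. 0x9b0000.
decode_serror() {
    v=$(( $1 ))
    out=""
    bit=16
    # Names in bit order, as printed by libata in the "SError: { ... }" line.
    for name in PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC \
                Handshk LinkSeq TrStaTrns UnrecFIS DevExch; do
        if [ $(( (v >> bit) & 1 )) -eq 1 ]; then
            out="$out$name "
        fi
        bit=$(( bit + 1 ))
    done
    echo "${out% }"
}

decode_serror 0x9b0000
```

For 0x9b0000 this reproduces the five names in the log line. All of them are PHY or link-layer conditions (PHY ready change, 10b/8b decode error, disparity error, link sequence error), which points at the signal path between controller and drive rather than at the data on the disk.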

They do not happen with ata4 (sdc), which is your parity disk (loads.txt):

sda (flash)=0 0 815 139
sdb (cache)=0 0 424 352
sdc (parity)=0 142721024 675 731157
sdd (disk1)=142606336 0 731957 3274
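The per-port observation can be double-checked by counting the "failed command" lines for each ataN port in the syslog. A small sketch, assuming the log text is fed on stdin; count_ata_failures is a made-up helper name, and /var/log/syslog in the usage note is an assumed path.

```shell
# Hypothetical helper: count "failed command" lines per ATA port.
# Reads syslog text on stdin, prints "<port> <count>" per port.
count_ata_failures() {
    awk '/failed command:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9]+(\.[0-9]+)?:$/) {
                sub(/[.:].*$/, "", $i)   # ata5.00: -> ata5
                n[$i]++
            }
    }
    END { for (p in n) print p, n[p] }' | sort
}
```

Used as count_ata_failures < /var/log/syslog, a healthy port simply does not appear in the output.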

Maybe something related to the sata port on the adapter?! Beware: The adapter is thin and can be damaged easily by removing / attaching sata cables.

3 hours ago, syst1k said:

I installed openmediavault on a flash drive and booted the same configuration up. I don't see any link errors in dmesg on boot at least. I haven't tried the raid setup. Only cleaned the two drives, and it seemed to succeed.

The errors appeared after you started your Unraid array. Did you really use the disks in OMV or boot only?

JorgeB · March 19, 2021

2 hours ago, syst1k said:

You mean the card?

Yes, a bad one or a bad model; the JMB chip itself works fine.

syst1k · March 19, 2021

4 hours ago, mgutt said:

The errors appeared after you started your Unraid array. Did you really use the disks in OMV or boot only?

I booted up and just "erased" the disks from the disks section. I didn't do any other read/write operations on the disks.

And I realized it later, but I don't use powertop, so it must be another issue.

Edited March 19, 2021 by syst1k

syst1k · March 19, 2021

7 hours ago, mgutt said:

Maybe something related to the sata port on the adapter?! Beware: The adapter is thin and can be damaged easily by removing / attaching sata cables.

I also moved my data drive over to the ata6 port. I now get the errors on ata6, so I don't think it's the port.

Attaching the new logs.

oceans-1125-diagnostics-20210319-1213.zip

Vr2Io · March 20, 2021

If there is no progress with the JMB585, would you try another type of controller, e.g. ASMedia? I tried that controller a long time ago and had no problems.

mgutt · March 20, 2021

3 hours ago, Vr2Io said:

would you try another type of controller, e.g. ASMedia?

Only the JMB M.2 adapter has 5 sata ports.

Vr2Io · March 20, 2021

3 hours ago, mgutt said:

Only the JMB M.2 adapter has 5 sata ports.

Of course, but the OP only uses two ports.

18 hours ago, syst1k said:

but I don't use powertop, so it must be another issue.

You may get some hints from cases like this.

BTW, @syst1k, please provide the output of the commands below for reference.

cat /sys/module/pcie_aspm/parameters/policy

cat /sys/class/scsi_host/host*/link_power_management_policy
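The first file normally lists every ASPM policy with the active one in square brackets, and the second command prints one value per host without saying which host it belongs to. A slightly more readable variant is sketched below; both function names are made up for the example, and the optional sysfs-root argument exists only so the second function can be tried against a copy of the tree.

```shell
# Hypothetical helper: print only the bracketed (active) ASPM policy.
# Reads the contents of /sys/module/pcie_aspm/parameters/policy on stdin.
active_aspm_policy() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: print "<host> <policy>" for every SATA host.
# $1 = sysfs root (defaults to /sys).
show_lpm_policies() {
    root=${1:-/sys}
    for f in "$root"/class/scsi_host/host*/link_power_management_policy; do
        [ -r "$f" ] || continue          # skip hosts without this attribute
        host=${f%/link_power_management_policy}
        echo "${host##*/} $(cat "$f")"
    done
}
```

On the server this would be run as active_aspm_policy < /sys/module/pcie_aspm/parameters/policy followed by show_lpm_policies.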

syst1k · March 20, 2021

11 hours ago, Vr2Io said:

Of course, but the OP only uses two ports.

I was planning on using 3 ports at least - 2 data and 1 parity.

11 hours ago, Vr2Io said:

cat /sys/module/pcie_aspm/parameters/policy

cat /sys/class/scsi_host/host*/link_power_management_policy

default

max_performance

I also got a new card - https://www.amazon.com/Sedna-PCIe-Adapter-Support-Software/dp/B07T46VSRS/ This is also a JMB chip, but I thought I'd try it. It's the same issue. Attaching logs here, in case others can see something I missed.

The logs are after I started the Array Building operation and parity was being computed. I still see the progress in the unraid UI, but see these errors in the logs/dmesg.

oceans-1125-diagnostics-20210320-1428.zip

Edited March 20, 2021 by syst1k
adding more info

syst1k · March 20, 2021

Also found some more discussion about it:

Is this error something I can ignore if the read/write goes through? I'm a little confused.

syst1k · March 20, 2021

Another thing I'd like to add, since the topic of power keeps coming up. I am currently using an external case (https://www.amazon.com/Kingwin-Enclosure-Internal-Backplane-Optimized/dp/B01BMJ1WD6/) to house my HDDs (the WD and ST), whereas the cache drive (SSD) is connected to the onboard sata port.

I'm using an intel NUC7i3BNH - specs here: https://ark.intel.com/content/www/us/en/ark/products/95066/intel-nuc-kit-nuc7i3bnh.html

To power these drives, I'm using an AC -> Molex adapter (12V/5V) and then a SATA power splitter, because the Kingwin enclosure needs two 15-pin SATA power connectors.

Does any of the above sound suspicious that could lead to this? To eliminate the kingwin enclosure as the source of the problem, I connected the drives directly to the JMB card and then powered the drives using my sata power splitter directly. I still got the same issue, hence I ruled out the external enclosure to be the source of the problem. However, power is something I haven't switched. I did use the same way to power another 3.5inch disk and connect to my computer via USB (to copy some data - I have a sata to USB cable), and it worked fine. So I didn't think the power supply can be a problem. However, I didn't power 2 drives together.

Just trying to think through what could be the root cause. If the JMB chip is known to work with Unraid, then it could be my motherboard (NUC) + this card? Which is weird, because I did read success stories of people using the same card (IOCrest) with a NUC and FreeNAS/Windows. Which brings me to the question of what it means when we say "it works". Can I ignore the errors if there are no read/write errors while building the parity and copying data? Or am I limited in speed by the link resets, so that I cannot rely on the setup?
