New Cache SSD, new problems


Dmtalon


Last week I took advantage of the low price of a Samsung 860 1TB SSD so I could replace my quite old WD Black 1TB cache drive and an older 128GB Samsung SSD used as an apps drive (from SNAP days).

 

The cutover to the new SSD went very smoothly, and I was able to move my VMs/Dockers off my app drive without any issue.  My end result was removing two drives and replacing them with the new SSD.

 

The problem is I keep getting errors at boot and in the log every so often.  I *thought* it had something to do with NCQ (which is forced off), but it's still erroring hours later.  Below, notice the SATA link up at 1.5 Gbps.  Everything seems to work: I've used the Plex docker quite a bit and have a Windows VM running on this drive too, with no user-noticeable issues.

 

Also, I'm on SATA cable #3.  When I last opened the case, last night, I replaced all 7 of them with brand new Monoprice 18" cables.

 

Any help would be greatly appreciated.  Let me know if I should attach diagnostics.

 

Dec  3 15:19:36 NAS1 kernel: ata5.00: exception Emask 0x10 SAct 0x7fffefff SErr 0x0 action 0x6 frozen
Dec  3 15:19:36 NAS1 kernel: ata5.00: irq_stat 0x08000000, interface fatal error
~ <lots of entries>
Dec  3 15:19:36 NAS1 kernel: ata5.00: status: { DRDY }
Dec  3 15:19:36 NAS1 kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Dec  3 15:19:36 NAS1 kernel: ata5.00: cmd 61/10:e8:70:4d:1c/00:00:1d:00:00/40 tag 29 ncq dma 8192 out
Dec  3 15:19:36 NAS1 kernel:         res 40/00:68:70:46:1c/00:00:1d:00:00/40 Emask 0x10 (ATA bus error)
Dec  3 15:19:36 NAS1 kernel: ata5.00: status: { DRDY }
Dec  3 15:19:36 NAS1 kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Dec  3 15:19:36 NAS1 kernel: ata5.00: cmd 61/a8:f0:c0:4d:1c/00:00:1d:00:00/40 tag 30 ncq dma 86016 out
Dec  3 15:19:36 NAS1 kernel:         res 40/00:68:70:46:1c/00:00:1d:00:00/40 Emask 0x10 (ATA bus error)
Dec  3 15:19:36 NAS1 kernel: ata5.00: status: { DRDY }
Dec  3 15:19:36 NAS1 kernel: ata5: hard resetting link
Dec  3 15:19:36 NAS1 kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec  3 15:19:36 NAS1 kernel: ata5.00: supports DRM functions and may not be fully accessible
Dec  3 15:19:36 NAS1 kernel: ata5.00: supports DRM functions and may not be fully accessible
Dec  3 15:19:36 NAS1 kernel: ata5.00: configured for UDMA/133
Dec  3 15:19:36 NAS1 kernel: ata5: EH complete
Dec  3 15:19:36 NAS1 kernel: ata5.00: Enabling discard_zeroes_data
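The `SStatus`/`SControl` numbers in the "SATA link up" lines can be decoded to see what happened to the link. Below is a minimal sketch, assuming the kernel prints both registers in hexadecimal and that bits 4-7 of each hold the speed field (negotiated speed in `SStatus`, speed limit in `SControl`), which is consistent with the values in these logs. `decode_link` is a hypothetical helper name, not part of any tool.

```python
# Speed field (bits 4-7) -> link speed. Assumption: standard SATA register layout.
SPEEDS = {0: "no link", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def decode_link(sstatus_hex, scontrol_hex):
    """Return (negotiated speed, configured speed limit) for one log line."""
    sstatus = int(sstatus_hex, 16)
    scontrol = int(scontrol_hex, 16)
    negotiated = SPEEDS.get((sstatus >> 4) & 0xF, "unknown")
    limit = (scontrol >> 4) & 0xF  # 0 means no limit is imposed
    limit_text = "none" if limit == 0 else SPEEDS.get(limit, "unknown")
    return negotiated, limit_text


# Values copied from the log lines in this thread.
print(decode_link("133", "300"))  # link up at boot
print(decode_link("113", "310"))  # link after the repeated resets
```

Read this way, `SControl 310` suggests the kernel itself capped the link at 1.5 Gbps after the repeated errors, rather than the drive negotiating a lower speed on its own.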

 

Here's from earlier today. (notice the 6.0 Gbps)

 

Dec  2 20:45:33 NAS1 kernel: ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x400000 action 0x6 frozen
Dec  2 20:45:33 NAS1 kernel: ata9.00: irq_stat 0x08000000, interface fatal error
Dec  2 20:45:33 NAS1 kernel: ata9: SError: { Handshk }
Dec  2 20:45:33 NAS1 kernel: ata9.00: failed command: WRITE DMA EXT
Dec  2 20:45:33 NAS1 kernel: ata9.00: cmd 35/00:40:b8:2e:89/00:05:14:00:00/e0 tag 10 dma 688128 out
Dec  2 20:45:33 NAS1 kernel:         res 50/00:00:b7:2e:89/00:00:14:00:00/e0 Emask 0x10 (ATA bus error)
Dec  2 20:45:33 NAS1 kernel: ata9.00: status: { DRDY }
Dec  2 20:45:33 NAS1 kernel: ata9: hard resetting link
Dec  2 20:45:34 NAS1 kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec  2 20:45:34 NAS1 kernel: ata9.00: configured for UDMA/133
Dec  2 20:45:34 NAS1 kernel: ata9: EH complete

No errors shown here, and it passes the SMART test.

image.thumb.png.51042c0ae34952132fb3369eb246a0a3.png

 

 

 

During boot-up things SEEM to come up OK. Then ~23 seconds later it errors out, comes back up, errors out, comes up, gets limited to 3.0 Gbps, and errors out again.  At one point overnight it went about 4 hours without erroring (no activity, I guess).

Dec  2 20:10:22 NAS1 kernel: EDAC MC0: Giving out device to module amd64_edac controller F15h: DEV 0000:00:18.3 (INTERRUPT)
Dec  2 20:10:22 NAS1 kernel: EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.2 (POLLED)
Dec  2 20:10:22 NAS1 kernel: AMD64 EDAC driver v3.5.0
Dec  2 20:10:22 NAS1 kernel: ata7: SATA link down (SStatus 0 SControl 300)
Dec  2 20:10:22 NAS1 kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec  2 20:10:22 NAS1 kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec  2 20:10:22 NAS1 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec  2 20:10:22 NAS1 kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec  2 20:10:22 NAS1 kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec  2 20:10:22 NAS1 kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec  2 20:10:22 NAS1 kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec  2 20:10:22 NAS1 kernel: ata5.00: supports DRM functions and may not be fully accessible
Dec  2 20:10:22 NAS1 kernel: ata9.00: ATA-9: WDC WD40EFRX-68WT0N0,      WD-WCC4E0ELT95A, 82.00A82, max UDMA/133
Dec  2 20:10:22 NAS1 kernel: ata9.00: 7814037168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
Dec  2 20:10:22 NAS1 kernel: ata1.00: ATA-9: WDC WD20EZRX-00DC0B0,      WD-WCC1T0586104, 80.00A80, max UDMA/133
Dec  2 20:10:22 NAS1 kernel: ata1.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
Dec  2 20:10:22 NAS1 kernel: ata9.00: configured for UDMA/133
Dec  2 20:10:22 NAS1 kernel: ata5.00: ATA-11: Samsung SSD 860 EVO 1TB, S3Z8NB0KB64216A, RVT02B6Q, max UDMA/133
Dec  2 20:10:22 NAS1 kernel: ata5.00: 1953525168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
Dec  2 20:10:22 NAS1 kernel: ata1.00: configured for UDMA/133
Dec  2 20:10:22 NAS1 kernel: scsi 1:0:0:0: Direct-Access     ATA      WDC WD20EZRX-00D 0A80 PQ: 0 ANSI: 5
Dec  2 20:10:22 NAS1 kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
Dec  2 20:10:22 NAS1 kernel: sd 1:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Dec  2 20:10:22 NAS1 kernel: sd 1:0:0:0: [sdb] 4096-byte physical blocks
Dec  2 20:10:22 NAS1 kernel: sd 1:0:0:0: [sdb] Write Protect is off
Dec  2 20:10:22 NAS1 kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Dec  2 20:10:22 NAS1 kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec  2 20:10:22 NAS1 kernel: ata5.00: supports DRM functions and may not be fully accessible
Dec  2 20:10:22 NAS1 kernel: ata4.00: ATA-8: WDC WD20EARS-00MVWB0,      WD-WMAZA3795145, 51.0AB51, max UDMA/133
Dec  2 20:10:22 NAS1 kernel: ata4.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
Dec  2 20:10:22 NAS1 kernel: ata3.00: ATA-8: WDC WD20EARS-00MVWB0,      WD-WMAZA3812777, 51.0AB51, max UDMA/133
Dec  2 20:10:22 NAS1 kernel: ata3.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
Dec  2 20:10:22 NAS1 kernel: ata2.00: ATA-8: WDC WD20EARS-00MVWB0,      WD-WMAZA3745610, 51.0AB51, max UDMA/133
Dec  2 20:10:22 NAS1 kernel: ata2.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
Dec  2 20:10:22 NAS1 kernel: ata5.00: configured for UDMA/133
Dec  2 20:10:22 NAS1 kernel: ata6.00: ATA-8: WDC WD20EARX-00PASB0,      WD-WCAZAC344236, 51.0AB51, max UDMA/133
Dec  2 20:10:22 NAS1 kernel: ata6.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
Dec  2 20:10:22 NAS1 kernel: ata4.00: configured for UDMA/133
Dec  2 20:10:22 NAS1 kernel: ata3.00: configured for UDMA/133
Dec  2 20:10:22 NAS1 kernel: ata2.00: configured for UDMA/133
Dec  2 20:10:22 NAS1 kernel: scsi 2:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M AB51 PQ: 0 ANSI: 5
Dec  2 20:10:22 NAS1 kernel: sd 2:0:0:0: Attached scsi generic sg2 type 0
Dec  2 20:10:22 NAS1 kernel: sd 2:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Dec  2 20:10:22 NAS1 kernel: sd 2:0:0:0: [sdc] Write Protect is off
Dec  2 20:10:22 NAS1 kernel: sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Dec  2 20:10:22 NAS1 kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec  2 20:10:22 NAS1 kernel: scsi 3:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M AB51 PQ: 0 ANSI: 5
Dec  2 20:10:22 NAS1 kernel: sd 3:0:0:0: Attached scsi generic sg3 type 0
Dec  2 20:10:22 NAS1 kernel: sd 3:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Dec  2 20:10:22 NAS1 kernel: sd 3:0:0:0: [sdd] Write Protect is off
Dec  2 20:10:22 NAS1 kernel: sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Dec  2 20:10:22 NAS1 kernel: sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec  2 20:10:22 NAS1 kernel: scsi 4:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M AB51 PQ: 0 ANSI: 5
Dec  2 20:10:22 NAS1 kernel: sd 4:0:0:0: [sde] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Dec  2 20:10:22 NAS1 kernel: sd 4:0:0:0: Attached scsi generic sg4 type 0
Dec  2 20:10:22 NAS1 kernel: sd 4:0:0:0: [sde] Write Protect is off
Dec  2 20:10:22 NAS1 kernel: sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
Dec  2 20:10:22 NAS1 kernel: sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec  2 20:10:22 NAS1 kernel: scsi 5:0:0:0: Direct-Access     ATA      Samsung SSD 860  2B6Q PQ: 0 ANSI: 5
Dec  2 20:10:22 NAS1 kernel: ata5.00: Enabling discard_zeroes_data
Dec  2 20:10:22 NAS1 kernel: sd 5:0:0:0: [sdf] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Dec  2 20:10:22 NAS1 kernel: sd 5:0:0:0: [sdf] Write Protect is off
Dec  2 20:10:22 NAS1 kernel: sd 5:0:0:0: [sdf] Mode Sense: 00 3a 00 00
Dec  2 20:10:22 NAS1 kernel: sd 5:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec  2 20:10:22 NAS1 kernel: sd 5:0:0:0: Attached scsi generic sg5 type 0
Dec  2 20:10:22 NAS1 kernel: ata5.00: Enabling discard_zeroes_data
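To see the error/reset/downgrade pattern per port without scrolling through the whole syslog, something like the following could help. It is only a sketch, assuming the message formats shown in the excerpts above; `summarize_ata_events` is a hypothetical helper, and the sample lines are copied from this post.

```python
import re
from collections import defaultdict

# Patterns modelled on the kernel messages quoted in this thread.
LINK_UP = re.compile(r"kernel: (ata\d+): SATA link up ([\d.]+) Gbps")
EXCEPTION = re.compile(r"kernel: (ata\d+)\.\d+: exception Emask")
RESET = re.compile(r"kernel: (ata\d+): hard resetting link")


def summarize_ata_events(lines):
    """Count exceptions and resets per port and record each negotiated speed in order."""
    summary = defaultdict(lambda: {"exceptions": 0, "resets": 0, "speeds": []})
    for line in lines:
        if m := LINK_UP.search(line):
            summary[m.group(1)]["speeds"].append(m.group(2))
        elif m := EXCEPTION.search(line):
            summary[m.group(1)]["exceptions"] += 1
        elif m := RESET.search(line):
            summary[m.group(1)]["resets"] += 1
    return dict(summary)


sample = [
    "Dec  2 20:10:22 NAS1 kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "Dec  3 15:19:36 NAS1 kernel: ata5.00: exception Emask 0x10 SAct 0x7fffefff SErr 0x0 action 0x6 frozen",
    "Dec  3 15:19:36 NAS1 kernel: ata5: hard resetting link",
    "Dec  3 15:19:36 NAS1 kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)",
]
result = summarize_ata_events(sample)
for port, info in sorted(result.items()):
    print(port, info)
```

To run it on a full log, pass an open file object instead of `sample`. A port whose speed list steps down over time while the exception count climbs is the one to look at first.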

 

 

Edited by Dmtalon
Link to comment

Firstly, ATA9 is the parity disk, and since there are also a couple of CRC errors you should replace its SATA cable.

 

As for the SSD, and assuming you're not using some sort of enclosure, try connecting it to the Asmedia controller, where the parity disk is connected. If errors persist, it's likely still a cable problem.
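To check which ataN belongs to which disk, the identify lines in the boot log can be matched up. A minimal sketch, assuming the line format from the boot log posted above; `port_models` is a hypothetical helper name and the sample lines are taken from that log.

```python
import re

# Matches identify lines such as:
#   "ata5.00: ATA-11: Samsung SSD 860 EVO 1TB, S3Z8NB0KB64216A, RVT02B6Q, max UDMA/133"
IDENT = re.compile(r"(ata\d+)\.\d+: ATA-\d+: ([^,]+),")


def port_models(lines):
    """Map each ataN port to the model string reported at boot."""
    models = {}
    for line in lines:
        m = IDENT.search(line)
        if m:
            models[m.group(1)] = m.group(2).strip()
    return models


sample = [
    "Dec  2 20:10:22 NAS1 kernel: ata9.00: ATA-9: WDC WD40EFRX-68WT0N0,      WD-WCC4E0ELT95A, 82.00A82, max UDMA/133",
    "Dec  2 20:10:22 NAS1 kernel: ata5.00: ATA-11: Samsung SSD 860 EVO 1TB, S3Z8NB0KB64216A, RVT02B6Q, max UDMA/133",
]
models = port_models(sample)
print(models)
```

The ataN number belongs to the controller port rather than to the disk, so the mapping should be rebuilt from a fresh boot log after any recabling.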

Link to comment
6 hours ago, johnnie.black said:

Firstly, ATA9 is the parity disk, and since there are also a couple of CRC errors you should replace its SATA cable.

 

As for the SSD, and assuming you're not using some sort of enclosure, try connecting it to the Asmedia controller, where the parity disk is connected. If errors persist, it's likely still a cable problem.

Ack, I didn't even notice I had ATA9 in there.  I was copying/pasting and managed to overlook that. 

 

The current connectors have fat heads and are pushing on the release spring of the bottom cable, so I swapped in two of the original cables for the bottom ports, replaced the SSD's cable with a third brand, and moved the SSD to the ASMedia port that was open.  I think I got a clean boot!!

 

I also took this time to look up and see there was a newer BIOS for my MB, and updated that while I was at it.  Why not :)

 

The highest messages I got on this boot were yellow/orange warnings, no errors and not related to SATA.

 

Dec  4 05:12:32 NAS1 kernel: ACPI: Early table checksum verification disabled
Dec  4 05:12:32 NAS1 kernel: ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20170728/tbfadt-658)
Dec  4 05:12:32 NAS1 kernel: acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
Dec  4 05:12:32 NAS1 kernel: floppy0: no floppy controllers found
Dec  4 05:12:32 NAS1 kernel: random: 7 urandom warning(s) missed due to ratelimiting
Dec  4 05:12:33 NAS1 rpc.statd[1659]: Failed to read /var/lib/nfs/state: Success
Dec  4 05:12:37 NAS1 avahi-daemon[2809]: WARNING: No NSS support for mDNS detected, consider installing nss-mdns!

 

AND... There it is again.  While typing up this message ATA1 (SSD) just barked again. <sigh>

 

Dec  4 10:18:53 NAS1 kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400000 action 0x6 frozen
Dec  4 10:18:53 NAS1 kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Dec  4 10:18:53 NAS1 kernel: ata1: SError: { Handshk }
Dec  4 10:18:53 NAS1 kernel: ata1.00: failed command: WRITE DMA EXT
Dec  4 10:18:53 NAS1 kernel: ata1.00: cmd 35/00:40:d0:a6:37/00:01:04:00:00/e0 tag 16 dma 163840 out
Dec  4 10:18:53 NAS1 kernel:         res 50/00:00:cf:a6:37/00:00:04:00:00/e0 Emask 0x10 (ATA bus error)
Dec  4 10:18:53 NAS1 kernel: ata1.00: status: { DRDY }
Dec  4 10:18:53 NAS1 kernel: ata1: hard resetting link
Dec  4 10:18:53 NAS1 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec  4 10:18:53 NAS1 kernel: ata1.00: configured for UDMA/133
Dec  4 10:18:53 NAS1 kernel: ata1: EH complete

 

Link to comment

I moved the SSD to the empty port on the ASMedia controller.

32 minutes ago, Dmtalon said:

The current connectors have fat heads and are pushing on the release spring of the bottom cable, so I swapped in two of the original cables for the bottom ports, replaced the SSD's cable with a third brand, and moved the SSD to the ASMedia port that was open.  I think I got a clean boot!!

 

Attached latest diagnostic

nas1-diagnostics-20181204-1052.zip

Edited by Dmtalon
Link to comment

OK, this is why I hate opening my unRAID case :)  It appears that this was just cabling.  I have again swapped a cable and double/triple checked that everything was fully seated. I re-seated the HDDs and booted up.  It's been up for about 40 minutes w/o any SATA errors/issues.  Let's hope this continues.

 

Thanks for the insight/help.

 

 

 

 

Link to comment
48 minutes ago, Dmtalon said:

OK, this is why I hate opening my unRAID case :)  It appears that this was just cabling.  I have again swapped a cable and double/triple checked that everything was fully seated. I re-seated the HDDs and booted up.  It's been up for about 40 minutes w/o any SATA errors/issues.  Let's hope this continues.

 

Thanks for the insight/help.

 

 

 

 

Hot swap bays FTW

Link to comment
