Drive issues (maybe sata cables?)


I've been having some weird issues with my drives. Based on the SMART reports, and the fact that the drives can still be connected to and read from, they appear to be healthy and not actually failing. However, they seem to randomly spam disk errors. From reading around, this seems to be related to the SATA cables, and it behaves that way: different drives show the issue when I swap out or rearrange cables. No matter how many cables I swap, though, I still end up with 1 or 2 drives with the issue. I believe this also causes the parity rebuild to be extremely slow (i.e. 143kb/s).

These are all assumptions based on trial and error, but I've been dealing with this for a week. I bought a SATA controller to confirm it wasn't the motherboard ports being weird, I bought 3 new packs of SATA cables from different stores, and I even bought an entirely new drive to swap in and attempt a rebuild, all to no avail. I've run out of ideas for how to fix it and haven't had any luck narrowing down the issue, so hopefully someone here can point me in the right direction. I've uploaded the latest diagnostics.

 

At the current time, Parity Disk #2 (2AA102_ZA24B6TK) and Disk #1 (2AA101_ZA23BJEJ) are the two spamming errors. Disk #2 (2AA101_ZA28C460) currently requires a rebuild, as it was previously one of the drives spamming these same errors until it became disabled.

 

Disk #2 was originally (2H7100_ZHZ3RXFX), but I swapped it with the new drive to see if that would fix anything (I precleared the new drive first, of course).

vortex-diagnostics-20220415-1421.zip


Hello, 

Yes, this is one of the things I considered. It's currently on a 500w PSU with an average usage of about 170w according to the connected UPS. Up until today there were no power splitters, though I did connect one to preclear the new disk. All disks in the array are connected directly to the PSU cables and not to the splitter: 3 SATA power cables with 2 connectors on each, so 6 drives total connected to the PSU. As stated above, the splitter currently powers 1 drive outside of the array, but this issue persisted long before that.

 

And it is connected to a UPS to prevent unclean shutdowns, which should also mean the PSU is getting "clean" / steady power.

Edited by MrCakeSlayer
05:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215] (rev 11)
	Subsystem: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
	Kernel driver in use: ahci
	Kernel modules: ahci

Marvell controllers are NOT recommended. You probably have enough ports without using it.


Be that as it may, the controller isn't causing the issue. All 4 drives plugged into the 4 port controller are working perfectly. It's the ones plugged into the motherboard having problems. 

 

These issues were also occurring before I even installed the sata card. 

 

Also, if you have a recommendation for another card, I don't mind buying a different one at a later point. However, I'm more concerned with the root issue causing errors on the mobo-connected drives atm.

Edited by MrCakeSlayer

I don't know off the top of my head, however if it's not in the logs I can check when I get off work. 

Here's the original pcpartpicker list for the components including the motherboard:

https://pcpartpicker.com/user/MrCakeSlayer/saved/fN72Vn

 

The SSD is used as a cache drive, though I've installed a newer 500gb one that I'm actively using.

Edited by MrCakeSlayer
2 hours ago, ChatNoir said:

 

Thanks I appreciate the guide to other controllers. Though I was wondering if you guys had any insight into the errors I'm having or should I just throw a second sata card in there and move everything off the mobo sata ports? 

I'd prefer to just solve the root issue if possible, though, so I can get it back up and running tonight.

 

I also appreciate all of the quick answers. :)


Updated pcpartpicker, it's now public.

 

Yes, I would have enough ports on the motherboard. However as previously stated the issue is occurring with HDDs attached directly to the motherboard. All HDDs on the controller are working perfectly.

This issue was also occurring BEFORE the sata card was ever installed.


Please try disabling all ASPM settings in the mobo BIOS. If you still get the error below in the log, the problem hasn't been solved.

 

Apr 15 14:19:34 Vortex kernel: ata9.00: exception Emask 0x10 SAct 0x30dc0000 SErr 0x90202 action 0xe frozen
Apr 15 14:19:34 Vortex kernel: ata9.00: irq_stat 0x00400000, PHY RDY changed
Apr 15 14:19:34 Vortex kernel: ata9: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 14:19:34 Vortex kernel: ata9.00: failed command: READ FPDMA QUEUED
Apr 15 14:19:34 Vortex kernel: ata9.00: cmd 60/a8:90:08:00:00/00:00:00:00:00/40 tag 18 ncq dma 688128 in
Apr 15 14:19:34 Vortex kernel:         res 40/00:e8:00:04:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Apr 15 14:19:34 Vortex kernel: ata9.00: status: { DRDY }
Apr 15 14:19:34 Vortex kernel: ata9.00: failed command: READ FPDMA QUEUED
Apr 15 14:19:34 Vortex kernel: ata9.00: cmd 60/58:98:b0:00:00/00:00:00:00:00/40 tag 19 ncq dma 360448 in
Apr 15 14:19:34 Vortex kernel:         res 40/00:e8:00:04:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Apr 15 14:19:34 Vortex kernel: ata9.00: status: { DRDY }
Apr 15 14:19:34 Vortex kernel: ata9.00: failed command: READ FPDMA QUEUED
Apr 15 14:19:34 Vortex kernel: ata9.00: cmd 60/a8:a0:08:01:00/00:00:00:00:00/40 tag 20 ncq dma 688128 in
Apr 15 14:19:34 Vortex kernel:         res 40/00:e8:00:04:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Apr 15 14:19:34 Vortex kernel: ata9.00: status: { DRDY }
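For anyone who wants to check a syslog for the same pattern, here is a minimal sketch that counts failed commands per ATA port and collects the SError flags reported for each port. The regular expressions and the function name `summarize_ata_errors` are illustrative assumptions written against the lines quoted above, not part of any Unraid tool.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt above.
LOG = """\
Apr 15 14:19:34 Vortex kernel: ata9.00: exception Emask 0x10 SAct 0x30dc0000 SErr 0x90202 action 0xe frozen
Apr 15 14:19:34 Vortex kernel: ata9.00: irq_stat 0x00400000, PHY RDY changed
Apr 15 14:19:34 Vortex kernel: ata9: SError: { RecovComm Persist PHYRdyChg 10B8B }
Apr 15 14:19:34 Vortex kernel: ata9.00: failed command: READ FPDMA QUEUED
Apr 15 14:19:34 Vortex kernel: ata9.00: failed command: READ FPDMA QUEUED
Apr 15 14:19:34 Vortex kernel: ata9.00: failed command: READ FPDMA QUEUED
"""

# Hypothetical patterns, matched against the format seen in this thread.
FAILED_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: failed command: (.+)$")
SERROR_RE = re.compile(r"kernel: (ata\d+): SError: \{ (.+?) \}")


def summarize_ata_errors(log_text):
    """Return (failed-command count per port, SError flags per port)."""
    failed = Counter()
    serror_flags = {}
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            failed[match.group(1)] += 1
            continue
        match = SERROR_RE.search(line)
        if match:
            flags = match.group(2).split()
            serror_flags.setdefault(match.group(1), set()).update(flags)
    return failed, serror_flags


failed, flags = summarize_ata_errors(LOG)
for port, count in failed.items():
    print(port, count, sorted(flags.get(port, ())))
```

Running it over a full syslog instead of `LOG` would show whether the errors stay on one port or move between ports as cables and disks are swapped.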

 

 

16 hours ago, MrCakeSlayer said:

At the current time, Parity Disk #2 (2AA102_ZA24B6TK) and Disk #1 (2AA101_ZA23BJEJ) are the two spamming errors. Disk #2 (2AA101_ZA28C460) currently requires a rebuild, as it was previously one of the drives spamming these same errors until it became disabled.

 

Those problem disks have negotiated a low link speed (which indicates something is wrong on the interface). Please try moving some disks whose firmware is not NN05/SN05 to the onboard ports and check whether the errors go away, to rule that out.

 

Serial Number:    ZA24B6TK
Firmware Version: NN05
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)

 

Serial Number:    ZA23BJEJ
Firmware Version: SN05
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)

 

Serial Number:    ZA28C460
Firmware Version: SN05
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
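The same check can be scripted. The sketch below scans text in the format quoted above and lists the drives whose current link speed is below the maximum they support. The regular expression and the function name `find_downgraded_links` are assumptions for illustration, based only on the three excerpts in this post.

```python
import re

# Drive identity lines copied from the post above.
SMART_INFO = """\
Serial Number:    ZA24B6TK
Firmware Version: NN05
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)

Serial Number:    ZA23BJEJ
Firmware Version: SN05
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)

Serial Number:    ZA28C460
Firmware Version: SN05
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
"""

# Hypothetical pattern for the "SATA Version is:" line as it appears here.
SATA_RE = re.compile(
    r"SATA Version is:\s+.*?,\s*([\d.]+) Gb/s \(current: ([\d.]+) Gb/s\)"
)


def find_downgraded_links(text):
    """Return (serial, max_gbps, current_gbps) for links below their maximum."""
    downgraded = []
    serial = None
    for line in text.splitlines():
        if line.startswith("Serial Number:"):
            serial = line.split(":", 1)[1].strip()
        match = SATA_RE.search(line)
        if match:
            max_gbps = float(match.group(1))
            current_gbps = float(match.group(2))
            if current_gbps < max_gbps:
                downgraded.append((serial, max_gbps, current_gbps))
    return downgraded


for serial, max_gbps, current_gbps in find_downgraded_links(SMART_INFO):
    print(f"{serial}: negotiated {current_gbps} of {max_gbps} Gb/s")
```

On the excerpts above this reports ZA24B6TK and ZA23BJEJ, the two drives currently spamming errors, and not ZA28C460.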

 

You can click the disk icon to check an individual disk's log.

 

image.png.d0a6eabef8275baf5b3e4e16f7063c7c.png 

Edited by Vr2Io

Hello,

I was unable to find any ASPM settings in the BIOS for this motherboard. I'm also unable to swap around any more disks. I currently have 2 emulated data disks, and the only disk throwing drive errors right now is a parity disk. I've uploaded a new diagnostics file taken while the array is running in maintenance mode. From what I can tell, I need to do a rebuild to get the current drives running; however, if I attempt one, the rebuild runs at around 30kb/s while the parity disk spams errors. As it stands, with the current configuration all disks are working properly without spamming the previous errors, except for parity disk 2.

image.png.c8272fd91e9a6bbf46317af6b9c246c7.png

vortex-diagnostics-20220416-1055.zip


I'm not quite sure what you're suggesting I do. However, I currently only have a single disk that isn't in the array, and unless I'm mistaken I currently couldn't swap the parity disk with another drive anyways since I have 2 emulated data disks. So what is the best path forward to resolve this without data loss?

 

I do currently have another sata controller on the way that I purchased from the list recommended above, and will be moving all SATA cables/HDDs off the motherboard as well. Should I go ahead and purchase another drive? I'd rather not, due to how expensive the 10tb drives are, but I can if that's the only solution.

 

It's also worth noting that the currently out-of-array JEJ drive was having the same issues as the current parity #2 drive, which is why I swapped it out of the array with the brand new disk.

Edited by MrCakeSlayer
13 hours ago, MrCakeSlayer said:

I'm also unable to swap around any more disks.

I mean swapping the disks' connections between the mobo and the add-on controller. There is no reason to perform write operations, i.e. a rebuild, while the problem is still ongoing.

 

 

12 hours ago, MrCakeSlayer said:

I do currently have another sata controller on the way that I purchased from the list recommended above

That is fine if you want to stop troubleshooting the mobo issue: connect all disks to the add-on controller and rebuild.

Edited by Vr2Io
On 4/16/2022 at 11:50 AM, MrCakeSlayer said:

I currently couldn't swap the parity disk with another drive anyways since I have 2 emulated data disks

The current contents of both parity disks are required, just as they are, to rebuild the 2 emulated data disks. It is the emulated data disks that need rebuilding, preferably to new disks so the original data disks are not affected in case your hardware problems affect the rebuild results. You don't want to overwrite the existing data disks with an unsuccessful or corrupt rebuild.
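As a rough illustration of why both parity disks matter here, the sketch below encodes the rule of thumb that each missing data disk needs one valid parity disk to be emulated. The function name `can_emulate` is hypothetical, and this is a simplified model of the constraint, not how Unraid itself computes anything.

```python
def can_emulate(valid_parity_disks, missing_data_disks):
    """Simplified rule: each missing data disk needs one valid parity disk."""
    return missing_data_disks <= valid_parity_disks


# Dual parity with 2 emulated data disks: both parity disks are in use.
print(can_emulate(valid_parity_disks=2, missing_data_disks=2))  # True
# Losing a parity disk in that state leaves one data disk unrecoverable.
print(can_emulate(valid_parity_disks=1, missing_data_disks=2))  # False
```

With 2 emulated data disks there is no redundancy left, which is why swapping or overwriting a parity disk at this point would lose data.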

On 4/16/2022 at 11:50 AM, MrCakeSlayer said:

currently only have a single disk that isn't in the array

You can rebuild one of the emulated disks with that one if it is the one in your screenshot (10TB) and it has nothing on it you want to keep. That way you will be able to test whether your hardware problems are resolved. Then you can decide what to do about the other emulated disk.

On 4/18/2022 at 10:33 AM, trurl said:

You can rebuild one of the emulated disks with that one if it is the one in your screenshot (10TB) and it has nothing on it you want to keep. That way you will be able to test whether your hardware problems are resolved. Then you can decide what to do about the other emulated disk.

The problem is too many disks are having connection issues to rebuild anything. I bought a JMB585 controller from the recommended list linked above. It's even more broken: it almost never gets detected when booting, or only shows 1 or 2 of the disks connected to it (only once did it show all 5), and the set of detected disks changes randomly on every boot/restart.

 

The Marvell and motherboard controllers detect all the disks and allow booting, so I'm returning the JMB controller. I currently have 4 disks back on the other SATA card and 2 disks plugged into the motherboard, so it actually detects all the disks and boots.

 

I've attached the most recent diagnostics. Not that it's much help: the drives spamming errors change randomly on each restart anyway, though the parity 2 disk (6TK) and disk 4 (NCM) seem to be the most consistent.

 

Basically, I'm wondering what I actually need to do to get Unraid back online and working again. I've had literally nothing but issues in the month that I've been using Unraid, and had zero issues in the 2 years prior while running Windows. Surely all of my drives didn't decide to die at the same time. I've swapped out SATA cables from multiple brands like 6 times at this point, and the brand new drive also had the same issue with Unraid.

 

How can I actually narrow down what specifically is causing the issue and fix that? The data is just movies/TV shows so it's non-critical but it'd be a pain to redownload everything.

vortex-diagnostics-20220420-2230.zip

Edited by MrCakeSlayer
9 hours ago, JorgeB said:

You should upgrade the LSI firmware since all p20 releases except latest one (20.00.07.00) have known issues, disk itself looks healthy, could also be a power issue.

 

Note that you'll need to do a new config to force enable 1 or more disks, Unraid can't emulate 3 disks, after updating the LSI post back if you need help with that.

 

 

I've gone ahead and updated the firmware to 20.00.07.00. Should I also upgrade the BIOS, or should that be fine? Also, while I have a decent idea of how I should do the new config, would you mind providing more explicit steps? I'd really prefer not to lose more data than needed.

 

Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------
0     SAS2308_2(D1)   20.00.07.00   14.01.00.06   07.39.00.00      00:0a:00:00
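A quick way to sanity-check the `FW Ver` column against the advice quoted above (all p20 releases before 20.00.07.00 have known issues) is to compare the dotted version numerically. This is only a sketch: the helper names are hypothetical, the older version string is a made-up example value, and the check deliberately says nothing about pre-p20 firmware since the quoted advice does not cover it.

```python
def parse_fw(version):
    """Turn a dotted firmware string such as '20.00.07.00' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Latest p20 release named in the quoted post.
KNOWN_GOOD_P20 = parse_fw("20.00.07.00")


def p20_needs_update(installed):
    """True only for p20 firmware older than the known-good release."""
    fw = parse_fw(installed)
    return fw[0] == 20 and fw < KNOWN_GOOD_P20


print(p20_needs_update("20.00.07.00"))  # False: the version shown in the table
print(p20_needs_update("20.00.04.00"))  # True: example of an older p20 string
```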

Edited by MrCakeSlayer