
Errors and drives disabling during rebuilds


hansolo77


Hey guys. I know you'll want the full diagnostics, and I'll provide them if necessary (I'm posting this on my tablet and I don't know how to download and upload the zip from it, so I'll get it later).

 

Anyway, I recently decided I wanted to go up to 20TB drives on my server. I've had it running swell on 12TB drives, but I'd just like to have that extra cushion. So I bought 2 WD drives for my parity and got them installed and upgraded just fine.

 

Before starting the expensive task of upgrading the next 24 drives, I decided I wanted to upgrade the "fan wall" I built in my case. Rather than keep the existing fan wall with its ridiculously loud fans, I replaced the 4x 80mm fans with 3x 120mm fans from Noctua. They're quiet and all, but the drive temps (now that I have it fully loaded) were sitting a little hot, so I wanted fans that would move a little more air. I went with 3x 120mm fans from Noctua again, but this time the black industrial ones. Wow, what a difference: temps dropped 20°C!

 

Now I'm on to replacing the data drives. I precleared the first drive, and after it was done I swapped it into the array. After restarting the array, it started rebuilding. Pretty normal. However, after about 20 minutes I noticed the speed was still really slow, like 70kps, just barely above dialup speed. I posted about it in Discord, then realized one of my parity drives had disabled itself. I don't know what the problem was; the log indicated it just lost connection. So I turned off the server, removed the cover, disconnected and reconnected all the cables, and restarted. Of course, now the parity needed to rebuild. So I put the array drive on hold and began rebuilding the parity. Once that was done, I started the rebuild of the new array drive again. Then, wouldn't you know it, the same thing happened, only this time with the 2nd parity drive. This is driving me nuts.

I've got a sneaky suspicion that the problem is with the SATA cables. See, the way these drives are mounted, they sit vertically with their "butts" in the air, on the side of the power supply. The problem with mounting them this way is that the plugs stick out higher than the top of the case, so you can't close it up. So I bought some right-angle connectors, hoping I could lower the plugs. That worked, and it was fine for a long time with the 12TB drives. But for some reason, now with the 20TB drives, they just don't sit right. They never sat flush anyway, because the power supply was just slightly taller than the right angles, so the plugs still sat at a bit of an angle.

Because I thought that might be the problem (the connectors not making a good SATA connection), I ordered some new replacement SATA cables. These are nice: rather than having the wires come out the end of the plug, they come out the side. Finally I have a cable where the connector is low enough for clearance!

 

So after a lengthy, nearly week-long process, I've got both parity drives rebuilt with good cables. Time to start rebuilding the new drive in the array... again. First I had to preclear it again, because Unraid was yelling at me about the filesystem being unmountable. So I precleared it, started the rebuild, and saw it chugging along at about 150-160 MB/s, where it SHOULD be. I went to bed.

 

Woke up this morning and checked the status before going to work. NOT GOOD! :( It's only about 3% complete, and the speed is down around 30 MB/s. The logs are not promising. They indicate Parity 2 is still on the verge of being disabled, but somehow it recovers: lots of "soft resets" failing, then succeeding, etc. I don't know what the problem is.
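To put an actual number on "lots of soft resets", here's a rough Python sketch (not an Unraid tool) that tallies the libata reset and link events per ATA port in a syslog saved as plain text. The event strings are copied from the kernel lines in the log pasted further down; the function name and the event labels are made up for illustration.

```python
import re
from collections import Counter

# Substrings of libata kernel messages seen in the pasted syslog,
# mapped to short labels (the labels are hypothetical, chosen for readability).
EVENTS = {
    "softreset failed": "softreset_failed",
    "hard resetting link": "hard_reset",
    "SATA link down": "link_down",
    "SATA link up": "link_up",
    "exception Emask": "exception",
}

# Matches "kernel: ata2: ..." and "kernel: ata2.00: ...", capturing the port number.
PORT_RE = re.compile(r"kernel: ata(\d+)(?:\.\d+)?: (.*)")


def count_link_events(lines):
    """Tally reset/link events per ATA port from an iterable of syslog lines."""
    counts = {}
    for line in lines:
        match = PORT_RE.search(line)
        if not match:
            continue  # not a libata port message (webGUI, nginx, trim, ...)
        port, message = f"ata{match.group(1)}", match.group(2)
        for needle, label in EVENTS.items():
            if needle in message:
                counts.setdefault(port, Counter())[label] += 1
    return counts
```

Calling `count_link_events(open("syslog.txt"))` on a saved copy of the log (the file name is hypothetical) returns one `Counter` per port, which makes it easy to see whether all the trouble is on a single port, as it appears to be with `ata2` here.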

 

Then something dawned on me while I was at work (which is why I mentioned the fans in this post): could it be that the drives just aren't getting enough power? Everything was working fine (apart from having to rebuild the parity drives after they got disabled) until I replaced the fans with those faster ones. They're still using the power points on the motherboard, and the "full on" setting hasn't changed. But could they still be drawing more power, leaving the drives starving for juice? If it's not that, what else could it be?

 

As I said, I'll provide the full diagnostics zip soon if somebody needs it to help figure out the problem. In the meantime, here is the entire system log, copied and pasted:

 

Mar 30 07:18:18 Kyber kernel: ata2: softreset failed (device not ready)
Mar 30 07:18:18 Kyber kernel: ata2: hard resetting link
Mar 30 07:18:22 Kyber kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 30 07:18:22 Kyber kernel: ata2.00: supports DRM functions and may not be fully accessible
Mar 30 07:18:22 Kyber kernel: ata2.00: supports DRM functions and may not be fully accessible
Mar 30 07:18:22 Kyber kernel: ata2.00: configured for UDMA/133
Mar 30 07:18:22 Kyber kernel: ata2: EH complete
Mar 30 07:30:10 Kyber kernel: ata2.00: exception Emask 0x10 SAct 0x100 SErr 0x90302 action 0xe frozen
Mar 30 07:30:10 Kyber kernel: ata2.00: irq_stat 0x00400000, PHY RDY changed
Mar 30 07:30:10 Kyber kernel: ata2: SError: { RecovComm UnrecovData Persist PHYRdyChg 10B8B }
Mar 30 07:30:10 Kyber kernel: ata2.00: failed command: READ FPDMA QUEUED
Mar 30 07:30:10 Kyber kernel: ata2.00: cmd 60/b8:40:88:49:f5/02:00:1a:03:00/40 tag 8 ncq dma 356352 in
Mar 30 07:30:10 Kyber kernel:         res 40/00:00:88:49:f5/00:00:1a:03:00/40 Emask 0x10 (ATA bus error)
Mar 30 07:30:10 Kyber kernel: ata2.00: status: { DRDY }
Mar 30 07:30:10 Kyber kernel: ata2: hard resetting link
Mar 30 07:30:12 Kyber kernel: ata2: found unknown device (class 0)
Mar 30 07:30:12 Kyber kernel: ata2: SATA link down (SStatus 0 SControl 300)
Mar 30 07:30:12 Kyber kernel: ata2: hard resetting link
Mar 30 07:30:17 Kyber kernel: ata2: found unknown device (class 0)
Mar 30 07:30:22 Kyber kernel: ata2: softreset failed (device not ready)
Mar 30 07:30:22 Kyber kernel: ata2: hard resetting link
Mar 30 07:30:26 Kyber kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 30 07:30:26 Kyber kernel: ata2.00: supports DRM functions and may not be fully accessible
Mar 30 07:30:26 Kyber kernel: ata2.00: supports DRM functions and may not be fully accessible
Mar 30 07:30:26 Kyber kernel: ata2.00: configured for UDMA/133
Mar 30 07:30:26 Kyber kernel: ata2: EH complete
Mar 30 07:50:54 Kyber kernel: ata2.00: exception Emask 0x10 SAct 0xc00000 SErr 0x90202 action 0xe frozen
Mar 30 07:50:54 Kyber kernel: ata2.00: irq_stat 0x00400000, PHY RDY changed
Mar 30 07:50:54 Kyber kernel: ata2: SError: { RecovComm Persist PHYRdyChg 10B8B }
Mar 30 07:50:54 Kyber kernel: ata2.00: failed command: READ FPDMA QUEUED
Mar 30 07:50:54 Kyber kernel: ata2.00: cmd 60/00:b0:68:cb:3d/04:00:2e:03:00/40 tag 22 ncq dma 524288 in
Mar 30 07:50:54 Kyber kernel:         res 40/00:00:68:cb:3d/00:00:2e:03:00/40 Emask 0x10 (ATA bus error)
Mar 30 07:50:54 Kyber kernel: ata2.00: status: { DRDY }
Mar 30 07:50:54 Kyber kernel: ata2.00: failed command: READ FPDMA QUEUED
Mar 30 07:50:54 Kyber kernel: ata2.00: cmd 60/28:b8:68:cf:3d/01:00:2e:03:00/40 tag 23 ncq dma 151552 in
Mar 30 07:50:54 Kyber kernel:         res 40/00:00:68:cb:3d/00:00:2e:03:00/40 Emask 0x10 (ATA bus error)
Mar 30 07:50:54 Kyber kernel: ata2.00: status: { DRDY }
Mar 30 07:50:54 Kyber kernel: ata2: hard resetting link
Mar 30 07:50:56 Kyber kernel: ata2: found unknown device (class 0)
Mar 30 07:50:56 Kyber kernel: ata2: SATA link down (SStatus 0 SControl 300)
Mar 30 07:50:56 Kyber kernel: ata2: hard resetting link
Mar 30 07:51:01 Kyber kernel: ata2: found unknown device (class 0)
Mar 30 07:51:06 Kyber kernel: ata2: softreset failed (device not ready)
Mar 30 07:51:06 Kyber kernel: ata2: hard resetting link
Mar 30 07:51:10 Kyber kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 30 07:51:10 Kyber kernel: ata2.00: supports DRM functions and may not be fully accessible
Mar 30 07:51:10 Kyber kernel: ata2.00: supports DRM functions and may not be fully accessible
Mar 30 07:51:10 Kyber kernel: ata2.00: configured for UDMA/133
Mar 30 07:51:10 Kyber kernel: ata2: EH complete
Mar 30 08:20:00 Kyber kernel: ata2.00: exception Emask 0x10 SAct 0x400 SErr 0x90302 action 0xe frozen
Mar 30 08:20:00 Kyber kernel: ata2.00: irq_stat 0x00400000, PHY RDY changed
Mar 30 08:20:00 Kyber kernel: ata2: SError: { RecovComm UnrecovData Persist PHYRdyChg 10B8B }
Mar 30 08:20:00 Kyber kernel: ata2.00: failed command: READ FPDMA QUEUED
Mar 30 08:20:00 Kyber kernel: ata2.00: cmd 60/a0:50:28:bb:4a/01:00:49:03:00/40 tag 10 ncq dma 212992 in
Mar 30 08:20:00 Kyber kernel:         res 40/00:00:28:bb:4a/00:00:49:03:00/40 Emask 0x10 (ATA bus error)
Mar 30 08:20:00 Kyber kernel: ata2.00: status: { DRDY }
Mar 30 08:20:00 Kyber kernel: ata2: hard resetting link
Mar 30 08:20:02 Kyber kernel: ata2: found unknown device (class 0)
Mar 30 08:20:02 Kyber kernel: ata2: SATA link down (SStatus 0 SControl 300)
Mar 30 08:20:02 Kyber kernel: ata2: hard resetting link
Mar 30 08:20:08 Kyber kernel: ata2: found unknown device (class 0)
Mar 30 08:20:09 Kyber webGUI: Successful login user root from 10.27.27.1
Mar 30 08:20:12 Kyber kernel: ata2: softreset failed (device not ready)
Mar 30 08:20:12 Kyber kernel: ata2: hard resetting link
Mar 30 08:20:17 Kyber kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 30 08:20:17 Kyber kernel: ata2.00: supports DRM functions and may not be fully accessible
Mar 30 08:20:17 Kyber kernel: ata2.00: supports DRM functions and may not be fully accessible
Mar 30 08:20:17 Kyber kernel: ata2.00: configured for UDMA/133
Mar 30 08:20:17 Kyber kernel: ata2: EH complete
Mar 30 08:22:51 Kyber kernel: ata2: found unknown device (class 0)
Mar 30 08:22:51 Kyber kernel: ata2: SATA link down (SStatus 0 SControl 300)
Mar 30 08:22:56 Kyber kernel: ata2: found unknown device (class 0)
Mar 30 08:23:01 Kyber kernel: ata2: softreset failed (device not ready)
Mar 30 08:23:05 Kyber kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 30 08:23:05 Kyber kernel: ata2.00: supports DRM functions and may not be fully accessible
Mar 30 08:23:05 Kyber kernel: ata2.00: supports DRM functions and may not be fully accessible
Mar 30 08:23:05 Kyber kernel: ata2.00: configured for UDMA/133
Mar 30 09:14:43 Kyber nginx: 2023/03/30 09:14:43 [crit] 9855#9855: *229325 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: 212.102.40.218, server: 0.0.0.0:443
Mar 30 10:39:18 Kyber nginx: 2023/03/30 10:39:18 [crit] 9855#9855: *244850 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: 139.59.177.49, server: 0.0.0.0:443
Mar 30 11:01:04 Kyber nginx: 2023/03/30 11:01:04 [crit] 9855#9855: *248883 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: 46.101.164.72, server: 0.0.0.0:443
Mar 30 11:04:00 Kyber nginx: 2023/03/30 11:04:00 [crit] 9855#9855: *249431 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: 46.101.164.72, server: 0.0.0.0:443
Mar 30 11:08:20 Kyber nginx: 2023/03/30 11:08:20 [crit] 9855#9855: *250237 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: 46.101.164.72, server: 0.0.0.0:443
Mar 30 12:02:38 Kyber root: /var/lib/docker: 10.7 GiB (11497426944 bytes) trimmed on /dev/loop2
Mar 30 12:02:38 Kyber root: /mnt/disks/PLEX DATA: 381.6 GiB (409735974912 bytes) trimmed on /dev/nvme1n1p1
Mar 30 12:02:38 Kyber root: /mnt/cache: 1.7 TiB (1899060109312 bytes) trimmed on /dev/nvme0n1p1
Mar 30 15:36:29 Kyber nginx: 2023/03/30 15:36:29 [crit] 9855#9855: *300187 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: 87.236.176.63, server: 0.0.0.0:443
Mar 30 15:45:46 Kyber nginx: 2023/03/30 15:45:46 [crit] 9855#9855: *301979 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: 46.101.130.16, server: 0.0.0.0:443
Mar 30 17:17:02 Kyber nginx: 2023/03/30 17:17:02 [crit] 9855#9855: *318900 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: 162.243.150.11, server: 0.0.0.0:443
Mar 30 17:45:52 Kyber webGUI: Successful login user root from 10.27.27.1
Mar 30 17:48:16 Kyber webGUI: Successful login user root from 10.27.27.1
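For reading the `SErr` values in the log above, here's a small Python sketch that decodes them into the flag names the Linux libata driver prints inside `SError: { ... }`. The bit-to-name table is transcribed from libata's error reporting as best I can tell and should be double-checked against the kernel source; it does reproduce both `SError` lines in this log (0x90302 and 0x90202). These are link-level (PHY) flags, so on their own they don't say whether the cable, the power or the controller is to blame.

```python
# Bit positions in the SATA SError register, with the short names libata
# prints for them (assumed to match the kernel's ata_eh_link_report output).
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the flag names set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # The two SErr values that appear in the pasted syslog.
    for serr in (0x90302, 0x90202):
        print(hex(serr), decode_serror(serr))
```

Listing the flags lowest bit first matches the order the kernel prints them in, which makes it easy to compare the output with the `SError: { ... }` lines directly.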

 

Thanks again, guys! And if you're curious, here are the new SATA cables I'm using:

https://www.amazon.com/dp/B00KTLGDZG?psc=1&ref=ppx_yo2ov_dt_b_product_details

 

And here are the new fans:

https://www.amazon.com/dp/B00KFCRATC?psc=1&ref=ppx_yo2ov_dt_b_product_details

 
