Array not starting (and other issues) after new PSU


Solved by JorgeB

I've had some mysterious issues with cache drives dying prematurely, and after getting brand new drives, new SATA cables, using new SATA ports, and swapping out basically everything except for the power supply of the server, I finally caved and got a new power supply.

 

I installed it yesterday, but when the server booted back up, 2 of the regular drives were in an error state. The server booted slowly, but it booted.

 

Today I tried to run XFS repair, which did not work. Now I'm trying to rebuild the drives from parity, but I can no longer get the array to start. The poor machine is just all over the place and I'm not sure what to do with it at this point. Restarts aren't helping. Safe mode didn't help. Reseating the power cables and the bad drives didn't help either.

 

Diagnostics attached... any ideas? Thanks in advance!

greenplanet-diagnostics-20240610-1452.zip

  • Solution

There's a connected SATA device causing constant errors:

 

Jun 10 14:15:47 Greenplanet kernel: ata8: found unknown device (class 0)
Jun 10 14:15:47 Greenplanet kernel: apex 0000:05:00.0: Apex performance not throttled due to temperature
Jun 10 14:15:47 Greenplanet kernel: ata8: found unknown device (class 0)
Jun 10 14:15:47 Greenplanet kernel: ata8: SATA link down (SStatus 0 SControl 300)
Jun 10 14:15:47 Greenplanet kernel: ata8: found unknown device (class 0)
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Jun 10 14:15:47 Greenplanet kernel: ata8: SATA link down (SStatus 0 SControl 300)
Jun 10 14:15:47 Greenplanet kernel: ata8: illegal qc_active transition (00000000->00000001)
Jun 10 14:15:47 Greenplanet kernel: ata8: limiting SATA link speed to 1.5 Gbps
Jun 10 14:15:47 Greenplanet kernel: ata8: found unknown device (class 0)
Jun 10 14:15:47 Greenplanet kernel: ata8: SATA link down (SStatus 0 SControl 310)
Jun 10 14:15:47 Greenplanet kernel: ata8: limiting SATA link speed to 1.5 Gbps
Jun 10 14:15:47 Greenplanet kernel: ata8: found unknown device (class 0)
Jun 10 14:15:47 Greenplanet kernel: ata8: SATA link down (SStatus 0 SControl 310)
Jun 10 14:15:47 Greenplanet kernel: ata8: limiting SATA link speed to 1.5 Gbps
Jun 10 14:15:47 Greenplanet kernel: sd 10:0:4:0: device_block, handle(0x000e)
Jun 10 14:15:47 Greenplanet kernel: sd 10:0:4:0: [sdf] tag#2151 UNKNOWN(0x2003) Result: hostbyte=0x0e driverbyte=DRIVER_OK cmd_age=17s
Jun 10 14:15:47 Greenplanet kernel: sd 10:0:4:0: [sdf] tag#2151 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Jun 10 14:15:47 Greenplanet kernel: I/O error, dev sdf, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Jun 10 14:15:47 Greenplanet kernel: Buffer I/O error on dev sdf, logical block 0, async page read

 

I can't tell which device it is, since it's not even being correctly identified. Look in the diagnostics to see which devices are missing from SMART and disconnect them until you find it, then try replacing the cables for that device.
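To get a quick picture of which ATA port is generating the noise, the kernel lines can be tallied per port and message. The following is only a rough sketch: the regular expressions and the function name `count_ata_events` are my own assumptions, and the sample lines are copied from the excerpt above. It also counts the "PREVIOUS LINE REPEATED" markers that appear in the diagnostics syslog.

```python
import re
from collections import Counter

# Matches kernel lines such as "... kernel: ata8: SATA link down (...)".
# The pattern is an assumption based on the excerpt in this thread.
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.+)$")
# Marker that stands in for collapsed duplicate lines in the syslog.
REPEAT = re.compile(r"PREVIOUS LINE REPEATED (\d+) TIMES")


def count_ata_events(lines):
    """Count (port, message) pairs, honouring the repeat markers."""
    counts = Counter()
    last_key = None
    for line in lines:
        line = line.strip()
        repeated = REPEAT.search(line)
        if repeated:
            # Credit the repeats to the previous line if it was an ATA line.
            if last_key is not None:
                counts[last_key] += int(repeated.group(1))
            continue
        match = ATA_LINE.search(line)
        last_key = match.groups() if match else None
        if last_key is not None:
            counts[last_key] += 1
    return counts


SAMPLE = [
    "Jun 10 14:15:47 Greenplanet kernel: ata8: found unknown device (class 0)",
    "Jun 10 14:15:47 Greenplanet kernel: apex 0000:05:00.0: Apex performance not throttled due to temperature",
    "Jun 10 14:15:47 Greenplanet kernel: ata8: SATA link down (SStatus 0 SControl 300)",
    "Jun 10 14:15:47 Greenplanet kernel: ata8: found unknown device (class 0)",
    "### [PREVIOUS LINE REPEATED 1 TIMES] ###",
    "Jun 10 14:15:47 Greenplanet kernel: ata8: SATA link down (SStatus 0 SControl 300)",
]

counts = count_ata_events(SAMPLE)
for (port, message), n in counts.most_common():
    print(f"{port}: {n} x {message}")
```

On a real system you would feed it the lines of the syslog file from the diagnostics zip instead of `SAMPLE`. A port that dominates the output is the one to chase.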


I have a backplane, so there weren't really data cables to check (technically there is a data cable, but if that one had gone bad, more than one drive should be misbehaving).

 

So I thought maybe the new PSU wasn't able to provide enough power to the backplane over the single Molex cable it came with. I found some SATA -> Molex adapters and used them to distribute the backplane power load across 3 PSU cables instead, which worked: the server is booting, the array is starting (at a reasonable speed), and the drives have stopped behaving erratically.

 

XFS repair still failed, so parity rebuild is in progress now. But I think that'll solve this one - thanks!


I was running it from the GUI. I tried it again today and XFS repair worked this time. Not really sure what the difference was, but hey, at least it worked.

 

Unfortunately I now have 11TB in lost+found to sort through but at least the system appears to be stable 😅

 

I'm hoping everything that's missing right now is in lost+found; it's hard to tell when it's this big.
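For anyone facing a similarly large lost+found: recovered files usually lose their original names, so one way to get an overview is to group them by their leading "magic" bytes. This is a minimal sketch, not a recovery tool. The signature table is deliberately tiny, the function names `guess_type` and `summarize` are made up for illustration, and the demo runs against a temporary directory rather than a real disk.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# A few well-known file signatures; extend as needed.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"\x1f\x8b": "gzip",
}


def guess_type(path):
    """Return a label based on the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, label in SIGNATURES.items():
        if head.startswith(magic):
            return label
    return "unknown"


def summarize(root):
    """Map each label to (file count, total bytes) under root."""
    totals = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*"):
        if path.is_file():
            label = guess_type(path)
            totals[label][0] += 1
            totals[label][1] += path.stat().st_size
    return {label: tuple(value) for label, value in totals.items()}


# Demo on throwaway files named like recovered inodes (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "131").write_bytes(b"\xff\xd8\xff\xe0 fake jpeg data")
    (Path(tmp) / "132").write_bytes(b"%PDF-1.4 fake")
    (Path(tmp) / "133").write_bytes(b"plain text")
    demo_summary = summarize(tmp)

print(demo_summary)
```

To use it for real, you would point `summarize` at the lost+found directory on the affected disk (the exact path depends on your setup). It only reads files, so it should be safe to run while you decide what to move back.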
