Seagate drives not spinning up on 6.9 - Going into Disabled State



Hi guys,

I've been having issues on the RC and now on 6.9.

The issue seems to be that drives are not spinning up, and this causes them to go into a disabled state when I reboot.

As a workaround, I was asked to spin the drives up before I restart.

I just tried to spin them up, and this is what I see:

[screenshot]

WD drives are spinning up just fine.

If I click on a Seagate drive, I see:

Mar 2 11:42:49 Odin kernel: mdcmd (15): import 14 sdu 64 7814026532 0 ST8000VN0022-2EL112_ZA10D0AW
Mar 2 11:42:49 Odin kernel: md: import disk14: (sdu) ST8000VN0022-2EL112_ZA10D0AW size: 7814026532
Mar 2 11:42:50 Odin emhttpd: read SMART /dev/sdu
Mar 3 06:04:55 Odin emhttpd: spinning down /dev/sdu
Mar 3 10:01:15 Odin emhttpd: spinning up /dev/sdu
Mar 3 10:01:33 Odin kernel: sd 1:0:19:0: [sdu] tag#3713 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e3 00
 

If I click on the white dots to spin up, I get this:

[screenshot]

Disk 15 will not spin up at all. 

Yesterday it was Disk 14, and I had to remove it from the array and then re-add it, which triggered a parity sync.

I really don't want to have to do this every time I want to restart.

Mar 3 10:01:30 Odin kernel: sd 1:0:20:0: [sdv] tag#3716 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e3 00
Mar 3 10:01:33 Odin kernel: sd 1:0:20:0: [sdv] Synchronizing SCSI cache
Mar 3 10:01:33 Odin kernel: sd 1:0:20:0: [sdv] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Mar 3 10:04:33 Odin emhttpd: spinning up /dev/sdv
Mar 3 10:04:42 Odin emhttpd: spinning up /dev/sdv
Mar 3 10:04:46 Odin emhttpd: spinning up /dev/sdv

 

Yesterday it was Disk 14; I ran a full SMART test on it and it found no issues.

Before this, on the RC, I ran multiple full preclears, and they too showed nothing wrong with the drives.

It's ONLY affecting the Seagate drives so far. The WD drives have all been totally fine.

I'm wondering if there is a firmware update for the Seagates. No idea, I've never flashed an HDD lol.

My hardware is an LSI SAS3008 9300-8i

Supermicro X10SRi-F

Supermicro BPN-SAS3-846EL1

 

These are my previous posts from the RC:

Attached are my diags from before the restore.

odin-diagnostics-20210303-1010.zip

Edited by SavellM
  • SavellM changed the title to Seagate drives not spinning up on 6.9 - Going into Disabled State

Could I have you try the following the next time this happens, before reseating?

First, find the SCSI host number for the device:
root@sisyphus:~# ls -ld /sys/class/scsi_host/host*
...
lrwxrwxrwx 1 root root 0 Mar  8 06:04 /sys/class/scsi_host/host6 -> ../../devices/pci0000:00/0000:00:11.5/ata6/host6/scsi_host/host6/
lrwxrwxrwx 1 root root 0 Mar  8 06:04 /sys/class/scsi_host/host5 -> ../../devices/pci0000:00/0000:00:11.5/ata5/host5/scsi_host/host5/
...

Then rescan, specifying the host number for the device per the output of the earlier command:
echo "- - -" > /sys/class/scsi_host/host5/scan

To limit the scan to a single disk, replace the '-' marks (wildcards for channel, target, and LUN) with that disk's values.

The above is fairly generic. Specific to Unraid, you'd replace the '- - -' with the ID you find for the drive under Tools -> System Devices, if I'm remembering properly. Technically, there shouldn't be an issue running it directly as:

echo "- - -" > /sys/class/ (etc)

 

With any relatively modern hardware, that is. But as a precaution, I'd recommend stopping the array first, just in case. There used to be some issues with scanning an already active device, and while those were sorted out ages ago in code/firmware, a little caution never hurt anyone.
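For SAS-attached drives like the ones in this thread, the host number is also the first field of the address the kernel logs: "sd 1:0:19:0: [sdu]" is host 1, channel 0, target 19, LUN 0. Here's a minimal sketch, assuming a standard Linux sysfs layout, that turns such an address into the matching single-disk rescan command. The function names are made up for illustration, and it only prints the command instead of writing to /sys, so you can review it (and stop the array first) before running it as root.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Unraid.
# Kernel H:C:T:L addresses (e.g. "1:0:19:0" from "sd 1:0:19:0: [sdu]")
# are host:channel:target:lun.

# Print "host<H> <C> <T> <L>" for an H:C:T:L string; prints nothing if malformed.
hctl_to_scan_args() {
    echo "$1" | awk -F: 'NF == 4 { printf "host%s %s %s %s\n", $1, $2, $3, $4 }'
}

# Look up the H:C:T:L address of a block device that still exists (e.g. "sdu").
# /sys/block/<dev>/device links to a directory named after the address.
dev_to_hctl() {
    basename "$(readlink -f "/sys/block/$1/device")"
}

# Print, but do not run, the single-disk rescan command for an address.
print_rescan_cmd() {
    set -- $(hctl_to_scan_args "$1")
    [ $# -eq 4 ] || return 1
    printf 'echo "%s %s %s" > /sys/class/scsi_host/%s/scan\n' "$2" "$3" "$4" "$1"
}
```

For the sdv drive from the log, print_rescan_cmd 1:0:20:0 prints echo "0 20 0" > /sys/class/scsi_host/host1/scan. If the disk has already dropped off (which the failed Synchronize Cache on sdv may indicate), /sys/block/sdv won't exist, so take the address from the syslog line rather than from dev_to_hctl.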

 

The 'why' of trying the above: IronWolf drives seem to take an eternity (well, in computer terms anyway lol) to wake up from sleep/standby, especially as they age. The host basically says 'I give up, I've been waiting on you to get out of the bathroom for hours' and walks away, but doesn't go back later to check whether the command ever actually completed (unless you tell it to, which is what we're doing here).

 

If that gives bupkis, try manually waking up the drive:

hddtemp -w /dev/sdu
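To confirm the wake actually took before you reboot, hdparm -C reports the drive's power mode without spinning it up (its man page lists unknown, active/idle, standby, and sleeping). Below is a minimal sketch that parses that output. The function names are hypothetical, and it's an assumption that hdparm can reach these drives through the LSI HBA's SATA translation.

```shell
#!/bin/sh
# Sketch only: parses "hdparm -C /dev/sdX" output, which looks like
#   /dev/sdu:
#    drive state is:  standby
# Run hdparm yourself (as root); these hypothetical helpers only read text.

# Print the reported state ("active/idle", "standby", ...) from stdin.
drive_state() {
    sed -n 's/^[[:space:]]*drive state is:[[:space:]]*//p'
}

# Succeed only when the drive reports active/idle.
is_awake() {
    [ "$(drive_state)" = "active/idle" ]
}
```

Usage would be something like: hdparm -C /dev/sdu | is_awake || echo "sdu still not awake"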

 

Maybe it's an issue with the smartctl version, or with newer commands that older drives don't recognize?
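One way to see which command the drive was choking on: the CDB lines in the logs earlier (opcode=0x85) are ATA PASS-THROUGH(16), and if I'm reading the SAT layout right, byte 14 of that CDB is the ATA command itself. A minimal sketch with hypothetical helper names that pulls it out and names a few power-management opcodes:

```shell
#!/bin/sh
# Sketch only (hypothetical helpers). Assumes the SAT ATA PASS-THROUGH(16)
# layout: byte 0 = opcode 0x85, byte 13 = device, byte 14 = ATA command.

# Print the ATA command byte (e.g. "0xe3") from 16 space-separated hex bytes.
ata_cmd_from_cdb() {
    set -- $*
    if [ $# -ne 16 ] || [ "$1" != "85" ]; then
        echo "not an ATA PASS-THROUGH(16) CDB" >&2
        return 1
    fi
    shift 14            # $1 is now byte 14
    printf '0x%s\n' "$1"
}

# Name a few ATA power-management commands (expects lowercase hex, as the
# kernel prints it); anything else is "other".
ata_cmd_name() {
    case "$1" in
        0xe0) echo "STANDBY IMMEDIATE" ;;
        0xe1) echo "IDLE IMMEDIATE" ;;
        0xe2) echo "STANDBY" ;;
        0xe3) echo "IDLE" ;;
        0xe5) echo "CHECK POWER MODE" ;;
        *)    echo "other" ;;
    esac
}
```

For both the sdu and sdv lines, that gives 0xe3, which the ATA command set defines as IDLE: a power-management command rather than anything exotic. That fits the slow-wake theory, but it doesn't prove it.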

 

I'll be curious to hear the outcome!

_____

 

As an aside, I'd very much look into some additional cooling capacity/airflow for this server; some of the SMART output is scary-looking:

SDU:
Lifetime    Min/Max Temperature:      1/57 Celsius
Under/Over Temperature Limit Count:   0/592

SDT:
Lifetime    Min/Max Temperature:      2/57 Celsius
Under/Over Temperature Limit Count:   0/618

SDR:
Lifetime    Min/Max Temperature:      3/52 Celsius
Under/Over Temperature Limit Count:   0/618

SDX:
Lifetime    Min/Max Temperature:      1/57 Celsius
Under/Over Temperature Limit Count:   0/618

 

I know this is all within the manufacturer's spec, but that spec is basically the 'warranted' values (how hot this thing can run while we still make money after the cost of RMAs over the 3-year warranty). While these numbers aren't what I'd consider 'terrifying', they're definitely not stellar either. Anecdotally speaking, 50°C seems to be the max for spinning drives if you want to preserve their longevity.

 

(unless it's an HGST drive, which lately, I've not been able to murder with anything short of the fires of Mt. Doom)
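If you want to check the rest of the array against that rough 50°C line, the Lifetime Min/Max Temperature lines quoted above can be pulled out with a bit of awk. This is a minimal sketch: the helper names are hypothetical, the 50°C default is just the anecdotal figure from this post (not a Seagate or WD spec), and I'm assuming the lines come from smartctl's SCT status output (smartctl -l scttempsts /dev/sdX).

```shell
#!/bin/sh
# Sketch only (hypothetical helpers). Parses lines like
#   Lifetime    Min/Max Temperature:      1/57 Celsius
# 50 C is an anecdotal longevity threshold, not a manufacturer limit.

# Print the lifetime max temperature from SMART text on stdin.
lifetime_max_temp() {
    awk '$1 == "Lifetime" && $2 == "Min/Max" { split($(NF-1), t, "/"); print t[2] }'
}

# Print a warning if drive $1's max temp $2 exceeds $3 (default 50).
over_threshold() {
    limit=${3:-50}
    if [ "$2" -gt "$limit" ]; then
        echo "$1: lifetime max ${2}C exceeds ${limit}C"
    fi
}
```

For example, smartctl -l scttempsts /dev/sdu | lifetime_max_temp should give 57 for the SDU figures above, and over_threshold SDU 57 would then flag it.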

Edited by BVD

Surely! I'll try this next time, but right now I'm just not spinning down the drives.

I guess I could spin them down and try it the next time I need to restart.

Fortunately, I have no data on these Seagate drives yet, so they never need to spin up unless I'm restarting or running parity.

 

So those drive temps are from the old chassis... I have since upgraded to a Supermicro 846 and done the Noctua fan mods.

Now the drives don't get much above 40°C in the middle of summer.

Right now they are a cool 22°C.

Edited by SavellM

Good deal, man! I've been an admirer of SMC's hardware going back to the SAS1 days; they've got a firm handle on chassis design. I've still got a 737TQ in storage that I'd configured to order for work back in late 2008/early 2009, and it still runs like a top. They were just going to throw it out, and since it was the first system I'd ever been tasked with, from PO to deployment and finally to retirement, it's sentimentally worth more than any other piece of hardware I own.

 

I occasionally think to myself "maybe I should revamp that thing and give it a new life", but can't bring myself to do it... Every piece of hardware in that chassis is still the exact same as it was when I'd unboxed it all that time ago, other than a drive tray a colleague stepped on the second day, one fan that finally bit it back in 2012 or so, and every single HDD lol.

 

So many memories... I don't recall if the 846 is compatible with SMC's SQ PSUs (or whatever they're calling the PSUs they put in their 'Whisper Quiet' chassis now), but they're a sound investment in spousal harmony (if you've gotta worry about that kind of thing anyway hehehe)

_____

Just realized I didn't explain the second command's parameters: 'hddtemp -w' tells hddtemp to wake the drive if it's asleep. If that works but the built-in wake doesn't, then at least we'll know what the problem is!


Yeah, I'm fortunate that my server mess is in the loft. It can be as loud as it wants.

That's also part of the reason those drives got so warm; in summer it's like a hotbox up there.

I've done some extractor fan magic and chassis mods to get it under control.

That said, yup, my chassis came with a pair of PWS-920P-SQs :)

 

I think this chassis now with its mods will be my chassis for life. 

I've done all the upgrades: upgraded the backplane to the SAS3 variant and did the Noctua fan mods.

I also used to run a Gigabyte board in it, but recently replaced that with an SM board, just for unity lol.

 

Thanks for the help with the drives, I will test in the next few weeks and let you know.

 

  • 2 weeks later...

It just happened again when I needed to restart.

I forgot to try the steps above, because this time it showed the drive as spun up, but with no temperature reading.

I assumed it was just missing the temp on this one drive, and then I restarted and boom, it's disabled.

 

This is very annoying and I could really do with getting a fix. 

Next week I'll try replicating it with the steps above.
