SavellM Posted March 3, 2021 (edited)

Hi guys, I've been having issues on the RC and now on 6.9. The issue seems to be that drives are not spinning up, and this causes them to go into a disabled state if I reboot. As a workaround I was asked to spin them up before I restart. I just tried to spin up now and this is what I see: WD drives are spinning up just fine. If I click on a Seagate drive I see:

Mar 2 11:42:49 Odin kernel: mdcmd (15): import 14 sdu 64 7814026532 0 ST8000VN0022-2EL112_ZA10D0AW
Mar 2 11:42:49 Odin kernel: md: import disk14: (sdu) ST8000VN0022-2EL112_ZA10D0AW size: 7814026532
Mar 2 11:42:50 Odin emhttpd: read SMART /dev/sdu
Mar 3 06:04:55 Odin emhttpd: spinning down /dev/sdu
Mar 3 10:01:15 Odin emhttpd: spinning up /dev/sdu
Mar 3 10:01:33 Odin kernel: sd 1:0:19:0: [sdu] tag#3713 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e3 00

If I click on the white dots to spin up I get this: Disk 15 will not spin up at all. Yesterday it was Disk 14, and I had to remove it from the array and then re-add it, which caused a parity sync. I really don't want to have to do this every time I want to restart.

Mar 3 10:01:30 Odin kernel: sd 1:0:20:0: [sdv] tag#3716 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e3 00
Mar 3 10:01:33 Odin kernel: sd 1:0:20:0: [sdv] Synchronizing SCSI cache
Mar 3 10:01:33 Odin kernel: sd 1:0:20:0: [sdv] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Mar 3 10:04:33 Odin emhttpd: spinning up /dev/sdv
Mar 3 10:04:42 Odin emhttpd: spinning up /dev/sdv
Mar 3 10:04:46 Odin emhttpd: spinning up /dev/sdv

Yesterday Drive 14 was the one; I ran a full SMART test and it showed no issues. Before this, on the RC, I ran multiple full pre-clears and they too showed nothing wrong with the drives. It's ONLY affecting the Seagate drives so far. WD drives have all been totally fine. I'm wondering if there is some upgraded firmware for Seagate, no idea, never flashed a HDD lol.
My hardware is:
LSI SAS3008 9300-8I
Supermicro X10SRi-F
Supermicro BPN-SAS3-846EL1

These are my previous posts from RC:

Attached are my diags before restore: odin-diagnostics-20210303-1010.zip
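For anyone comparing notes on this issue, here is a minimal sketch for pulling one drive's spin-up/spin-down history out of the syslog, based only on the line formats quoted in the post above. The function name `spin_events` is made up for illustration, and the `/var/log/syslog` path in the usage comment is an assumption about where the log lives on your system.

```shell
#!/bin/sh
# Hypothetical helper: keep only the syslog lines that mention one drive.
# $1 is the kernel device name without /dev/, for example "sdu".
# Reads log lines on stdin and prints the emhttpd spin up/down messages for
# that device plus any kernel messages tagged with "[sdu]".
spin_events() {
    grep -E "emhttpd: spinning (up|down) /dev/$1\$|\[$1\]"
}

# Example usage (the log path is an assumption, adjust as needed):
#   spin_events sdv < /var/log/syslog
```

Repeated "spinning up" lines for the same device with no kernel message in between, as with sdv above, are the pattern to look for.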
JorgeB Posted March 3, 2021

There appear to have been some issues since v6.9-rc with the LSI driver and 8TB IronWolf drives, possibly other capacities as well. Not sure if it's any different with v6.9 final, but likely not.
SavellM Posted March 3, 2021

Groovy! Thanks for the update, that would make sense. Is this likely to be fixed via software/kernel in the future? For now I'm just leaving my drives spun up and not letting them sleep; hopefully this should be a workable workaround?
JorgeB Posted March 3, 2021

SavellM said: "Is this likely to be fixed via software/kernel in the future?"
Should be.

SavellM said: "hopefully this should be a workable workaround?"
If you only have issues after spin up it should work.
SavellM Posted March 3, 2021

Hopefully it gets resolved fast, it's most annoying. I also moved away from TrueNAS as I didn't need my drives spinning all the time lol, now I'm back where I started. Irony.
BVD Posted March 8, 2021 (edited)

Could I have you try the following the next time you see this happen, prior to reseating?

First find the SCSI host number for the device:

root@sisyphus:~# ls -ld /sys/class/scsi_host/host*
...
lrwxrwxrwx 1 root root 0 Mar 8 06:04 /sys/class/scsi_host/host6 -> ../../devices/pci0000:00/0000:00:11.5/ata6/host6/scsi_host/host6/
lrwxrwxrwx 1 root root 0 Mar 8 06:04 /sys/class/scsi_host/host5 -> ../../devices/pci0000:00/0000:00:11.5/ata5/host5/scsi_host/host5/
...

Then rescan, specifying the host number for the device per the output of the earlier command:

echo "- - -" > /sys/class/scsi_host/host5/scan

To limit the scan to a single disk, put its channel, target and LUN in place of the three '-' wildcards. The above is kind of generic; specific to UnRAID, you'd replace the '- - -' with the ID you find for the drive in Tools -> System Devices, if I'm remembering properly.

Technically, there shouldn't be an issue running it directly as:

echo "- - -" > /sys/class/ (etc)

with any relatively modern hardware... But as a precaution, I'd recommend first stopping the array just in case. There used to be some issues with scanning an already active device, and while they were sorted out ages ago in code/fw, a little caution never hurt anyone.

The 'why' of trying the above - IronWolf drives seem to take an eternity (well, in computer terms anyway lol) to wake up from sleep/standby, especially as they age. The host is basically saying 'I give up, been waiting on you to get out of the bathroom for hours' and walking away, but it doesn't go back down the line to re-check whether the command ever actually did complete (unless you tell it to do so, which we're doing here).

If that gives you bupkis, try manually waking up the drive:

hddtemp -w /dev/sdu

Maybe an issue with the version of smartctl, or newer command instruction sets being used which older drives don't recognize? I'll be curious to hear the outcome!
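The two manual steps (find the host, then rescan it) can be sketched as a pair of small shell functions. This is only an illustration: the function names are made up, and it assumes that `/sys/block/<disk>/device` resolves to a sysfs path containing a `hostN` component, which is the usual layout but is not something confirmed for this particular server.

```shell
#!/bin/sh
# Hypothetical helper: print the first "hostN" component of a sysfs path.
scsi_host_from_path() {
    printf '%s\n' "$1" | grep -o 'host[0-9][0-9]*' | head -n 1
}

# Resolve the SCSI host for a block device name such as "sdu".
# Assumes /sys/block/<disk>/device is a symlink into the host's sysfs tree.
scsi_host_for_disk() {
    scsi_host_from_path "$(readlink -f "/sys/block/$1/device")"
}

# Rescan all channels, targets and LUNs on one host (needs root).
# $1 is a host name as printed above, for example "host5".
rescan_host() {
    echo "- - -" > "/sys/class/scsi_host/$1/scan"
}

# Example usage, ideally with the array stopped as suggested above:
#   host=$(scsi_host_for_disk sdu)
#   [ -n "$host" ] && rescan_host "$host"
```

The lookup only works while the disk still has a `/sys/block` entry; if the device has already dropped off the bus, fall back to picking the host from the listing by hand.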
_____

As an aside, I'd very much look into some additional cooling capacity/airflow for this server; some of the SMART output is scary lookin':

SDU: Lifetime Min/Max Temperature: 1/57 Celsius, Under/Over Temperature Limit Count: 0/592
SDT: Lifetime Min/Max Temperature: 2/57 Celsius, Under/Over Temperature Limit Count: 0/618
SDR: Lifetime Min/Max Temperature: 3/52 Celsius, Under/Over Temperature Limit Count: 0/618
SDX: Lifetime Min/Max Temperature: 1/57 Celsius, Under/Over Temperature Limit Count: 0/618

I know this is all within the manufacturer's spec, but that spec is basically the 'warranted' values (how hot can this thing run and we still make money after the costs of RMAs over the 3 year warranty). While these numbers aren't what I'd consider 'terrifying', they're definitely not stellar either. Anecdotally speaking, 50C seems to be the max for spinning drives when it comes to maintaining their longevity/lifespan (unless it's an HGST drive, which lately I've not been able to murder with anything short of the fires of Mt. Doom).
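To keep an eye on those numbers without opening the full report each time, something along these lines could work. It is a sketch, not a tested tool: the function names are made up, it assumes `smartctl -x` prints a "Lifetime Min/Max Temperature" line for the drive (not every drive reports one), and the default threshold of 50 is just the anecdotal figure from this post, not a vendor limit.

```shell
#!/bin/sh
# Hypothetical helper: read smartctl output on stdin and print the lifetime
# maximum temperature, taken from a line such as
#   "Lifetime    Min/Max Temperature:      1/57 Celsius"
lifetime_max_temp() {
    awk '/Lifetime/ && /Min\/Max Temperature:/ {
        split($(NF-1), t, "/")   # "1/57" -> t[1]="1", t[2]="57"
        print t[2]
        exit
    }'
}

# Succeeds (exit status 0) when the value in $1 is above the limit in $2.
# The default of 50 is the anecdotal figure from the post, not a vendor spec.
runs_hot() {
    [ "$1" -gt "${2:-50}" ]
}

# Example usage (needs root and smartmontools):
#   max=$(smartctl -x /dev/sdu | lifetime_max_temp)
#   runs_hot "$max" && echo "sdu has been above 50C at some point (max ${max}C)"
```

Note that this is a lifetime maximum, so it will keep reporting an old peak even after the cooling has been sorted out.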
SavellM Posted March 8, 2021 (edited)

Surely! I'll try this next time, but as of right now I'm just not spinning down the drives. I guess I could let them spin down and then try it the next time I need to restart. Fortunately I have no data on these Seagate drives yet, so they never need to spin up unless I'm restarting or running a parity check.

So those drive temps are from the old chassis... I have since upgraded to a Supermicro 846 and done the Noctua fan mods. Now the drives don't get much above 40 in the middle of summer. Right now they are a cool 22C.
BVD Posted March 8, 2021

Good deal man! I've been an admirer of SMC's hardware going back to the SAS1 days, they've got a firm handle on chassis design - I've still got a 737TQ in storage that I'd configured to order for work back in like late 2008/early 2009 that still runs like a top. They were just going to throw it out, and as it was the first system I'd ever been tasked with, from PO to deployment, and finally to retirement, it's sentimentally worth more than any other piece of hardware I own. I occasionally think to myself "maybe I should revamp that thing and give it a new life", but can't bring myself to do it... Every piece of hardware in that chassis is still exactly the same as it was when I unboxed it all that time ago, other than a drive tray a colleague stepped on the second day, one fan that finally bit it back in 2012 or so, and every single HDD lol. So many memories...

I don't recall if the 846 is compatible with SMC's SQ PSUs (or whatever they're calling the PSUs they put in their 'Whisper Quiet' chassis now), but they're a sound investment in spousal harmony (if you've gotta worry about that kind of thing anyway hehehe)

_____

Just realized I didn't explain the second command's parameters, that 'hddtemp -w' - that's just the 'wake' command being sent to the drive (hddtemp, not hdparm, whose -w flag is a device reset). If that works, but the built-in wake doesn't, then at least we'll know what the problem is!
SavellM Posted March 8, 2021

Yea, I'm fortunate that my server mess is in the loft. It can be as loud as it wants. That's also part of the reason why those drives got so warm; in summer it's like a hotbox up there. I've done some extractor fan magic and chassis mods to get it under control. But that said, yup, my chassis came with a pair of PWS-920P-SQ.

I think this chassis, now with its mods, will be my chassis for life. I've done all the upgrades: upgraded the backplane to the SAS3 variant and did the Noctua fan mods. I also used to run a Gigabyte board in it, but recently replaced that with an SM board, just for unity lol.

Thanks for the help with the drives, I will test in the next few weeks and let you know.
SavellM Posted March 20, 2021

It just happened again when I needed to restart. I forgot to do the steps above, as this time the drive showed as spun up but had no temp. I assumed the temp was just missing on this one drive, and then I restarted and boom, it is disabled. This is very annoying and I could really do with getting a fix. Next week I'll try replicating it with the steps above.
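Since the web UI showed the drive as spun up here, one option before a restart is to ask each drive directly for its power state. The sketch below uses `hdparm -C`, which reports the drive's current power mode; the function names are made up, and whether the query gets through the HBA on an affected drive is an open assumption, so treat a missing answer as "not spun up" rather than as proof of anything.

```shell
#!/bin/sh
# Hypothetical helper: read `hdparm -C` output on stdin and print the
# reported power state, e.g. "active/idle" or "standby".
drive_state() {
    sed -n 's/^ *drive state is: *//p'
}

# Print the given disks that do not report "active/idle".
# Arguments are device names without /dev/, for example: sdu sdv
not_spun_up() {
    for disk in "$@"; do
        state=$(hdparm -C "/dev/$disk" 2>/dev/null | drive_state)
        [ "$state" = "active/idle" ] || echo "$disk: ${state:-no answer}"
    done
}

# Example usage before a restart (needs root):
#   not_spun_up sdu sdv
```

If a disk is listed, that is the point to try the rescan and wake steps from earlier in the thread before rebooting.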