doron

Community Developer
Everything posted by doron

  1. @Elmojo @Toibs Finding what is issuing I/O against your drives can be tricky at times. If you've ruled out network activity against your shares, try stopping all plugins, Dockers and VMs and see what happens. Then add them back one by one; you might be able to find the culprit this way.
  2. @_cjd_ Some stress is certainly understandable in such a situation 😉 At any rate, you may have bumped into an Unraid bug (or feature). It might be good to have the script recommend a server reboot once it's done.
  3. Thanks @_cjd_ for reporting this. I'm not sure I understand what actually went wrong, though. Why did the stress happen in the first place?
  4. @Elmojo Right, yet it does sometimes take around two minutes until it spins back up. So, probably activity.
  5. @Elmojo, from looking at your data, it does seem that your internal ZFS pool (the one we're discussing) is in fact being accessed a short while after being spun down. You can see that the "SMART read" messages (a great indication of a drive being spun up) show up about 1-2 minutes after the respective spin-down messages. This, in most cases, indicates some sort of disk activity.
     The question of whether the drive does in fact spin down can be answered as follows:
     1. Open both a UI and a CLI terminal window and have both ready.
     2. Push the UI button to spin the drive down, after noting its dev name.
     3. Immediately thereafter (don't wait too long), issue this command on the CLI terminal, replacing sdX with the actual device name:
        sdparm -C sense /dev/sdX
     If the drive is spun down, you will see the word "standby" somewhere in the sense data.
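     A small monitoring sketch along the same lines (assuming sdparm is available and /dev/sdX is the device you just spun down from the UI; adjust as needed). It polls the sense data for a few minutes and prints a line whenever the reported state changes, which helps pin down how quickly something wakes the drive back up:

     ```bash
     #!/bin/bash
     # Poll a drive's SCSI sense data and report power-state changes over ~5 minutes.
     # /dev/sdX is a placeholder - replace with the actual device name.
     DEV=/dev/sdX
     PREV=""
     for i in $(seq 1 60); do
         STATE=$(sdparm -C sense "$DEV" 2>/dev/null | grep -qi standby && echo standby || echo active)
         if [ "$STATE" != "$PREV" ]; then
             echo "$(date '+%H:%M:%S')  $DEV reports: $STATE"
             PREV=$STATE
         fi
         sleep 5
     done
     ```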
  6. Thanks @Elmojo. One further question: which are the ZFS devices? Re the SMART thing - it is in fact a *result* of the drive being spun up, rather than the cause, so no...
  7. Some quick questions:
     1. Under normal operation, are you seeing log messages with the "SAS Assist" prefix?
     2. When you manually spin down your ZFS pool, are you seeing any? Can you post them?
     3. Can you post the output of ls -la /dev/disk/by-path (or you can just post diagnostics)?
  8. Let's break this into parts:
     If the re-spin happens within a few minutes (rather than a couple of seconds), then it's almost certain that either (a) someone is issuing I/O against the drive, or (b) some plugin has woken up and is spinning the drive up, either rightly or erroneously (we've seen both - e.g. an old version of the autofan plugin used to do the wrong thing when inquiring about the drives, which would wake up all SAS drives). Have you looked at your plugins? Can you disable them all and try? Then, if the issue goes away, you can add them back one by one.
     Ah. That is unrelated to the above and seems to indicate an issue with that script. It may be incompatible with your controller, other firmware or kernel version. I'll look into it later, but the actual plugin code is unrelated (and does not use this method to detect SAS drives - proof: you do get the "SAS Assist" messages, meaning that the plugin has indeed acted on your SAS drives).
     Ah, Dell servers. Gotta love them (I do, actually), but they have so many firmware layers that you can sometimes lose track of who did what... I do hope it is not some firmware code that causes the drives to spin up.
     I wouldn't bother with the IT mode thing; you do have it in HBA mode, which is what we need.
     We've seen issues with Seagate drives not spinning down. However, having them spin up after a few minutes is not one of them.
  9. And, per the subject of this thread, can you check whether there's power on pin 3? I'm not familiar with this enclosure - could it be the culprit? (If the drives are SATA and the backplane is dual SAS/SATA, as yours seems to be, then there shouldn't really be a need to go for a SAS controller.)
  10. Yes, any file, in the location of your choice. Make very sure, though, that:
      - It is accessible to Unraid while the array is being (re)started
      - It is kept intact, bit-wise, throughout the life of the array (do not trust a copy/paste of its contents, for example)
      - You have a good backup copy in a safe place you'll remember... If you lose it, you lose your entire array and anything else that's encrypted with this keyfile.
      This may all sound trivial, but I've seen every one of those happen. Better safe.
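      A minimal sketch of that kind of backup-and-verify routine (assuming the live keyfile is at /root/keyfile, the usual Unraid location, and that DST points at a safe location of your choosing - the path below is just a placeholder):

      ```bash
      #!/bin/bash
      # Copy the keyfile to a backup location and prove the copy is bit-for-bit identical.
      SRC=/root/keyfile
      DST=/mnt/user/backups/keyfile.bak   # placeholder - pick your own safe location

      cp "$SRC" "$DST"
      cmp "$SRC" "$DST" && echo "Backup is identical to the live keyfile"
      sha256sum "$SRC" "$DST"             # keep the checksum so you can re-verify later
      ```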
  11. Sure; just run the tool as you normally would. When asked for the old (current) password/key, provide it. The tool then tries that key on each available drive; if it can't open any of them, it will shout. If you're then asked for the new key, the key is good - just hit ^C (Ctrl-C) and leave.
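      For reference, the same kind of check can be done by hand against a single drive (a sketch; /dev/sdX1 is a placeholder for the actual LUKS-encrypted device - --test-passphrase only verifies the key and does not map or modify anything):

      ```bash
      # Verify a typed passphrase against one LUKS device without opening it
      cryptsetup open --test-passphrase /dev/sdX1 && echo "Key accepted"

      # Same check using a keyfile instead of a typed passphrase
      cryptsetup open --test-passphrase --key-file /path/to/keyfile /dev/sdX1 && echo "Keyfile accepted"
      ```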
  12. @Jetracer, this is unfortunately a commonly reported phenomenon with Seagate SAS drives. The drive/HBA combo interprets the spin-down instruction somewhat differently than other drives do. I don't have a good solution to offer.
  13. Looks like the SATA drives do indeed spin down. That's the first time I'm seeing this - probably the doing of the HBA. Does this mean that hdparm -y /dev/sdX does not work for these drives? I could probably add an option to "force" a drive to be treated as SAS even though it isn't. Thing is, it should probably be configured by S/N (to survive restarts), so if you later made a change - e.g. connected that SATA drive to an on-board SATA port - you'd need to remove the setting or it would break things. The alternative is to hack the drive_types table in your "go" script 🙂
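      A quick way to answer that question yourself (a sketch; hdparm -C merely reports the drive's power state and should not wake it):

      ```bash
      DEV=/dev/sdX        # the SATA drive in question (placeholder)
      hdparm -y "$DEV"    # ask the drive to enter standby (spin down)
      sleep 5
      hdparm -C "$DEV"    # should report "standby" if the command took effect
      ```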
  14. @suppa-men, the poster you reference appeared not to really have an issue - their SATA drives were spinning back up a few minutes after being spun down, so probably as a result of I/O activity. What you're describing is something I have not seen or had reported yet: you issued a SAS/SCSI command against a SATA drive and report that it actually worked. This is very interesting. Let me ask you this - how do you verify that the SATA drive does in fact spin down following the sg_start command?
  15. Oddly, I have never compiled a list of drives that do behave nicely with the SCSI/SAS spin down/up commands. I have collected a list of exclusions - i.e. drives (or drive/controller combos) that misbehave, or are otherwise known to either ignore the spin-down command or create some sort of breakage upon receiving it. It might be a good idea to compile success stories into such a list. I'll kick it off: I have a few HUH721212AL4200 (12TB HGST) on an on-board Supermicro SAS controller (LSI 2308). They spin down and up pretty much perfectly.
  16. Pushed a proposed patch against this one as well - hope the author finds it useful 🙂
  17. I have added the MG06 SAS drives to the exclusions list, so the plugin will not try to spin them down.
  18. It turns out that, unlike with SATA drives, the standby (aka "spin down") commands are interpreted differently by different SAS HDDs; they are not well standardized. Some drives, after receiving this command, do spin down but then expect an explicit "spin up" command before they start revolving again. That behavior is not compatible with Unraid, which expects a spun-down drive to spin back up automatically when the next I/O is directed at it. This translates into read or write errors (depending on the I/O that was underway); if it was a write, the drive gets red-x'ed, as you experienced. This has been reported a lot with Seagate drives, and a bit less with Toshiba drives. There's very little the plugin can do about it. I can add the MG06 to the exclusion list, which means the plugin will simply avoid touching these drives. Towards that end, can you share the output of:
      /usr/local/emhttp/plugins/sas-spindown/sas-util
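      To illustrate the behavior described above (a sketch only - do not run this against an in-use array drive; /dev/sdX is a placeholder):

      ```bash
      sg_start --stop /dev/sdX     # SCSI STOP UNIT, i.e. "standby" / spin down
      # Well-behaved drives spin back up implicitly on the next I/O.
      # The problematic drives instead fail that I/O and stay down until they
      # receive an explicit START UNIT:
      sg_start --start /dev/sdX
      ```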
  19. That is correct. I proposed a patch to the author; hopefully he will consider merging it in.
  20. As mentioned above, running Unraid as a VM is not a common occurrence, and isn't even officially supported. Some of us do it, though, for varying reasons. For me, it is a combination of a few:
      (a) ESXi has been rock solid for me for years now. It just works - and, in my scenario, it's always on, essentially feeling like part of the hardware. I've also not felt a need to update it (I'm running an ancient 6.x version), so it's really always up - as long as the UPS can carry it. In contrast, I've been updating Unraid occasionally (that's a good thing!), taking it down from time to time for maintenance, and so forth. During those times, my other VMs were up and chewing.
      (b) Historically, for me, Unraid has mainly served as a (great) NAS, with its unique approach to building and protecting the storage array. Its additional services (a few Docker containers) have been an upside, running under Unraid as a convenience. Over the years Unraid has developed immensely and moved from being a storage solution to a one-stop shop for your home server needs; I've not boarded that train - at least not completely.
      The shorter version: I started with VMware as a hypervisor, it's been rock solid, my other VMs are not affected by Unraid's dynamic nature (updates etc.), and I never saw a good reason to flip that structure.
  21. Perhaps this calls for a new feature of this little tool - back up my LUKS headers. I'll take a look to see what it takes to do that reliably for the entire array. @Jclendineng
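      In the meantime, this can be done by hand per device with cryptsetup (a sketch; /dev/sdX1 and the output path are placeholders - keep the backup files off the array, since a header backup together with the passphrase can unlock the data):

      ```bash
      # Back up the LUKS header of one encrypted device to a file
      cryptsetup luksHeaderBackup /dev/sdX1 --header-backup-file /path/to/sdX1-luks-header.img

      # If the on-disk header is ever damaged, restore it from that file
      cryptsetup luksHeaderRestore /dev/sdX1 --header-backup-file /path/to/sdX1-luks-header.img
      ```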
  22. Yes. When using -S, several drive families then require an explicit spin-up command to be sent to them in order to spin up (in sg_start jargon, that'd be "-s" - small s). That is not what Unraid expects - it expects an implicit wake-up whenever subsequent I/O is issued against the drive. So while "-S" would cause more drives to spin down, it would also cause some of them to stay spun down, which would translate into Unraid marking the drive as faulty and various other small hells breaking loose. Yes, that is typically the solution to spin-ups closely following spin-downs. Glad you found the issue.
  23. So you believe both LUKS headers got corrupted simultaneously? Have you tried a backup copy of the keyfile?