
doron

Members
  • Content Count: 285
  • Joined
  • Last visited
  • Days Won: 2

doron last won the day on September 25 2019

doron had the most liked content!

Community Reputation

32 Good

1 Follower

About doron

  • Rank
    Advanced Member

Converted

  • Gender
    Undisclosed


  1. I took another look at this, and there's a clue that might help move this forward (at least I was not aware of it). It appears as if in SAS/SCSI drive management there are two distinct spindown states: STOP and STANDBY. Both make the drive park its heads and spin down; however, STOP requires an explicit START to have the drive spin back up, whereas STANDBY does what we're used to seeing in the ATA world - the next I/O at the drive will have it spin up again.

     Now, the sg_start -S command that we've been toying with issues the STOP command (as documented). This explains the behavior we've been seeing: unless we issue the corresponding sg_start -s, the device remains stopped and I/O against it fails.

     There seems to be a way to make the drive go to STANDBY by using sdparm, e.g. (roughly):

         sdparm --flexible --save -p po --set=STANDBY=1 /dev/sdj

     but when I issue this command, the syslog shows this cryptic message:

         Tower kernel: sdj: sdj1

     and the drive seems to be spun up and ready to go (i.e. dd from it works instantaneously). I'm not sure which component issues this message, nor whether it's related to the fact that the drive does not spin down, but it does appear in correspondence with the sdparm command. Perhaps @limetech can shed more light on this. BTW, issuing this command against a drive while the array was started caused the array to go into a parity check (?!?).

     There's also an SCT parameter (a timeout after which the drive itself goes to STANDBY), which we could play with (set it to a very short period so it's almost an immediate spindown), but trying that in Unraid caused the same result as above.
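     To summarize the moving parts, here's a rough sketch of the commands involved - untested in this exact form, and the device name and timer value are placeholders, so treat it as an illustration rather than a recipe:

         DEV=/dev/sdX

         # Ask the drive to use the STANDBY power condition (Power Condition mode page),
         # which should let the next I/O spin it back up on its own:
         sdparm --flexible --save -p po --set=STANDBY=1 $DEV

         # Optionally set the standby condition timer so the drive drops to STANDBY on its
         # own after being idle (timer units are typically 100 ms, so 600 is roughly 60 s -
         # check sdparm's output for your particular drive):
         sdparm --flexible --save -p po --set=SCT=600 $DEV

         # Contrast with an explicit STOP, which stays down until an explicit START:
         sg_start --readonly --stop $DEV     # park heads and stop the spindle
         sg_start --readonly --start $DEV    # required before I/O will succeed again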
  2. I do use Unraid NFS shares as ESXi datastores and have bumped into similar issues - mostly timeout errors that make the share "unavailable". I ended up hacking this little tool, which I keep somewhere under /scratch and run off of cron every few minutes. It has reduced the number of mishaps - albeit not to zero. Basically, it looks for any NFS mount whose state is anything other than "available" and, for each one, deletes and re-mounts it. ESXi 6.5d.

         #!/bin/sh
         #
         # ESXi script: Delete and re-create NFS shares that are in any state except "available"
         #
         # 2019-09-10 Created
         # 2019-20-02 Added "force" hack
         #
         if [ "$1" == "force" ] ; then
           # "force" remounts every share, regardless of its reported state
           FILTER="cat"
         else
           # default: act only on shares whose state is not "available"
           FILTER="grep -v [[:space:]]available$"
         fi

         # Parse "esxcfg-nas -l" output into NAME / SHARE / HOST, then remount each hit
         esxcfg-nas -l | $FILTER | \
           sed "s|\(.*\) is \(/.*\) from \([0-9\.]*\).*|NAME='\1' ; SHARE='\2' ; HOST='\3'|" | \
           while read line ; do
             eval $line
             echo "Yoyo NFS share \"$NAME\""
             esxcfg-nas -d "$NAME"
             esxcfg-nas -a "$NAME" -o $HOST -s "$SHARE"
           done
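     If you want to run it off of cron as I do, an entry along these lines should work - the script name here is just an example, and on the ESXi builds I've seen root's crontab lives at /var/spool/cron/crontabs/root, but your version may differ:

         # Run the NFS re-mount check every 5 minutes, discarding output
         */5 * * * *   /scratch/nfs-remount.sh > /dev/null 2>&1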
  3. Just got to do some experimenting. Findings:

     1. Concur. When a SAS drive is actually spun down, I/O directed at it won't spin it up - it needs to be explicitly spun up.

     2. On my system I see a difference I can't explain between sg_start -S /dev/sdX and sg_start -rS /dev/sdX. In the first case, immediately after I issue the stop command, my log shows:

            Jul 16 22:39:35 Tower kernel: sd 4:0:4:0: [sdk] Spinning up disk...
            Jul 16 22:39:47 Tower kernel: ............ready
            Jul 16 22:39:47 Tower kernel: sdk: sdk1

        This happens every time, immediately after the stop command. Obviously, after that, I/O succeeds with no issue (since the drive is spun up - not sure what spins it up). Conversely, when I issue the second stop command (the one with the -r flag), the drive stays spun down. Then, as already reported, all I/O attempts indeed fail with an I/O error until the start command is issued. Not sure what causes the difference.
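     In case anyone wants to reproduce this, the sequence I've been using looks roughly like the sketch below (the device name is a placeholder - point it at a SAS drive you can afford to experiment with):

         DEV=/dev/sdX

         sg_start -S $DEV      # stop via a read-write open; on my box something spins it right back up
         sg_start -rS $DEV     # stop via a read-only open; the drive stays spun down

         # With the drive stopped, a direct read fails with an I/O error:
         dd if=$DEV of=/dev/null bs=4k count=1 iflag=direct

         sg_start -rs $DEV     # explicit start; after this, I/O succeeds again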
  4. Thanks for taking a shot at this!! I'm away from my server until a bit later, so I'll be able to test again when I'm back; quick question though - why is the -r flag there? Per the manpage it seems to open the device read-only - is that what you intended? When I tried this command previously (see the first post in this thread), I used -s / -S, without the -r.
  5. Precisely. To drive SAS drives, you need a SAS controller. This could be on-board (on server boards such as Supermicro's, etc.) or a dedicated controller. The connector might be SATA-style (e.g. many Supermicro boards have SAS-capable SATA-style connectors), but the protocol must be SAS. Connected to a SATA controller, a SAS drive will not spin up.
  6. @JimJamUrUnraid, @Golfonauta - indeed, as you can see in this post here, it appears as if the challenge is spinning up rather than down. If we spin the drive down using one of these methods and Unraid is not made aware of it, then the next time Unraid wants to write to that drive it will get a timeout (waking up takes time...) and will red-x it. The drive then needs to be rebuilt. No damage to the drive other than that - it's just that Unraid will think the data is bad (out of sync) and will need to rebuild. In short, we need Limetech (@bonienl?...) to come to the rescue.
  7. Running Unraid under VirtualBox is quite straightforward - except that it can't boot a VM from USB, so you need to solve that using mechanisms covered in this forum (e.g. PlopKexec). If you want to later migrate Unraid (with your data) to a standalone box, you'd probably also want to pass your HDDs through as raw physical drives rather than use virtual HDDs. This can be done with VBoxManage.exe internalcommands createrawvmdk - you can search for it to see the details. What you would probably not want to do is run VMs under this virtualized Unraid.
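     If you go the raw-disk route, the incantation is along these lines (the path and PhysicalDrive number are examples only - double-check which drive number maps to which disk before running it):

         REM Create a VMDK that points at a whole physical disk, for attaching to the Unraid VM.
         REM Run from an elevated prompt; adjust the filename and the PhysicalDrive number.
         VBoxManage.exe internalcommands createrawvmdk -filename "C:\VMs\unraid\disk1.vmdk" -rawdisk \\.\PhysicalDrive1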
  8. Umm, "That Depends". It would find the correct drives only if you created the VM the same way, and then either (a) passed the HDD controller (SAS or otherwise) through to the VM, or (b) created RDMs for each HDD and assigned all of them to the VM. There's plenty of guidance here for doing either. At that point, Unraid will see the drives and (hopefully) put them in their corresponding slots. Since you have ESXi 7.0, have you set it up to boot from the Unraid USB drive? You still need to see why the system does not fully boot, though; I'd check that by going to the machine console.
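     For option (b), the RDM mapping files are created on the ESXi host roughly like this - the device identifier and datastore path below are placeholders, so substitute your own:

         # List the physical disks the host sees:
         ls -l /vmfs/devices/disks/

         # Create a physical-mode RDM pointer file inside a datastore folder, then add it
         # to the VM as an existing disk (use -r instead of -z for a virtual-mode RDM):
         vmkfstools -z /vmfs/devices/disks/naa.XXXXXXXXXXXXXXXX /vmfs/volumes/datastore1/unraid/disk1-rdm.vmdk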
  9. Ah, okay 🙂 Your sig says 5.0-rc12a. It is needed as long as your ESXi is at a version lower than 7.0. Earlier ESXi versions can't boot a VM from USB, so you need to either boot from a CD (e.g. PlopKexec) or from a vHDD that you set up in a certain way.
  10. I notice that you're running Unraid 5... It's been a while since I ran those. There could be several different intermediate steps for the booting - it could be a virtual HDD, or it could be a CD (PlopKexec). It does seem that your server does not complete a successful boot ("Connection refused" means the web server - emhttp, in Unraid 5.0 terms, IIRC - is not running). You can get to its console from ESXi and see what's going on; see the quick checks below (I'd first verify local network connectivity - ping another local box - then whether I can ping outside - say, 8.8.8.8).
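      Something like this from the VM console would tell us a lot (the gateway address is just an example - use yours):

          ifconfig eth0                 # do we have a sensible address, or a 169.254.x.x autoconf one?
          ping -c 3 192.168.1.1         # can we reach the local gateway / another local box?
          ping -c 3 8.8.8.8             # can we reach the outside world?
          ps aux | grep -i emhttp       # is the web server process actually running?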
  11. Are you sure you're getting a DHCP address? Any chance you're seeing an autoconf address (169.254.x.x)? I'm asking because it would indicate a different problem. Also, how are you booting your VM?
  12. This used to work on older versions of Unraid, but not in recent versions.
  13. Yeah, I'm using the G2. Old, can sometimes be had from ebay, but does have a solid GUID.
  14. You're welcome, @gerard6110. Incidentally, a different way to solve the issue as you're framing it is to use a USB card reader instead of a USB stick. Some card readers have IDs that can be used by Limetech's licensing scheme. If the flash component (an SD card of sorts - usually a microSD) wears out, you just pull it out, install a new one and restore the content from backup. The ID remains the same, so the license remains intact. You can find out more about this here and here.
  15. Sorry to be the bearer of bad news, but -- probably not; and in fact, the entire scheme, elegant as it may seem at first glance (which is what drew me in when I saw this thread, before stopping to think about it properly), cannot actually work. The only reason it kind-of-works for you right now is that your BOOTDISK is a USB flash drive and not a hard drive. Had it been an HDD, it would not have completed a full boot process (same as you saw earlier, one "bzoverlay" ago).

      Longer explanation: Unraid's OS, by design, places the various HDD controller (and many other) drivers in loadable kernel modules (not compiled in). Those modules are packaged separately (bzmodules) and are not part of the initramfs. The result is that during the early stages of the boot process, HDDs are not accessible - until bzmodules is mounted (and udev gets kicked to rediscover devices). Effectively, this means that if Unraid is booted from an HDD, there'll be a bootstrap deadlock, since bzmodules won't be accessible, hence not mountable, hence the HDD can never be seen or mounted. Game over, insert coin. (You can see this packaging for yourself on a running box - see the snippet below.)

      I believe the scheme proposed in this thread has never actually worked. The bzoverlay file that allegedly performs the cool trick has never been successfully unpacked by the kernel (because it wasn't properly packed). Therefore, boot worked and things were looking nice - but the truth of the matter is that no change actually took place: the original licensed USB flash drive was mounted r/w as /boot, and the boot drive was just lying there doing very little. Only after I gave you a properly packed bzoverlay did we start to see it "functioning" - essentially making things break... Solving this properly would require repackaging of the Unraid OS, which, while perfectly doable, would change the scheme from elegant to terribly messy.
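      To illustrate the packaging point - this is from my 6.x systems, so the exact details may differ on other releases:

          ls -lh /boot/bz*                                  # bzimage, bzroot, bzmodules, ... all live on the flash
          grep -E '/lib/(modules|firmware)' /proc/mounts    # these trees are mounted separately, after early boot
          losetup -a                                        # ...from loop devices backed by the bz* images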