Hot swap question


Regarding hot-swap drives: I assume that if I stopped the array and, let's say, added a new drive by just popping it into an empty slot, there is no way for unRAID to know about that drive until a reboot? Likewise, if I wanted to replace a drive with a larger one, or even replace a failed drive, would a reboot be in order for any of these scenarios?

Link to comment

The whole idea of a hot-swappable drive is that you can indeed change them while the system is on (i.e. "hot").  I have NOT tried this on UnRAID, but as long as the controllers are operating in AHCI mode, they SHOULD support hot swap -- so if, for example, you added a drive I'd expect UnRAID to "see" it without requiring a reboot.  You would, I presume, have to stop the array to assign it -- but I believe it would work fine.

 

How a replacement would work is another story -- I do NOT know what would happen if you hot-swapped a drive while the array was actually started.  (and I'm not inclined to actually DO that to find out !!).  You should, however, be able to swap them while the array is stopped.

 

Having said both of those things, I'm a conservative old guy who still shuts down his systems before making any drive changes.

 

Link to comment

I have added, pre-cleared and assigned drives without rebooting the server. I just had to stop the array when I assigned each drive then start it again. Preclear and the unassigned devices plugin will work without stopping the array.

 

When upgrading a drive, I would first hot-plug the new drive into a spare slot to preclear it. Then, I would stop the array, pull the drives involved and put the new drive into the slot for the old drive before starting the array again. If I was selling the old drive, after a week or two I'd plug it into a spare slot again to clear it before selling it.

 

I've also upgraded the cache with VMs and Dockers turned off, then plugged the old cache into a spare slot to copy the application data to the new cache before turning the VMs and Dockers back on. I then unmounted the old cache drive and pulled it without stopping the array.

 

You can't actually hot-swap existing array drives with the array started, but having bays and hot-plugging drives is handy.

Link to comment

As I noted earlier, as long as the controllers are operating in AHCI mode, hot-swap is supposed to work -- and judging from the experiences outlined above, it clearly does as long as the array is stopped.  Nevertheless, I'm "old school" regarding plugging drives into a running system, so I always shut down to do this.  It's not something I do with enough regularity that it really matters  :)

Link to comment

It should be noted that hot adding and hot swapping are very different. Hot swapping requires much more than AHCI. unRAID does not support hot swapping, or even hot adding drives to the array.

 

On the Intel chipsets, under AHCI, the hot swap feature must be enabled via BIOS and software.

 

That being said, I recently hot plugged a drive into a Linux machine (something I have done thousands of times, à la echo "- - -") and had it crash. The machine crashed at the moment of insertion, not detection, so I suspect a physical interaction. I power cycled it and never looked back.

 

Given the lack of support by unRAID, and the potential for mishap, best practice would be to avoid hot plugging. At the very least, be aware that there is risk and, with unRAID, limited upside.

 

unRAID will not know about the drive until you configure it. But Linux will likely detect the drive after insertion; if not, you can use the echo "- - -" trick to have the HBA rescan.
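
The echo "- - -" rescan mentioned here is normally written to a SCSI host's scan file under /sys/class/scsi_host. A minimal sketch, assuming a standard Linux sysfs layout. The function name rescan_scsi_hosts and its optional root argument are made up for illustration (the argument only exists so it can be tried against a scratch directory), and this is not an unRAID-provided command:

```shell
#!/bin/bash
# Ask every SCSI/SATA host adapter to rescan its ports for new devices.
# "- - -" is a wildcard for channel, target and LUN.
# rescan_scsi_hosts is a hypothetical helper name, not an unRAID command.
rescan_scsi_hosts() {
    local sysfs="${1:-/sys}"   # optional alternate root, for dry runs
    local scan
    for scan in "$sysfs"/class/scsi_host/host*/scan; do
        [ -e "$scan" ] || continue      # glob matched nothing
        echo "- - -" > "$scan"
        echo "rescanned $(basename "$(dirname "$scan")")"
    done
}
```

Run as root with no argument, it touches every host. To rescan a single controller, write to that host's scan file directly, for example echo "- - -" > /sys/class/scsi_host/host6/scan (host6 is only an example number).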

Link to comment

Just my two bits here.

Hot swapping really depends on the SATA/SAS chipset involved (not the CPU) and the Linux driver involved.

For anything in the IDE category (you really need to change that) - support is practically zero.

For anything in the AHCI category - the driver supports it as per spec, but your hardware and the current state of Schroedinger's cat will determine whether it works. In short, it should work, but Murphy might be paying attention and throw you for a loop.

For anything else (MPTSAS2/3) - the driver supports it, and again it's the same can of worms with your hardware.
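
One way to see which of these categories a port falls into is to read the driver name the kernel reports for each SCSI host. A rough sketch, assuming the usual sysfs proc_name attribute is present. The helper name list_host_drivers and its optional root argument are hypothetical:

```shell
#!/bin/bash
# Print each SCSI host together with the low-level driver that owns it,
# e.g. "host6 ahci". list_host_drivers is a hypothetical helper name.
list_host_drivers() {
    local sysfs="${1:-/sys}"   # optional alternate root, for dry runs
    local dir
    for dir in "$sysfs"/class/scsi_host/host*; do
        [ -r "$dir/proc_name" ] || continue   # skip hosts without the attribute
        echo "$(basename "$dir") $(cat "$dir/proc_name")"
    done
}
```

Ports running in AHCI mode should report ahci. The names shown for IDE-mode and SAS controllers vary by hardware, so treat anything else as something to look up rather than a verdict.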

 

My experience on the hardware side is that you really should have hotplug bays (as in, the manufacturer states they are meant for hotplugging, not just for ease of access). But SATA/SAS ports are hotplug-capable by design, so YMMV.

 

That said, while the array is stopped, unRAID seems to scan the ports repeatedly for disks and changes. But if the array is started, the manual echo '- - -' to scan the specific controller is needed.

 

Link to comment

I don't use hot swap; I prefer to shut down the server to do any replacements (also, I don't have hot swap bays  :P )

 

I have just one hot swap bay, on my work unRAID server; it's an AMD X2 with an Nvidia GeForce 8200 chipset in AHCI mode. I just tried it and it works, but there are some errors in the log. Shouldn't there be a command like the eject option on Windows to do this more cleanly?

 

Disk was unassigned and unmounted, removal:

 

Jan 13 09:52:49 Tower8 kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
Jan 13 09:52:49 Tower8 kernel: ata6: irq_stat 0x00400000, PHY RDY changed
Jan 13 09:52:49 Tower8 kernel: ata6: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
Jan 13 09:52:49 Tower8 kernel: ata6: hard resetting link
Jan 13 09:52:49 Tower8 kernel: ata6: SATA link down (SStatus 0 SControl 300)
Jan 13 09:52:54 Tower8 kernel: ata6: hard resetting link
Jan 13 09:52:55 Tower8 kernel: ata6: SATA link down (SStatus 0 SControl 300)
Jan 13 09:52:55 Tower8 kernel: ata6: limiting SATA link speed to 1.5 Gbps
Jan 13 09:53:00 Tower8 kernel: ata6: hard resetting link
Jan 13 09:53:00 Tower8 kernel: ata6: SATA link down (SStatus 0 SControl 310)
Jan 13 09:53:00 Tower8 kernel: ata6.00: disabled
Jan 13 09:53:00 Tower8 kernel: ata6: EH complete
Jan 13 09:53:00 Tower8 kernel: ata6.00: detaching (SCSI 6:0:0:0)
Jan 13 09:53:00 Tower8 kernel: sd 6:0:0:0: [sde] Synchronizing SCSI cache
Jan 13 09:53:00 Tower8 kernel: sd 6:0:0:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
Jan 13 09:53:00 Tower8 kernel: sd 6:0:0:0: [sde] Stopping disk
Jan 13 09:53:00 Tower8 kernel: sd 6:0:0:0: [sde] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00

 

Disk disappears from the unassigned disks.

 

 

Connection:

 

Jan 13 09:53:12 Tower8 kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x5800000 action 0xe frozen
Jan 13 09:53:12 Tower8 kernel: ata6: irq_stat 0x00000040, connection status changed
Jan 13 09:53:12 Tower8 kernel: ata6: SError: { LinkSeq TrStaTrns DevExch }
Jan 13 09:53:12 Tower8 kernel: ata6: hard resetting link
Jan 13 09:53:17 Tower8 kernel: ata6: link is slow to respond, please be patient (ready=0)
Jan 13 09:53:18 Tower8 kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 13 09:53:18 Tower8 kernel: ata6.00: ATA-8: ST3250312CS,             5VT045PH, SC13, max UDMA/133
Jan 13 09:53:18 Tower8 kernel: ata6.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Jan 13 09:53:18 Tower8 kernel: ata6.00: configured for UDMA/133
Jan 13 09:53:18 Tower8 kernel: ata6: EH complete
Jan 13 09:53:18 Tower8 kernel: scsi 6:0:0:0: Direct-Access     ATA      ST3250312CS      SC13 PQ: 0 ANSI: 5
Jan 13 09:53:18 Tower8 kernel: sd 6:0:0:0: [sde] 488397168 512-byte logical blocks: (250 GB/233 GiB)
Jan 13 09:53:18 Tower8 kernel: sd 6:0:0:0: [sde] Write Protect is off
Jan 13 09:53:18 Tower8 kernel: sd 6:0:0:0: [sde] Mode Sense: 00 3a 00 00
Jan 13 09:53:18 Tower8 kernel: sd 6:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 13 09:53:18 Tower8 kernel: sd 6:0:0:0: Attached scsi generic sg4 type 0
Jan 13 09:53:18 Tower8 kernel: sde: sde1
Jan 13 09:53:18 Tower8 kernel: sd 6:0:0:0: [sde] Attached SCSI disk

 

Disk is automatically detected by the UD plugin.

 

Link to comment

... shouldn't there be a command like the eject option on Windows to do this more cleanly?

 

Agree -- and if there is such a command I have no idea what it is in Linux.

 

My Windows workstation is the one system where I DO use the hot-swap function ... not for any of the operational drives (system, data, etc.), but to connect or disconnect drives I either want to write data to (e.g. my offline backup drives) or want to test (using WD's Data Lifeguard).    And in those cases I ALWAYS use the "safely remove" function before disconnecting the drive.

 

Link to comment

Those log messages all look typical for a disconnection and a new connection.  I don't think of these as hard errors but as soft errors: a part of normal operation, detecting the loss of a device and the discovery of a new one.  When the controller detects any physical change in the readiness of a device on a port, it raises an exception, and the exception handler then initiates the normal procedures for detaching or initializing the device.
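
To pick the actual link state changes out of the exception-handler chatter, a small filter over the kernel log can help. This is only a sketch that matches the "SATA link up" and "SATA link down" message wording seen in the excerpts in this thread. The function name summarize_ata_events is made up for illustration:

```shell
#!/bin/bash
# Read kernel log lines on stdin and print one short line per SATA link
# state change, e.g. "ata6 down" or "ata6 up 3.0 Gbps".
# summarize_ata_events is a hypothetical helper name.
summarize_ata_events() {
    sed -n -E \
        -e 's/.*(ata[0-9]+): SATA link down.*/\1 down/p' \
        -e 's/.*(ata[0-9]+): SATA link up ([0-9.]+ Gbps).*/\1 up \2/p'
}
```

It reads from standard input, so it can be fed from the syslog file or from dmesg output, for example summarize_ata_events < /var/log/syslog.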

Link to comment

Hot-swap works with unRAID - I was able to pop new drives into the free slots and they were detected by the system. This worked fine with my Intel H61M SATA ports, ASMedia ASM-1061 onboard controller, and a HighPoint RocketRAID 2720SGL (Marvell 88SE9840).

 

If using SATA ports, you have to set them to AHCI mode in your BIOS for hot-swap to work properly.

 

As for pulling assigned devices and hot-swapping them, I don't think that is supported by unRAID.

Link to comment

OK, I just wondered whether there is a command similar to Windows' "safely remove", but since the disk is unmounted the file system is safe.

 

That's something I'd like to know too.

 

I suppose since this is Linux, you are expected to unmount the file system yourself before detaching.  And Linux GUIs (KDE, etc.) probably provide a button or function that will execute the appropriate unmounts, comparable to 'safely remove'.

Link to comment

I think stop array is pretty much the equivalent of "safely remove" for assigned devices.

Link to comment

After some googling I found this command:

 

echo 1 > /sys/block/sdX/device/delete

 

You can hear the disk spinning down and the log is cleaner:

Jan 13 15:29:52 Tower8 kernel: sd 6:0:0:0: [sde] Synchronizing SCSI cache
Jan 13 15:29:52 Tower8 kernel: sd 6:0:0:0: [sde] Stopping disk
Jan 13 15:29:52 Tower8 kernel: ata6.00: disabled
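
For anyone who wants a guard around that command, here is a hedged sketch that refuses to delete a disk that still shows up as mounted and flushes pending writes first. The helper name safe_remove and its two optional arguments are hypothetical (they only exist so it can be tried against scratch files), and sdX-style names such as sde are passed without the /dev/ prefix. The mount check only looks for /dev/sdX entries in /proc/mounts, so as far as I know it will not notice an array member that unRAID mounts through an md device. Stop the array first in that case.

```shell
#!/bin/bash
# Detach a disk via sysfs only if neither it nor its partitions are mounted.
# safe_remove is a hypothetical helper name, not an unRAID command.
# Usage: safe_remove sde
safe_remove() {
    local dev="$1"
    local sysfs="${2:-/sys}"            # optional alternate root, for dry runs
    local mounts="${3:-/proc/mounts}"   # optional alternate mount table
    local node="$sysfs/block/$dev/device/delete"

    if [ ! -e "$node" ]; then
        echo "no such device: $dev" >&2
        return 1
    fi
    # Matches /dev/sde as well as /dev/sde1, /dev/sde2, ...
    if grep -q "^/dev/${dev}[0-9]* " "$mounts"; then
        echo "$dev is still mounted; unmount it first" >&2
        return 2
    fi
    sync                  # flush anything the OS still has buffered
    echo 1 > "$node"      # tell the kernel to detach the device
}
```

With the disk already unmounted the sync is mostly belt and braces, but it is cheap.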

Link to comment

Should you run sync before that command, to make sure everything has been written, just for safety's sake?

Link to comment

Indeed looks like a "safely remove" equivalent  :)

 

... although I wouldn't want to do it on a disk that was part of a mounted array.      Not sure whether Linux would warn you in that case -- kind of like Windows will refuse to "safely remove" a drive that's actively in use (e.g. your C: drive).  But it should work fine as long as you Stop the array first.

 

Link to comment
