[SOLVED] Unassigned drive goes to sleep then missing


Ver7o


Hello guys,

 

I run my VMs off an unassigned SSD, which works great. The problem is that when I turn my VMs off overnight, the drive eventually goes to sleep (from green to grey) and then goes missing (it appears under historical devices or something like that). It remains missing until I reboot the server.

 

I was wondering whether there is a way to make the disk never "spin down", or whether something else is happening.

 

I attached my diagnostic file, but will it even help now that I have already rebooted?

 

Thanks for any answers!

tower-diagnostics-20191127-1355.zip


So this is a fresh diagnostic. I rebooted the system and didn't start a VM for a few minutes. The unassigned disk went to sleep, and now it's listed under missing historical devices and cannot be woken up unless I reboot.

 

From what I can see now, there was an I/O error on sdc (the unassigned drive), which caused the drive to unmount.

 

Out of curiosity, which file system would you guys recommend for an unassigned drive running Windows, Linux and Mac VMs?

 

Nov 27 18:21:13 Tower kernel: print_req_error: I/O error, dev sdc, sector 1953662178
Nov 27 18:21:13 Tower kernel: XFS (sdc1): metadata I/O error in "xlog_iodone" at daddr 0x747284a2 len 64 error 5
Nov 27 18:21:13 Tower kernel: XFS (sdc1): xfs_do_force_shutdown(0x2) called from line 1271 of file fs/xfs/xfs_log.c.  Return address = 0000000084c58612
Nov 27 18:21:13 Tower kernel: XFS (sdc1): Log I/O Error Detected.  Shutting down filesystem
Nov 27 18:21:13 Tower kernel: XFS (sdc1): Please umount the filesystem and rectify the problem(s)
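
As a side check on these lines: the `daddr` XFS reports is a hex address in 512-byte sectors relative to the partition (sdc1), while the `sector` in the `print_req_error` line is relative to the whole disk (sdc), as far as I understand it. A minimal sketch of the arithmetic, where the helper name `daddr_to_int` is my own:

```python
def daddr_to_int(daddr_hex: str) -> int:
    """Convert the hex daddr from an XFS log line to a decimal sector number."""
    return int(daddr_hex, 16)


# Values copied from the two kernel lines above.
xfs_daddr = daddr_to_int("0x747284a2")  # from the xlog_iodone line (sdc1)
disk_sector = 1953662178                # from the print_req_error line (sdc)

# Difference between the whole-disk sector and the partition-relative daddr.
offset = disk_sector - xfs_daddr

print(xfs_daddr)  # 1953662114
print(offset)     # 64
```

The 64-sector difference would be consistent with sdc1 starting at sector 64, which is an assumption on my part until checked against the partition table. If so, both lines describe the same failed write to the XFS log rather than two separate errors.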

tower-diagnostics-20191127-1727.zip


The device is dropping offline. This looks suspiciously VM related, as if you're using that device for a VM: 26:00.0 is the LSI HBA. Are you starting any other VM? I.e., if all VMs are left off, does the disk still disappear?

 

Nov 27 18:20:43 Tower kernel: vfio_ecap_init: 0000:26:00.0 hiding ecap 0x1e@0x258
Nov 27 18:20:43 Tower kernel: vfio_ecap_init: 0000:26:00.0 hiding ecap 0x19@0x900
Nov 27 18:20:43 Tower kernel: sd 9:0:1:0: attempting device reset! scmd(0000000074fec607)
Nov 27 18:20:43 Tower kernel: sd 9:0:1:0: [sdc] tag#0 CDB: opcode=0x28 28 00 3a 38 36 18 00 02 f8 00
Nov 27 18:20:43 Tower kernel: scsi target9:0:1: handle(0x0009), sas_address(0x4433221101000000), phy(1)
Nov 27 18:20:43 Tower kernel: scsi target9:0:1: enclosure logical id(0x500605b005492dd0), slot(2)
Nov 27 18:20:43 Tower kernel: sd 9:0:1:0: device reset: FAILED scmd(0000000074fec607)
Nov 27 18:20:43 Tower kernel: scsi target9:0:1: attempting target reset! scmd(0000000074fec607)
Nov 27 18:20:43 Tower kernel: sd 9:0:1:0: [sdc] tag#0 CDB: opcode=0x28 28 00 3a 38 36 18 00 02 f8 00
Nov 27 18:20:43 Tower kernel: scsi target9:0:1: handle(0x0009), sas_address(0x4433221101000000), phy(1)
Nov 27 18:20:43 Tower kernel: scsi target9:0:1: enclosure logical id(0x500605b005492dd0), slot(2)
Nov 27 18:20:43 Tower kernel: scsi target9:0:1: target reset: SUCCESS scmd(0000000074fec607)
Nov 27 18:20:44 Tower kernel: sd 9:0:1:0: Power-on or device reset occurred
Nov 27 18:20:44 Tower kernel: mpt2sas_cm0: attempting host reset! scmd(0000000074fec607)
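
For reference, the `CDB` bytes in the log show which command was outstanding when the resets started. Opcode 0x28 is a SCSI READ(10), with the logical block address in bytes 2 to 5 and the transfer length in bytes 7 to 8, both big-endian. A rough sketch of decoding it, where the function name `decode_read10` is made up for illustration:

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting logical block
        "blocks": int.from_bytes(cdb[7:9], "big"),  # number of blocks to read
    }


# CDB copied from the "[sdc] tag#0 CDB" lines above.
info = decode_read10("28 00 3a 38 36 18 00 02 f8 00")
print(info)  # {'lba': 976762392, 'blocks': 760}
```

So the stuck command was an ordinary read of 760 blocks (380 KiB, assuming 512-byte logical blocks), not a spin-down or power management command.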

 


In this particular case I tried to run a Linux VM (with GPU passthrough), before I noticed sdc was already missing, so naturally nothing happened because the vdisk was not found. I have no clue what happened to the disk before that.

 

As far as I know group 26 refers to the GPU.

 26:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2070] (rev a1)

 

But yeah, it sure looks like that's the case. If the VMs are off, the disk disappears. Before this diagnostic, I just rebooted Unraid and went afk for a while without any VM started.

2 minutes ago, Ver7o said:

As far as I know group 26 refers to the GPU.

You're right, the HBA is 25:00.0, my mistake. There have been some spin-related issues with devices connected to LSI HBAs lately. The first thing you should try is updating the firmware, since it's very old; the current one is 20.00.07.00. If that doesn't help, and since IIRC spin down settings don't have any effect on unassigned devices, you can try connecting that disk to an onboard SATA port if possible.
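
If you want to check the firmware version from your diagnostics against the current one, dotted versions like this compare cleanly as integer tuples. A minimal sketch: `parse_fw` is my own helper name, and `installed` is a made-up placeholder, not your actual firmware version:

```python
def parse_fw(version: str) -> tuple:
    """Turn a dotted firmware version like '20.00.07.00' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


CURRENT_FW = "20.00.07.00"  # current firmware version mentioned in this thread
installed = "19.00.00.00"   # hypothetical placeholder, replace with your own

# Tuples compare element by element, so this is a proper version comparison.
needs_update = parse_fw(installed) < parse_fw(CURRENT_FW)
print(needs_update)
```

Comparing tuples rather than the raw strings avoids surprises if a field ever has a different number of digits.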
