SAS cache drive disappeared - failed with no warning?




UPDATE: Had a second cache drive fail a few months later, diagnostics attached to the reply below on 20 NOV 2022.

 

Hello all - I'm running a Dell R720xd with Unraid Version 6.9.2 2021-04-07.

 

Woke up yesterday to find that my VMs were not working (input/output error), and then discovered my dockers were not responding either. The array and cache drive looked fine in the Main tab, no read/write errors and no notifications about SMART health. Did a reboot to see if that would fix it, and my cache drive disappeared from the system entirely. The cache is a 600 GB SAS drive that came with the server (I purchased the server used, no idea how long it had been in service).


I've tried re-seating the cache drive, plugging it into different slots on the backplane, and multiple reboots, with no success. I back up my appdata and VMs regularly, so I'm not really concerned about data loss; it's just a pain in the ass to get it all set up again.

 

I would love input on:

  • What does my diagnostics file indicate the problem was?
  • Is my drive dead? How can I tell if I have no way to read it/run a health check?
  • If yes, why did it fail with zero warning?
  • Is there any way to recover the entire file structure from the drive? I tried connecting it to my laptop using a string of dongles (USB to SATA, SATA to SAS), but the internet tells me that this will not work.

 

Thanks all!

 

EDIT: The drive bay is blinking green at a steady interval, unlike any of the other drive bays (the others kind of flicker green). Per Dell's website, blinking 2x per second means "Identifying drive or preparing for removal", but I'm not sure whether all of that functionality is preserved with the RAID card flashed to IT mode.

marvin-syslog-20220825-0125.zip

Edited by artdepart
update to point to latest diagnostics file

Apologies, and thanks for your patience. Diagnostics are attached. Hopefully they are still of use after the reboot. In the future I must remember to dump the diagnostics the moment something seems out of place.

 

Note that since the problem occurred, I've installed a new WD Red 1 TB SSD, anticipating that I will need to replace the SAS cache drive. I haven't added it to the cache pool or array yet, but it's showing up under unassigned devices as it should.

 

marvin-diagnostics-20220827-2330.zip

  • 2 months later...

Hello all. I'm having a similar cache drive issue a few months later. After the issues above, I replaced the SAS cache drive with a WD Red SSD. I woke up this morning and discovered that none of my dockers or VMs were responding, and the SMB share that lives on the cache drive is not available. Array shares are responding fine. I captured diagnostics (attached) and have not yet restarted the server. What should my next move be?

 

EDIT: Also just noticed that the "downloads" share, which is cache-only, is not even showing in the list of shares.

marvin-diagnostics-20221120-1147.zip

Edited by artdepart
