artdepart (posted August 28, 2022; edited November 20, 2022 to point to the latest diagnostics file):
UPDATE: I had a second cache drive fail a few months later; diagnostics are attached to the reply below dated 20 NOV 2022.

Hello all - I'm running a Dell R720xd with Unraid Version 6.9.2 2021-04-07. I woke up yesterday to find that my VMs were not working (input/output error), and then discovered my dockers were not responding either. The array and cache drive looked fine in the Main tab: no read/write errors and no notifications about SMART health. I rebooted to see if that would fix it, and my cache drive disappeared from the system entirely.

The cache is a 600 GB SAS drive that came with the server (I purchased the server used and have no idea how long it had been in service). I've tried re-seating the cache drive, plugging it into different slots on the backplane, and multiple reboots, with no success. I back up my appdata and VMs regularly, so I'm not really concerned about data loss; it's just a pain in the ass to get it all set up again.

I would love input on:
What does my diagnostics file indicate the problem was?
Is my drive dead? How can I tell if I have no way to read it or run a health check?
If yes, why did it fail with zero warning?
Is there any way to recover the entire file structure from the drive? I tried connecting it to my laptop using a string of dongles (USB to SATA, SATA to SAS), but the internet tells me that this will not work.

Thanks all!

EDIT: The drive bay is blinking green at a steady interval, which is unlike any of the other drive bays (the others kind of flicker green). Per Dell's website, blinking 2x per second means "Identifying drive or preparing for removal", but I'm not sure whether all of that functionality is preserved with the RAID card flashed to IT mode.

Attached: marvin-syslog-20220825-0125.zip
trurl (posted August 28, 2022):
Attach diagnostics to your NEXT post in this thread.
artdepart (posted August 28, 2022):
Attached: marvin-syslog-20220825-0125.zip
trurl (posted August 28, 2022):
No, that is just the syslog. The word diagnostics in this post, in my previous post, and in any post where it appears is a link explaining how to get diagnostics.
artdepart (posted August 28, 2022):
Apologies, and thanks for your patience. Diagnostics are attached. Hopefully they are still of use even though they were captured after the reboot. In the future I must remember to dump the diagnostics the moment something seems out of place.

Note that since the problem occurred, I've installed a new WD Red 1 TB SSD, anticipating that I will need to replace the SAS cache drive. I haven't added it to the cache pool or array yet, but it's showing up under unassigned devices as it should.

Attached: marvin-diagnostics-20220827-2330.zip
JorgeB (posted August 28, 2022):
Try that device in a different slot; if it's still not detected, it's likely a device problem.
artdepart (posted August 28, 2022):
I tried the drive in several slots on the front backplane, and even moved it to the rear backplane. It is still not recognized. I'm going to try booting UBCD to see if I can see the drive. Any other ideas for copying or viewing the contents of the drive?
JorgeB (posted August 29, 2022):
If the drive is not detected there's not much chance of copying any data.
artdepart (posted November 20, 2022; edited November 20, 2022):
Hello all. I'm having a similar cache drive issue a few months later. After the issues above, I replaced the SAS cache drive with a WD Red SSD. I woke up this morning and discovered that none of my dockers or VMs were responding, and the SMB share that lives on the cache drive is not available. Array shares are responding fine. I captured diagnostics (attached) and have not yet restarted the server. What should my next move be?

EDIT: I also just noticed that the "downloads" share, which is cache-only, is not even showing in the list of shares.

Attached: marvin-diagnostics-20221120-1147.zip
artdepart (posted November 20, 2022):
I've now gone ahead and restarted the server after downloading an additional set of diagnostics. Everything seems to be normal after rebooting, which is great. I am not sure what caused the dockers and VMs to crash and the cache drive to go offline. I would really appreciate any eyes on the diagnostics file so that I can make adjustments to avoid this in the future. Thanks so much!
JorgeB (posted November 21, 2022; marked as Solution):
The cache device dropped offline; this is usually a power/connection problem.
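For readers who want to look for this kind of event themselves, here is a minimal Python sketch, not taken from the diagnostics in this thread, that scans a syslog text file for lines that often accompany a device dropping offline. The pattern list and the function name `find_suspect_lines` are assumptions for illustration; real logs vary by controller and kernel version, so treat hits as leads to read in context rather than as a diagnosis.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Hypothetical, non-exhaustive patterns: phrasings commonly seen in Linux
# kernel logs when a block device stops responding. They are an assumption,
# not a list derived from this thread's diagnostics.
SUSPECT_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"offline device", re.IGNORECASE),
    re.compile(r"SATA link down", re.IGNORECASE),
]


def find_suspect_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each line matching any suspect pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_syslog.py path/to/syslog.txt
    with open(sys.argv[1], errors="replace") as handle:
        for number, text in find_suspect_lines(handle):
            print(f"{number}: {text}")
```

Running it against the syslog extracted from a diagnostics zip prints each matching line with its line number, which makes it easier to find the moment the device went away and read the surrounding entries.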
artdepart (posted November 21, 2022):
Thanks Jorge - glad to know it's not something more serious. I'll keep an eye on it.