artdepart

  1. Thanks all. I made a few changes that seem to have improved system stability. Recording them here for anyone else running a similar Dell server.
     State when I was experiencing drive dropouts:
     - No UPS
     - Two redundant power supplies installed, but wall power supplied to only one (I was simply storing the other redundant PSU inside the server)
     Note that with the above configuration, the server never powered down, even when power was visibly affected (i.e. lights flickering) by things like hair dryers or starting another desktop PC on the same breaker.
     Current state that appears to be stable:
     - Purchased and installed a UPS
     - Connected both redundant power supplies to the UPS
     I have not had any drive dropouts since making these changes. I will report back if the issue resurfaces. Again, thanks.
  2. The disks don't show up in iDRAC, and they never have since I flashed the RAID card to IT mode. They do not show in the BIOS boot menu for the server, and never have. Perhaps I'm not looking in the right place. But they show up in Unraid, of course.
  3. Hello all. With increasing frequency, my array has been going down. Seemingly out of nowhere, all disks start returning read errors, which causes all my dockers and VMs to stop working. Typically this is fixed by a reboot, or in some cases after a few reboots. The machine is an R720xd with the RAID card flashed to IT mode, so all the array disks are connected via a backplane. Before I start replacing parts, would you folks be so kind as to review the diagnostics and see if there is anything else going on here? I've attached diagnostics from the past two occurrences. This used to occur maybe once every few months, and now it's nearly every day. Thanks. (For anyone triaging similar errors themselves, see the log-scanning sketch after this post list.) marvin-diagnostics-20240108-1959.zip marvin-diagnostics-20240106-2330.zip
  4. Thanks Jorge - glad to know it's not something more serious. I'll keep an eye on it.
  5. I've now gone ahead and restarted the server after taking an additional diagnostics download. Everything seems to be normal after rebooting, which is great. I am not sure what caused dockers and VMs to crash and take the cache drive offline. I would really appreciate any eyes on the diagnostics file so that I can make adjustments to avoid this in the future. Thanks so much!
  6. Hello all. Having a similar cache drive issue a few months later. After the issues above, I replaced the SAS cache drive with a WD Red SSD. Woke up this morning and discovered none of my dockers or VMs were responding, and the SMB share that lives on the cache drive is not available. Array shares are responding fine. I captured diagnostics (attached) and have not yet restarted the server. What should my next move be? EDIT: Also just noticed that the "downloads" share, which is cache-only, is not even showing in the list of shares. marvin-diagnostics-20221120-1147.zip
  7. Tried the drive in several slots on the front backplane, and even moved it to the rear backplane. Still not recognized. I'm going to try booting UBCD to see if I can see the drive. Any other ideas for copying/viewing contents of the drive?
  8. Apologies, thanks for your patience. Diagnostics are attached; hopefully they are still of use after the reboot. In the future I must remember to dump the diagnostics the moment something seems out of place. Note that since the problem occurred, I've installed a new WD Red 1 TB SSD, anticipating that I will need to replace the SAS cache drive. I haven't added it to the cache pool or array yet, but it's showing up under unassigned devices as it should. marvin-diagnostics-20220827-2330.zip
  9. UPDATE: Had a second cache drive fail a few months later; diagnostics are attached to the reply below dated 20 NOV 2022.
     Hello all - I'm running a Dell R720xd with Unraid version 6.9.2 (2021-04-07). Woke up yesterday to find that my VMs were not working (input/output error), and then discovered my dockers were not responding either. The array and cache drive looked fine in the Main tab: no read/write errors and no notifications about SMART health. Did a reboot to see if that would fix it, and my cache drive disappeared from the system entirely. The cache is a 600 GB SAS drive that came with the server (I purchased the server used, so I have no idea how long it had been in service). I've tried re-seating the cache drive, plugging it into different slots on the backplane, and multiple reboots, with no success. I back up my appdata and VMs regularly, so there are no real data-loss concerns; it's just a pain in the ass to get it all set up again.
     I would love input on:
     - What does my diagnostics file indicate the problem was?
     - Is my drive dead? How can I tell, given I have no way to read it or run a health check? (See the smartctl sketch after this post list.)
     - If it is dead, why did it fail with zero warning?
     - Is there any way to recover the entire file structure from the drive? I tried connecting it to my laptop using a string of dongles (USB to SATA, SATA to SAS), but the internet tells me that this will not work.
     Thanks all!
     EDIT: The drive bay is blinking green at a steady interval, which is not like any of the other drive bays (the others kind of flicker green). Per Dell's website, blinking twice per second means "Identifying drive or preparing for removal", but I'm not sure all of that functionality is preserved with the RAID card flashed to IT mode. marvin-syslog-20220825-0125.zip
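For the read-error storms described in post 3, one way to triage before (or alongside) posting diagnostics is to pull the disk-error lines out of the extracted zip and compare their timestamps against the dropout times. The sketch below is a minimal, hypothetical helper: the logs/syslog.txt path and the error patterns are assumptions about what a typical Unraid diagnostics bundle and an LSI IT-mode HBA tend to log, not something confirmed in the posts above.

```python
#!/usr/bin/env python3
"""Scan an extracted Unraid diagnostics syslog for disk-error lines.

Hypothetical triage helper; the path layout and patterns are assumptions.
"""
import re
import sys
from pathlib import Path

# Messages that commonly accompany SAS/SATA link or read problems in the kernel log.
PATTERNS = [
    r"I/O error",
    r"blk_update_request",
    r"UNC",                  # uncorrectable read errors
    r"attempting task abort",
    r"mpt2sas|mpt3sas",      # LSI HBA driver messages (IT-mode cards)
]

def scan(syslog_path: Path) -> None:
    rx = re.compile("|".join(PATTERNS), re.IGNORECASE)
    for line in syslog_path.read_text(errors="replace").splitlines():
        if rx.search(line):
            print(line)

if __name__ == "__main__":
    # e.g. python3 scan_syslog.py marvin-diagnostics-20240108-1959/logs/syslog.txt
    scan(Path(sys.argv[1]))
```

Running it against the syslog from each attached diagnostics zip makes it easier to see whether every disk errors at once (pointing at the HBA, backplane, or power) or only one drive is involved.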
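On the "is my drive dead, and how can I run a health check" questions in post 9, a SAS drive can usually be probed from any Linux machine that can see it, using smartmontools. The sketch below is only illustrative: the /dev/sdX device path is a placeholder, and "-d scsi" is the usual (but here assumed) device type for SAS disks.

```python
#!/usr/bin/env python3
"""Quick SMART/health probe for a SAS drive (sketch only).

Assumes smartmontools is installed and the drive enumerates as a /dev/sdX node.
"""
import subprocess
import sys

def smart_health(device: str) -> None:
    # '-H' prints the overall health assessment; '-a' dumps the full attribute/error logs.
    for flags in (["-H"], ["-a"]):
        result = subprocess.run(
            ["smartctl", "-d", "scsi", *flags, device],
            capture_output=True,
            text=True,
        )
        print(result.stdout)

if __name__ == "__main__":
    # e.g. sudo python3 smart_health.py /dev/sdb
    smart_health(sys.argv[1])
```

Plain subprocess calls are used rather than a SMART library so the sketch has no dependencies beyond smartmontools itself; if the drive never shows up as a device node at all, no software check will reach it.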