wug

Members

  • Posts
    20


  1. It crashed again after upgrading the mobo firmware. I'm officially out of ideas. Interestingly, according to the hourly array health notifications, it seems to crash within 4-5 hours of starting a parity check, but then I'll log in the next day to find that the system has restarted itself and has had 12+ hours of uptime (with the array offline). I'll note that the last few parity checks were in maintenance mode, so it's unlikely to be a filesystem-related issue. I think this evening I'll run a live Debian environment on this system instead of Unraid, so that I can make sure all of my disks are backed up to tape. I'll report back on system stability when it's just running Debian, because if it's still crashing, then it's presumably hardware. If it's not crashing, then we can assume that Unraid is incompatible with this particular hardware for some reason.
  2. It crashed again with the mobo reset to default settings. I think one interesting clue is that after I rolled back to 6.11.5, when the server crashes, it actually seems to do a full system reset: I keep checking on it to find that it has crashed but is sitting there, ready to go once again. This is in contrast to before, when it would require a long press on the power button to forcibly shut it down because it was TOTALLY unresponsive. I'm going to try the graphics settings someone else described above, and if it crashes again, I'll try upgrading the BIOS.
  3. OKAY. I got all disks to be available again. I'm now running with a mobo reset to default settings (and it hasn't been updated away from the version that was stable for a year) and a nearly fresh install of Unraid 6.11.5. I'll start a new parity check and report back on whether things have been resolved or further debugging is needed.
  4. It's crashed three more times since yesterday. Each time, it crashed not too long after starting a parity check. As part of my next debugging step, I reset the motherboard to default settings, and I now have an interesting clue: the system booted up just fine, but the parity disk was missing. It remained missing after another reboot. All other disks are present and available, so it's not super likely to be a bad cable, because it's a SAS-to-SATA cable and the other three connected drives are available. I pulled the parity disk out, stuck it in the external USB drive caddy, and it shows up just fine. But now another disk is missing: baffling. When I switched Disk 7 to the SATA connector that parity was connected to, it was still missing. I wonder if some BIOS setting was causing the system to miss that certain disks are having issues... somehow? But that barely makes sense. I'm gonna do some more fiddling with it to see if I can get it behaving.
  5. I rolled back to 6.12.2, and it's already crashed. If I recall correctly, I actually upgraded to 6.12.3 from a lower version (6.12.0 or 6.12.1), so just to be thorough, I've now rolled back all the way to 6.11.5. I have a parity check in progress now, so let's see if I can successfully complete one of these for the first time in almost three months!
  6. I have been having an inexplicable issue where the server hangs and requires a hard reboot to return to normal. This has occurred since upgrading to 6.12.3 (and now 6.12.4). When I say it's random, I mean it: sometimes the server has lasted 4+ days after a reboot, sometimes it's become unresponsive within minutes of booting. Sometimes it has crashed while disks are active, sometimes while they are idle. It's crashed with all of these combinations:
       • Docker disabled, VMs disabled
       • Docker enabled, VMs disabled
       • Docker disabled, VMs enabled
       • Docker enabled, VMs enabled
     It's crashed when connected to the usual network and when on its own dedicated subnet. It's crashed with each ethernet port on the motherboard used as the sole network connection (and that's a 1G and a 2.5G port, so they aren't even the same hardware or drivers!). It's crashed with the configuration that I've built up over the last eight years of running Unraid, and it's crashed with a fresh configuration on a fresh flash drive. There is not a single condition that is actually correlated with the crashes. I believe this is related to these issues, but I'm creating a new post because one was marked as closed, and I also went to some pretty significant lengths to try to debug this:

     Debugging Process
     This has been going on since August, and I've done absolutely everything possible to eliminate defective hardware as a possibility. That includes:
       • swapping out all PCIe cards with spares
       • a run of the system with each individual drive disconnected, one at a time (i.e. I remove one disk, see if it still crashes, and if it does, put the disk back in and pull the next one)
       • every single non-destructive stress test I can think of
       • fsck on each disk and pool individually
       • running every maintenance operation I can think of
       • testing various configurations of power and sleep settings in the motherboard BIOS

     Logging Process
     Here's the wildly frustrating part: I created a syslog configuration that logs basically every single message it can (including marks) to a log file on an ext4-formatted flash drive mounted as an unassigned device. I have at least a dozen log files that don't contain a single error, and before each of those files ends, there is an unbroken sequence of --MARK-- lines going back for hours before the system locked up (see the log-scanning sketch after this post list). I've also tried using various notification methods to try to receive messages that the system is dying, and I've also tried setting up remote logging. None of them ever surfaced an issue anywhere near the time of the crash, so the crash is definitely also killing outbound networking. There is one tiny hint of what might be going on: for a period of time, when I rebooted after a crash, I would get a "udma crc error count returned to normal value" for a drive (but it never seemed to be consistent). However, all the components have since been removed and added back to the server, and I haven't seen an issue like that in a while. I'll also add that rebooting requires holding down the power button until the computer shuts off. If I just do a quick press once, nothing happens: the server keeps running, the monitor doesn't wake up, nothing indicates that anything was actually able to capture that ACPI signal.

     Fresh Install
     Last night, as my last step, I used the USB creator tool to create a brand new boot disk with 6.12.4 (on a factory-sealed flash drive) and copied over only the bare minimum configuration files (like the array config). Again, it crashed.

     Unraid 6.12.4 is Fundamentally Broken on Some Systems?
     I'm just rolling back to 6.12.2 at this point, because there is nothing abnormal in the diagnostics or logs that would indicate an actual problem. I've attached the diagnostics file from the fresh install, taken before the array was even started, because this locking-up problem happens even when nothing is mounted and the system is just idling. But it also happens when a parity check is running, so it's not just a high-load or low-load issue.

     tl;dr: There is some issue occurring with Unraid ≥6.12.3 that cannot be detected through any normal logging methods, and it has made my local installation totally unusable since August. And I'm apparently not the only one.

     mediatower-diagnostics-20231027-1338.zip
  7. Does that mean that if I have a file at /mnt/cache/share-a and a hard link to it from /mnt/cache/share-b, and Mover wants to move the file on B but not A, it will just copy the file? What if shares A and B are both set to move to the array? Would it still make a copy, or would it move them to the same disk, preserving the hard link? Also, now that there's mover support for hard links, does that mean there's user share support for hard links? And you mention you use symlinks often: do those only work on disk shares, or do they also work on user shares?
  8. I know it's not possible to create a hard link on a user share, so I'm wondering if there's a viable workaround. Let's say I have a file on /mnt/cache/share-a and I create a hard link to it at /mnt/cache/share-b. In this setup, share-a has its cache preference set to "Prefer" while share-b has cache set to "Yes." When the mover runs, will it move just the link from the cache disk to an array disk, or will it copy the whole file? (There's a sketch of the underlying hard-link behavior after this post list.) Alternatively, do symlinks work on user shares? Can I move a file to /mnt/user/share-b and symlink to it from /mnt/user/share-a?
  9. I've been seeing this exact issue as well for a few weeks now. Currently, I'm running 6.3.0-rc6.
  10. Wouldn't that just be Unassigned Devices? Can user shares on the array also be on disks managed through Unassigned Devices? Although I guess it would make more sense to add a second folder pointing to, for example, an "optimizedmedia" share in each of my Plex libraries and have optimized versions be saved there. That's kind of what I did when I was running Plex off of my laptop with my media on external hard drives; I had a second set of media folders directly on my laptop where it would store "optimized" original-quality copies of shows that were On Deck in Plex.
  11. Quote:
      As you've already noted, the transfer speeds you can get with a modern hard drive with platter densities over 1TB/platter are excellent. There's certainly no compelling reason to use an SSD as a cache for writes to the array, or for the usage you noted vis-à-vis Plex shows. However, as a VM and Docker store, an SSD would have significant performance advantages with the internal usage, NOT because of the much higher write speeds, but because of the dramatically better access times for every disk I/O. Whether or not that's "compelling" depends on your specific usage ... but in most cases I'd tend to think it doesn't really matter. Note that you could also mix the choices -- use a large rotating platter drive together with an SSD, with the VM and Docker stores assigned to the SSD.

      Can you assign shares to use specific cache disks like you can with shares on the array? If that's the case, then an SSD/HDD pair is definitely the best option. I would like the performance bump I'd get from an SSD, but I'd also like to have a high-capacity off-array store that improves performance in other ways. For example, I've been converting as much media as possible to x265 because the space savings are huge, but until I upgrade my other internals, it would be nice to also keep optimized versions of things like what's On Deck in Plex. That way I could have tiny x265 files for long-term storage, but I wouldn't have to violently assault my processor every time I want to watch the latest episode of a TV show on my Chromecast.
  12. That's only slightly faster than the sequential read/write speed of the 6TB WD Black! I don't use any VMs that require high performance nor do I plan to, but I would like to use the cache to store things like optimized Plex versions of recent shows, and other things of that nature. My priority is leaning toward capacity over performance, so I'm trying to gauge if there is any very compelling reason to lean one way or another.
  13. I've been looking at drives for my cache pool, and I can't decide whether I want to buy bigger, high-performance HDDs or smaller mid-range SSDs. My thinking is that WD Black drives and other drives in the same tier show write speeds in the ballpark of 150MB/s. Thus, on a Gigabit LAN, I could saturate my connection with 125MB/s of data to the server, and the drive would be able to keep up (see the back-of-envelope sketch after this post list). I could further increase the maximum write speed for writes generated on the server itself if each cache drive were actually two smallish drives in a hardware RAID-0 configuration. Does anyone have any strong opinions on the matter? Would I be better off with an SSD or with two high-performance HDDs in a RAID-0 configuration? For the same amount of money, the SSD option would give much higher performance, while the HDD option would give me much more space.
  14. Do you have the 6.2 beta? The line in the preclear script that triggers the "Device X is busy" message calls sfdisk -R, but the version of sfdisk in 6.2 doesn't have the -R option.
  15. In general, I agree with the "don't waste electricity" argument. However, the reason for time-of-use pricing in my area is that there are a ton of different electricity sources. During the off-peak hours (when the disks would be most likely to be spinning without anyone using them), something like 90% of the electricity comes from hydroelectric and nuclear power. The higher usage tiers are when they have to turn on gas and wind power, but during those times the server is going to be in use anyway. During the off-peak hours, the demand is so low that they export the hydroelectric power to places that are far, far away, and they turn off the wind turbines because they create a surplus of energy that the grid can't handle. And I use so little energy already that the electricity-cost portion of my bill (not including the flat distribution fee) has never gone over $11/month. Plus, this is the worst-case consumption scenario: 50 Watts 24/7 would be 438kWh per year (the math is sketched after this post list). The more likely scenario (spinning disks not under load) would be half of that. But you definitely do have a valid argument, and I will consider it alongside any other pros/cons that others may mention!
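
A minimal sketch of the log check referenced in the syslog/--MARK-- post above: it just reports the last heartbeat line in a syslog file, which puts a rough lower bound on when the box died (the crash happened at most one mark interval later). It assumes classic syslog-style lines with "-- MARK --" heartbeat entries; the default filename is only a placeholder.

    #!/usr/bin/env python3
    # Report the last "-- MARK --" heartbeat in a syslog file. The filename
    # default below is a placeholder, not an actual path from the diagnostics.
    import sys

    def last_mark(path):
        last = None
        with open(path, errors="replace") as f:
            for line in f:
                if "MARK" in line:   # matches "-- MARK --" heartbeat lines
                    last = line.rstrip()
        return last

    if __name__ == "__main__":
        path = sys.argv[1] if len(sys.argv) > 1 else "syslog-remote.log"
        mark = last_mark(path)
        print(mark if mark else f"no MARK lines found in {path}")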
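For the hard-link questions above, here is a small sketch of the underlying filesystem behavior the mover has to work within. It only demonstrates generic Linux semantics (same inode, shared link count, links cannot span filesystems); it does not claim anything about what Unraid's mover itself decides to do, and the paths are throwaway temp files.

    #!/usr/bin/env python3
    # Demonstrates generic hard-link semantics; nothing here touches
    # /mnt/cache or /mnt/user.
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "share-a-file")
        b = os.path.join(d, "share-b-link")

        with open(a, "w") as f:
            f.write("payload")

        os.link(a, b)                                  # second name, same inode

        sa, sb = os.stat(a), os.stat(b)
        print("same inode:", sa.st_ino == sb.st_ino)   # True
        print("link count:", sa.st_nlink)              # 2

        # Hard links cannot cross filesystems, so anything that relocates one
        # of the names to a different disk (e.g. cache -> array) has to write
        # an independent copy there; keeping the link requires both names to
        # end up on the same filesystem. A symlink can point anywhere, but it
        # dangles if the target path stops resolving after a move.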
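Back-of-envelope numbers for the cache-drive question above. The HDD figure is the ~150MB/s ballpark from the post; the RAID-0 scaling is idealized (~2x sequential), and the SATA SSD figure is a generic assumption rather than a quoted spec.

    #!/usr/bin/env python3
    # Rough throughput comparison; all figures are ballpark MB/s.
    gigabit_lan = 1000 / 8        # ~125 MB/s of payload over a Gigabit link
    hdd_seq     = 150             # WD Black-class sequential write (from the post)
    raid0_seq   = 2 * hdd_seq     # ideal two-drive stripe, ~300 MB/s
    ssd_seq     = 500             # generic SATA SSD sequential write (assumed)

    print(f"LAN ceiling:   {gigabit_lan:.0f} MB/s")
    print(f"single HDD:    {hdd_seq} MB/s  (already matches the LAN)")
    print(f"2x HDD RAID-0: {raid0_seq} MB/s (only helps server-local writes)")
    print(f"SATA SSD:      {ssd_seq} MB/s  (same story, plus far better random I/O)")

In other words, for writes arriving over the Gigabit LAN a single ~150MB/s drive already keeps up; the RAID-0 or SSD options would mainly matter for work generated on the server itself.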
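And the worst-case power math from the last post, written out. The 50W draw and the roughly-halved idle estimate come from the post; the per-kWh rate is a placeholder, not the actual tariff.

    #!/usr/bin/env python3
    # Annual energy for a constant draw; the rate below is a placeholder.
    HOURS_PER_YEAR = 24 * 365

    def annual_kwh(watts):
        return watts * HOURS_PER_YEAR / 1000

    worst_case = annual_kwh(50)    # disks spinning 24/7 -> 438 kWh/yr
    likely     = annual_kwh(25)    # roughly half when idle/spun down

    rate = 0.10                    # $/kWh, placeholder
    print(f"worst case: {worst_case:.0f} kWh/yr (~${worst_case * rate:.0f}/yr at ${rate}/kWh)")
    print(f"likely:     {likely:.0f} kWh/yr (~${likely * rate:.0f}/yr)")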