hades


  1. Hopefully this is a small item to add. I run quite a few VMs across 4-5 UnRAID servers; some have ~15 VMs running on them. It would be nice to quickly see how much RAM is allocated and whether I can start up another VM. Right now I have to go to the Dashboard, scroll down, work out how much 30% of 128GB is, and whether my new VM fits into that. A simple number like "RAM: 78/128GB", either at the top beside the "HD % free" indicator or in the status bar at the bottom of the page, would be awesome. Thank you!
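     In the meantime, here's the rough workaround I use from the command line (just a sketch, not an UnRAID feature: it sums the maximum memory of every running libvirt VM and prints the host total for comparison):

        total_kib=0
        for vm in $(virsh list --name); do
          # "Max memory" from dominfo is reported in KiB
          mem=$(virsh dominfo "$vm" | awk '/Max memory/ {print $3}')
          total_kib=$((total_kib + mem))
        done
        echo "VM RAM allocated: $((total_kib / 1024 / 1024)) GB"
        free -g | awk '/Mem:/ {print "Host RAM total: " $2 " GB"}'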
  2. Also, the logs are now full of these types of errors. This is server4: Is this related? EDIT, 20 minutes later: Awesome... The entire server died. I had to do a hard reboot. This is the longest, most frustrating, and most expensive experience with UnRAID/Linux/whatever...
  3. Good catch on Server4, thank you. I went through all 4 servers, set the RAM speed to 2666 (disabled the XMP profiles), and made sure "Power Supply Idle Control" is set to "Typical". I then ran a scrub, which identified a few files as corrupted; those files have been deleted. I'm just focusing on Server4 for now. I ran scrub a few times yesterday with no errors. It ran fine for 1.5 days and is now showing the same types of errors again. The tutorial says to run the RAM at 2666: check. Power Supply Idle Control = Typical: check. VMs which so far seemed unaffected are now showing up as corrupted. Is this a Ryzen-ONLY thing? If so, I'm good to go and buy an Intel-based motherboard & CPU as a replacement. That would come out cheaper than the productivity the team and I are losing to issues like this... Thank you. Server4-diagnostics-20220322-1517.zip
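     For reference, the checks I'm running on the cache pool look like this (a minimal sketch; /mnt/cache is UnRAID's default mount point for the pool):

        btrfs scrub start /mnt/cache           # start a scrub in the background
        btrfs scrub status /mnt/cache          # progress and error summary
        btrfs dev stats /mnt/cache             # per-device corruption counters
        grep -i btrfs /var/log/syslog          # checksum errors logged so far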
  4. Thank you for the quick response. I already did this (from the FAQ): "find "Power Supply Idle Control" (or similar) and set it to "typical current idle" (or similar)." I just checked the server I'm concentrating on fixing: it's running at 2666 MHz, which is what the RAM is designed for. Should I slow the RAM down? I'm already getting these errors 12 minutes after bootup. Once these errors occur, would they go away if the underlying problem (say, RAM out of spec) is fixed, or would the pool need to be reformatted for the errors to go away? Thank you!
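     My current understanding (an assumption, worth verifying) is that scrub repairs what it can, but btrfs's per-device error counters persist across reboots until they are reset explicitly, so old numbers don't necessarily mean new corruption:

        btrfs dev stats /mnt/cache        # show accumulated error counters
        btrfs dev stats -z /mnt/cache     # print them, then reset to zero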
  5. Hello everyone! I'm struggling with data corruption happening on multiple servers (glorified desktops). In total I have 8 UnRAID installations: 2 older ones that have been running fine for years, with multiple disks and VMs, no problems. This post is not about those two. More recently, I've bought 6 almost-identical Ryzen-based servers (it's a business environment). All of them are:
     - Ryzen CPU, 12 or 16 cores
     - 128GB RAM
     - 3x 6TB drives for storage (XFS) in the array, one parity
     - 2x 2TB NVMe drives, Samsung 970 EVO Plus (in a cache pool as BTRFS)
     - a weak GPU, used only for booting
     All of them hold data on their arrays and run ~10 VMs (mainly Windows) on which people do remote work. Recently, 4 of these Ryzen machines have been plagued by BTRFS errors, usually this: Sometimes this results in the filesystem going into read-only mode, effectively taking down the Docker containers and VMs and forcing me to reboot, after which everything works for a day or so and then it repeats.
     Two of my Ryzen servers are installed in one location, and they're running perfectly. In another location, I had 1 server running for 6 months without issues, then bought Server #2 due to capacity issues, which ran fine for a few months. Then Server #1 started having issues, and I urgently needed things to work, so I bought Server #3 to move the data/VMs onto. Pretty quickly Server #3 started experiencing the same thing. Because I was planning on buying Server #4 anyway, I did; I installed it just a few days ago, and it's already experiencing this problem.
     The motherboards and RAM are not completely identical. The NVMes were all Samsung 970 EVO Plus 2TB, and because this error is on the NVMe, I bought different NVMes for Server #4 (WD 2TB), but it's experiencing the same problem. Given the variation in hardware, and how rarely any one component fails, I highly doubt this is a hardware problem. But I'm now at a loss as to what is happening. It's eating up my hours and days, trying to stabilize things and make sure employees are able to do their jobs. Everyone works remotely; they need this to work on the data and execute long-running jobs. I'm in the process of clearing out one of the servers so I can remove it from the network and run tests on it without any data/VMs. Logs for all 4 servers are attached. Any help would be appreciated! Thank you. Server4-diagnostics-20220319-1738.zip Server3-diagnostics-20220319-1738.zip Server2-diagnostics-20220319-1738.zip Server1-diagnostics-20220319-1738.zip
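     For anyone digging into the diagnostics: this is the quick triage I run on each affected box (a sketch; device names are examples and will differ per server):

        dmesg | grep -iE 'btrfs|nvme' | tail -n 50             # recent filesystem/NVMe errors
        btrfs dev stats /mnt/cache                             # which pool device is accumulating errors
        smartctl -a /dev/nvme0 | grep -iE 'media|integrity'    # NVMe media/integrity error counters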
  6. I was doing some ownCloud maintenance (installing an SSL certificate) when I noticed there was an upgrade for it. I installed the upgrade, and now it won't load. I've already rebooted the physical server. The Docker container starts, but when I try to visit the website it just says "Unable to connect". None of the clients connect either. I can't even find a log file to check what could be wrong. This is my log, which doesn't indicate anything wrong as far as I can tell: To clarify, the SSL certificate installation went great; I had that up and running for a few hours before I noticed and initiated the ownCloud update. Any suggestions? Thanks.
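     In case it points someone in the right direction, these are the places I know to look (a sketch; "ownCloud" is my container's name, and the log path and occ invocation may differ depending on the image):

        docker logs ownCloud                                             # container-level startup errors
        docker exec ownCloud cat /var/www/owncloud/data/owncloud.log    # the app's own log
        # after an upgrade, ownCloud can sit in maintenance mode until occ is run
        # (some images require running occ as the web user, e.g. -u www-data):
        docker exec ownCloud occ status
        docker exec ownCloud occ upgrade
        docker exec ownCloud occ maintenance:mode --off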
  7. Hello, I'm just experimenting with the VM capabilities of UnRAID 6. I love it; I've been waiting for this feature for a while, and I'm now setting up a new UnRAID configuration because of it. I do have some questions. I set up a VM and gave it 200GB of space, which it promptly allocated to itself, so now there's a 200GB file on the disk. Once I finished with the configuration, I shut the VM down, made a backup of the .img file onto the array itself through PuTTY, then copied the .img file twice (so now I have 1 backup + 3 identical copies). I brought up all 3 VMs and everything's working fine. This, PuTTY shows me, is using 800GB of space. However, when I go into UnRAID's Main tab showing all the array devices, it shows only 172GB of space used, with 2.83TB free (it's a 3TB disk). Why is this? Also, just to confirm: the .img files are protected by the array because they are on the array disks (as opposed to cache disks), so if the drive fails, I can recover from parity. Correct? Thank you.
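     My current guess (unconfirmed) is that the .img files are sparse: ls reports the apparent 200GB size, while the array only stores blocks that were actually written, and 172GB across 4 files would work out to ~43GB of real data each. A quick way to see both numbers (the path is just an example):

        ls -lh /mnt/disk1/domains/vm1.img          # apparent size, e.g. 200G
        du -h /mnt/disk1/domains/vm1.img           # blocks actually allocated on disk
        cp --sparse=always vm1.img vm1-copy.img    # keeps future copies sparse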
  8. Just an update on this. The original parity check corrected a huge number of errors, all beyond the 3TB limit of the original parity drive. After it finished, I started another parity check, which completed with 0 errors. I'm assuming all is good. Thank you, hades
  9. It's up to 66,435,555 corrections and climbing. No errors in the log. Some random other disks spun up, but show no activity. The parity "check" is going at 8.5 MB/s. Thanks.
  10. I'm not noticing any reads from any of the drives, but a large number of writes to the parity drive. The parity drive seems to be the only one active, and it is SLOW: 8.5 MB/s. In the past, as the smaller drives were passed, the parity build/check became faster and faster. I'll leave this running as is, then do another check immediately after. I just don't want this to blow up in my face. Thanks.
  11. Running the latest 6b14. I had a bunch of 3TB drives and wanted to replace one failed 3TB drive with a 4TB drive I had available. Knowing that the 4TB would have to become the parity, I followed the swap-disable procedure: I pulled the failed drive, left it missing, reassigned the 3TB parity drive to the missing slot, and assigned the 4TB as the new parity. Then I started the array. Parity was copied from the 3TB to the 4TB, and the old 3TB parity drive was rebuilt fine as the missing 3TB data drive. Now I'm running a parity check, and it went fine up to the 3TB point. Beyond the 3TB point (3TB is my largest data drive), I'm encountering a huge number of parity corrections being written to disk. Is there something wrong? My theory is that the 4TB drive contained 3TB of parity information, the remaining 1TB was garbage, and that garbage is now being corrected. The log doesn't show anything wrong. Thank you! syslog.zip
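     If my theory is right, then with single (XOR) parity everything past the largest data disk should end up as zeros once the check has corrected it. A read-only spot check I could run afterwards (hypothetical device name; it reads 1MiB starting just past the 3TB mark):

        dd if=/dev/sdX bs=1M skip=3000000 count=1 2>/dev/null | cmp -n 1048576 - /dev/zero && echo "region past 3TB is zeroed"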
  12. This post came in handy, as I'm just going through this process right now. I can't believe how many HDs have failed in the last month. What I didn't expect, though, was that the array would be offline while the copying is in progress. I guess I understand why, but I was still hoping it wouldn't be the case. hades
  13. Thank you for the suggestion; I hadn't noticed that line in the SNAP setup. I installed it and rebooted just to be safe. The plugin shows up just fine, but it's the same problem in SNAP: it shows 'NO FS'. For the moment I've figured out how to avoid using the USB drive on unRAID, so I'll be OK not attaching it. I'll just stage the data I need (~3TB of CrashPlan backups) on a user share, then pull that data across the network onto the USB drive attached to a Windows computer. It will take a few days longer, but I've spent at least that long trying to get this to work.
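     For the pull itself I'm leaning towards rsync rather than a plain SMB copy, since it can resume a multi-day ~3TB transfer after an interruption (a sketch; the host and paths are made up, and it assumes an rsync-capable environment on the destination machine, e.g. Cygwin on the Windows box):

        rsync -aP root@tower:/mnt/user/backups/crashplan/ /mnt/usbdrive/crashplan/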
  14. Thank you, Alex. I tried that command:
      root@winserver:/boot/config/plugins# cache_dirs -w -U 5000
      sed: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory
      sed: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory
      sed: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory
      sed: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory
      ./cache_dirs: xmalloc: make_cmd.c:100: cannot allocate 365 bytes (98304 bytes allocated)
      It's 64-bit unRAID running on a machine with 16GB of RAM. I tried different -U values with the same result. What units are the values in? Bytes? Kilobytes? Thank you.
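     I may have figured out part of it: -U appears to feed ulimit -v, which takes kilobytes, so 5000 would cap the script at ~5MB of virtual memory -- too little for sed to even map libc. That's an assumption on my part, but the symptom is easy to reproduce:

        bash -c 'ulimit -v 5000; sed --version'      # fails: cannot allocate memory
        bash -c 'ulimit -v 500000; sed --version'    # ~500MB cap: loads fine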
  15. One partition, covering the entire disk, nothing fancy. Running b14 of unRAID. I will try the NTFS write driver. Thank you.
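     In the meantime, a manual mount might tell us more than SNAP's 'NO FS' does (a sketch; the device name is hypothetical, and it assumes ntfs-3g, the usual write-capable NTFS driver, is available):

        mkdir -p /mnt/usbtest
        mount -t ntfs-3g /dev/sdX1 /mnt/usbtest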