joelkolb
Members
Posts: 17
joelkolb's Achievements: Noob (1/14)
Reputation: 1

  1. @JorgeB Unfortunately, since I thought you were talking about Memtest, after everything passed I put all 4 DIMMs back in and cleared the pool errors by deleting the corrupt files, as you suggested earlier. No errors are detected after a scrub, and "btrfs dev stats /mnt/cache" returns all zeros. Would running with 2 DIMMs still be a valid test at this point, or has the opportunity been lost?
  2. @JorgeB OK. Once I'm in Unraid with 2 DIMMs is there something I should do to test or am I just waiting to see if I have any more problems?
  3. @JorgeB you suggested: I guess the question is, what did you mean by "test"? I did test just 2 DIMMs (with Memtest) and then tested the other 2 DIMMs, and both pairs passed with no errors. So when you say "test" and "running the server", are you talking about running Memtest with 2 DIMMs or running Unraid with 2 DIMMs?
  4. @JorgeB It took me a few days to get everything situated but it all worked out following the information you provided. Thanks for your help.
  5. @JorgeB I ran several passes of Memtest on 2 DIMMs and got no errors. I swapped those with the other 2 DIMMs, ran several passes of Memtest on those, and also got no errors. If it's not the RAM, what else should I check?
  6. @JorgeB running Memtest on the first two DIMMs now. What about the uncorrectable errors on the cache pool?
  7. @trurl I ran "btrfs dev stats /mnt/cache" and it came back all zeros. I ran a scrub on the cache pool and it completed with 4 uncorrectable errors. Then I ran "btrfs dev stats /mnt/cache" again and it came back with this:
     [/dev/nvme0n1p1].write_io_errs    0
     [/dev/nvme0n1p1].read_io_errs     0
     [/dev/nvme0n1p1].flush_io_errs    0
     [/dev/nvme0n1p1].corruption_errs  2
     [/dev/nvme0n1p1].generation_errs  0
     [/dev/nvme1n1p1].write_io_errs    0
     [/dev/nvme1n1p1].read_io_errs     0
     [/dev/nvme1n1p1].flush_io_errs    0
     [/dev/nvme1n1p1].corruption_errs  2
     [/dev/nvme1n1p1].generation_errs  0
     @JorgeB trurl suggested it was a RAM issue, but I ran multiple passes of Memtest and they came back clean. What should I try next? (A scrub/stats command sketch is included after this list.)
  8. @trurl here is the new diagnostic zip. kolbnet-nas1-diagnostics-20210823-1041.zip
  9. @trurl I've completed 4 cycles of Memtest and I'm 50% through the 5th. I've heard that at least 8 cycles are recommended to be sure of anything, but I'm going out on a limb and saying I don't think there is a problem with my RAM. Should I continue running Memtest? If the RAM isn't the problem, what should I do next?
  10. Thanks. I'll force a shutdown, run Memtest, and follow up with the results.
  11. @trurl no, it never occurred to me to run Memtest. But to do that I would have to shut the server down, and it's been stuck trying to unmount the disks for the past 3 hours. Is there any way to get the server to shut down gracefully, or should I force it?
  12. I have a Windows 10 VM running on my Unraid server. I was running several tasks at the same time from this VM that were reading from and writing to shares on the array. All of a sudden Windows became unresponsive. I couldn't get Task Manager to open or even get the OS to shut down gracefully. Ultimately I force stopped the VM from Unraid. When I tried to start it again I got a message saying 'Execution error' and 'read only file system', referring to the path to the VM's vdisk file (I wish I had copied the exact wording or grabbed a screenshot). I tried restarting the array and waited several minutes before trying to restart the server altogether. The whole time the message at the bottom of the browser window has said "Array Stopping • Unmounting disks...". It's been like this for about 2 hours now. I'm inclined to force shutdown the box, but I don't want to break anything either. Does anyone have an idea of what's going on, how to fix/recover from this, or what I should do next? (A sketch of commands for checking what is holding the mounts open is included after this list.) Diagnostic zip attached. Thanks! kolbnet-nas1-diagnostics-20210821-1730.zip
  13. @JorgeB thanks! Isn't it possible, though, that in the future the vdisk could actually become full and fill up the physical disk in a situation where TRIM won't help? Why isn't the vdisk allocation being respected, and is there a way to lock the allocation size so that it won't grow beyond what is specified?
  14. @JorgeB thanks for the quick response. That post you linked to looks very helpful; I didn't know about the virtio-scsi controller with discard='unmap'. I did, however, try to prevent a situation like this from happening by setting the allocation for the virtual disk to be a bit smaller than the physical disk it lives on. What did I do wrong? How was the vdisk still able to grow to fill the physical disk 100% (see the vdisk size check sketched after this list)? Should I have set the allocation size smaller? Most importantly, now that the damage is done and in the current state of things I can't get into Windows long enough to install the virtio-scsi driver, what can I do to fix this? Thanks!
  15. I am running Unraid v6.9.2 on a Threadripper 1920X. I have a Windows 10 VM configured to pass through an Nvidia GTX 1660 Super and a USB 3.0 add-in card. This setup has worked great for about 2 years. Most recently the array and the VM had been up for about 50 days. I went to work yesterday and when I came home the VM wasn't running. When I checked the VMs section in Unraid it showed the VM as paused. The Windows environment in the VM is configured to be always on, not to sleep or hibernate or anything like that. I have no idea what caused it to pause. Now I can't get it to resume. When I try to resume, the little status indicator icon spins for a split second and then goes right back to paused. If I force stop the VM and then start it again, it appears to start up normally, but a few seconds after I get to the desktop everything freezes up. After that, when I check the status of the VM in Unraid again, it shows as paused and will not resume. I don't know what caused this to happen out of the blue. I am attaching a diagnostic zip and a copy of the VM's XML config. Foolishly, I rebooted the server thinking it might help, so I hope nothing was lost that could shed light on what's going on. VM_config.xml kolbnet-nas1-diagnostics-20210819-0940.zip
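
A minimal sketch of the scrub and device-stats checks discussed in items 1 and 7, assuming the pool is mounted at /mnt/cache (adjust the path to your setup):

    # run a scrub in the foreground (-B) and print a summary when it finishes
    btrfs scrub start -B /mnt/cache

    # show per-device error counters; corruption_errs > 0 means bad data was detected
    btrfs dev stats /mnt/cache

    # the counters persist across scrubs; after the cause is fixed and the corrupt
    # files are deleted, print them once more and reset them to zero (-z)
    btrfs dev stats -z /mnt/cache

Scrub can only repair a block when a good copy exists elsewhere in the pool, which is why files flagged as uncorrectable had to be deleted instead.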
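
For the stuck "Unmounting disks..." situation in items 11 and 12, a generic Linux sketch (not Unraid-specific guidance) for seeing which processes still hold files open on the array before deciding to force a shutdown; /mnt/user is assumed as the share mount point:

    # list processes with files open under the user-share mount (verbose output)
    fuser -vm /mnt/user

    # alternative: list open files below the mount point (can be slow on large trees)
    lsof +D /mnt/user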
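
On the vdisk-growth question in items 13 and 14, a quick way to compare a sparse image's nominal (virtual) size with the space it actually occupies on the host. The path below is only an example of a typical Unraid domains-share location; substitute the real location of the vdisk:

    # virtual size vs. "disk size" (blocks actually allocated on the host)
    qemu-img info /mnt/user/domains/Windows10/vdisk1.img    # example path

    # apparent size (what the guest was allocated) vs. real space used
    du -h --apparent-size /mnt/user/domains/Windows10/vdisk1.img
    du -h /mnt/user/domains/Windows10/vdisk1.img

A sparse image only allocates host blocks as the guest writes, but without discard/TRIM passing through (e.g. virtio-scsi with discard='unmap'), blocks freed inside Windows are never released on the host, so actual usage keeps creeping toward the full virtual size.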