
Kyle W

Everything posted by Kyle W

  1. I ran the machine for about 20 days without errors after removing the single stick I had previously isolated. Nemix sent me an RMA and I mailed it in; a little over a week later I had a replacement stick in hand. It has been over 24 hours without any errors so far, so I'm hoping things are resolved.
  2. Well it's been running for over 24 hours with no errors after pulling the second stick. Hopefully I've isolated the bad stick!
  3. I'll try this next. I did download the latest memtest and created a bootable USB.
  4. Well, unfortunately the CPU swap did not fully resolve the issue. The errors are different now, but not gone. I'm going to reach out to Nemix regarding an RMA on the RAM.

     Feb 9 22:53:27 Kyle-Server kernel: mce: [Hardware Error]: Machine check events logged
     Feb 9 22:53:27 Kyle-Server kernel: [Hardware Error]: Corrected error, no action required.
     Feb 9 22:53:27 Kyle-Server kernel: [Hardware Error]: CPU:0 (17:1:1) MC15_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0x9c2040000000011b
     Feb 9 22:53:27 Kyle-Server kernel: [Hardware Error]: Error Addr: 0x00000000b3052b60
     Feb 9 22:53:27 Kyle-Server kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000004c60a400a02
     Feb 9 22:53:27 Kyle-Server kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
     Feb 9 22:53:27 Kyle-Server kernel: EDAC MC0: 1 CE on mc#0csrow#2channel#0 (csrow:2 channel:0 page:0x1a60a5 offset:0x760 grain:64 syndrome:0x4c6)
     Feb 9 22:53:27 Kyle-Server kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

     kyle-server-diagnostics-20230209-2331.zip
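For anyone chasing a similar problem, tallying the EDAC lines by csrow and channel can help show whether the corrected errors keep landing on the same slot. The following is only a rough Python sketch: the regular expression is written against the single EDAC line format shown in the log above, and the function name `tally_edac` is made up for illustration. Other kernels or platforms may format the line differently.

```python
import re
from collections import Counter

# Pattern written against the EDAC summary line in the excerpt above
# (assumption: other kernels may format this line differently).
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) on .*?"
    r"\(csrow:(?P<csrow>\d+) channel:(?P<channel>\d+)"
)


def tally_edac(lines):
    """Sum reported errors per (memory controller, csrow, channel, kind)."""
    totals = Counter()
    for line in lines:
        match = EDAC_RE.search(line)
        if match:
            key = (
                int(match["mc"]),
                int(match["csrow"]),
                int(match["channel"]),
                match["kind"],
            )
            totals[key] += int(match["count"])
    return totals


# Two lines copied from the log above; only the EDAC line should match.
sample = [
    "Feb 9 22:53:27 Kyle-Server kernel: [Hardware Error]: Corrected error, no action required.",
    "Feb 9 22:53:27 Kyle-Server kernel: EDAC MC0: 1 CE on mc#0csrow#2channel#0 "
    "(csrow:2 channel:0 page:0x1a60a5 offset:0x760 grain:64 syndrome:0x4c6)",
]

totals = tally_edac(sample)
for (mc, csrow, channel, kind), count in sorted(totals.items()):
    print(f"MC{mc} csrow {csrow} channel {channel}: {count} {kind}")
```

Feeding it the full syslog (for example, the lines of the file from the diagnostics zip) would give one total per csrow/channel pair. Which physical slot a given csrow/channel corresponds to depends on the board, so that mapping still has to be confirmed by pulling sticks.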
  5. Memtest passed 3 times after running for roughly 24 hours. I am going to swap in my Ryzen 1600X to see if the CPU is causing problems.
  6. Note: tested the typical idle current setting and that did not fix the issue. Running Memtest today.
  7. I noticed last night that my Mac VM was not running after having not touched my NAS all weekend (my BlueBubbles client was not connected on my Android). After logging into the web GUI, there was a notification of an unclean shutdown and a parity check in progress. The power did not go out (I have a UPS which triggers a shutdown within 120 seconds), and I asked my family if anyone had touched the NAS, but they hadn't.

     Looking at the logs, there seems to be a significant number of machine check events with the same IPID and syndromes 0x7e3a00100a800a02 and 0xc3f501000a800a02. I'm assuming there was a crash that resulted in a reboot of the system.

     Specs:
     ASRock B450M Pro 4
     P5.40 BIOS
     Ryzen 5700X
     Nemix 2x16GB DDR4-3200 unbuffered ECC RAM
     3x 8TB WD Red Plus
     1x 1TB Samsung 980 Pro Cache Drive
     Nvidia Quadro P4000 (only using for video out so I can modify BIOS settings, etc. right now)

     Here's an example of the MCE, diagnostics also attached:

     Feb 6 15:06:14 Kyle-Server kernel: mce: [Hardware Error]: Machine check events logged
     Feb 6 15:06:14 Kyle-Server kernel: [Hardware Error]: Corrected error, no action required.
     Feb 6 15:06:14 Kyle-Server kernel: [Hardware Error]: CPU:0 (19:21:2) MC17_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|Scrub]: 0x9c2041000000011b
     Feb 6 15:06:14 Kyle-Server kernel: [Hardware Error]: Error Addr: 0x00000000b5a68b40
     Feb 6 15:06:14 Kyle-Server kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x7e3a00100a800a02
     Feb 6 15:06:14 Kyle-Server kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
     Feb 6 15:06:14 Kyle-Server kernel: EDAC MC0: 1 CE on mc#0csrow#2channel#0 (csrow:2 channel:0 page:0x3169a2 offset:0xc40 grain:64 syndrome:0x10)
     Feb 6 15:06:14 Kyle-Server kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

     Any advice for troubleshooting this one?

     The RAM is running at its specified 3200MHz speed and I've not touched any other settings in the BIOS except for custom fan curves on my chassis fans. I did see a note about setting typical idle current in the BIOS, so I will look into that later today when the parity check is finished.

     All the listed components are new except for the Quadro. I've been running the RAM since December 31 with no other issues, and the current CPU since January 19 (previously a Ryzen 5700G which I returned due to lack of ECC support, then a Ryzen 1600X for about a week while I was waiting for my 5700X to arrive), also with no noticeable issues, but I have not been keeping an eye on the logs.

     I did actually have a fan controller failure about a week ago, and the machine shut down from what I assumed to be a CPU over-temperature condition. I'm hoping the CPU wasn't damaged by this, though it is still within the return window. The RAM would require a manufacturer RMA at this point. Two of the WD Reds hit 47C and 50C, which has me a bit freaked out, but that's within their operating temperature range.

     server-diagnostics-20230207-1026.zip
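As a sanity check on what the kernel printed, the raw MC17_STATUS value can be decoded by hand. This is a hedged Python sketch: the bit positions in `STATUS_FLAGS` are my reading of how the Linux AMD MCE decoder labels the status register and should be treated as an assumption to verify against AMD's documentation. The `edac_syndrome` helper only reflects an observation that, in the two excerpts on this page, bits 47:32 of the Syndrome field equal the syndrome EDAC prints. It is not claimed to be the kernel's exact logic.

```python
# Assumed bit positions for the flags shown in the bracketed list of the
# MCx_STATUS log line (verify against AMD documentation before relying on them).
STATUS_FLAGS = {
    "Over": 62,
    "UC": 61,
    "MiscV": 59,
    "AddrV": 58,
    "PCC": 57,
    "TCC": 55,
    "SyndV": 53,
    "CECC": 46,
    "UECC": 45,
    "Deferred": 44,
    "Poison": 43,
    "Scrub": 40,
}


def decode_status(status):
    """Return the names of the assumed flag bits that are set in status."""
    return [name for name, bit in STATUS_FLAGS.items() if (status >> bit) & 1]


def edac_syndrome(mca_synd):
    """Extract bits 47:32, which match EDAC's syndrome in the logs on this page."""
    return (mca_synd >> 32) & 0xFFFF


# Values copied from the Feb 6 log excerpt above.
flags = decode_status(0x9C2041000000011B)
synd = edac_syndrome(0x7E3A00100A800A02)

print("flags:", "|".join(flags))
print("syndrome:", hex(synd))
```

For the Feb 6 value this yields MiscV, AddrV, SyndV, CECC and Scrub, which lines up with the flags the kernel printed, and the UC bit being clear is consistent with the "Corrected error, no action required" line.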
  8. Well it seems that the Unraid GUI added a leading space to my token when copy/pasting. Thanks to the fine folks on the Linuxserver.io Discord for catching that! Also, according to them this thread is no longer being monitored/maintained so check out their Discord for support.
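To illustrate why a stray leading space can produce curl's "bad/illegal format" complaint, here is a small Python sketch that strips whitespace from the pasted values before building the update URL. The URL layout follows DuckDNS's documented update endpoint as I understand it, and `build_update_url` is a hypothetical helper, not something the container itself provides.

```python
from urllib.parse import urlencode


def build_update_url(subdomain, token):
    """Build a DuckDNS-style update URL from cleaned-up inputs.

    Leading/trailing whitespace (e.g. from copy/paste) is stripped, and any
    whitespace left inside the token is rejected rather than sent.
    """
    token_clean = token.strip()
    if not token_clean or any(ch.isspace() for ch in token_clean):
        raise ValueError("token is empty or contains whitespace")
    query = urlencode(
        {"domains": subdomain.strip(), "token": token_clean, "ip": ""}
    )
    return "https://www.duckdns.org/update?" + query


# Placeholder values for illustration only; note the leading space in the token.
pasted_token = " abc123"
naive_url = "https://www.duckdns.org/update?domains=example&token=" + pasted_token
safe_url = build_update_url("example", pasted_token)

print(repr(naive_url))  # contains a raw space, which curl refuses
print(repr(safe_url))
```

Wrapping the printed values in `repr()` makes a leading or trailing space visible, which is the kind of thing that is easy to miss in a web form field.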
  9. I'm having issues getting this Docker container working; all I'm getting are these statuses in the log: Not entirely sure what to do without any response codes. Any debugging steps I should take? Edit: I checked my log terminal and I'm getting the same error as @Luc1691 - curl: (3) URL using bad/illegal format or missing URL. I've tried just the subdomain name without duckdns.org, the entire URL, adding http://, etc., and none have worked.
  10. Last night after attempting to trigger a system shutdown and waiting for about 5 minutes, I was still able to access the GUI from my MacBook so I stopped the array and tried again. I waited for another few minutes but ended up just doing a hard shutdown. After that, everything seems to be working fine including my Macinabox VM. Tbh I'm not sure what purpose I have for this plugin at the moment, I threw an old Quadro P4000 into my NAS just for fun. I've only had this system running for about two weeks and I'm pretty new to Unraid in general, though I'm considering getting Plex running and/or hosting some game servers. I don't need to pass through the GPU to my Mac VM since it's just for a BlueBubbles client. It's a Ryzen 5700G, 32GB system with 3 8TB drives in the array and a 1TB NVMe cache along with the aforementioned Quadro P4000. I've attached diagnostics, but to be honest I'm not comfortable installing this plugin again right now. I was not sure how to SSH into the server last night (I will be installing PuTTY on my Windows machines today) but I did check just now and both SSH and Telnet are disabled. server-diagnostics-20221201-0935.zip
  11. Just installed the Nvidia-Driver plugin and I can no longer access the Web GUI - it locked up and now loads indefinitely when trying to access from a new browser tab. Any advice? I can try connecting an old keyboard/mouse to the server and connect to the TV next to my NAS but it's not booted in GUI mode. I'd rather avoid a hard shutdown. Edit: no output from HDMI. I can connect to my shares via SMB still. I tried telnet on my old MacBook and the connection was refused. Have not tried SSH. Edit2: Oddly I was able to connect via web GUI on another device. Not sure why it stopped working on my other computer. I uninstalled the plugin and initiated a reboot that for some reason is taking ages to complete. It just says "System is going down..." and has been counting for like five minutes.
  12. Alright, I'll play around with it some more and see if I can get things back to where they were. I downloaded Dynamix File Manager to make a backup of the old cache on my array and will copy it to the new cache from there as well. Thanks for all of your help! Edit: it worked! All Docker/VMs restored and working!
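When copying an old cache's contents to a new pool like this, it can be reassuring to check that nothing was missed before wiping the old drive. Below is a minimal Python sketch using the standard library's `filecmp` module. The function name `tree_differences` and the demo directory names are made up for illustration, and on a real server the two paths would be wherever the old and new pools are mounted (an assumption to check on your own system).

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def tree_differences(src, dst):
    """Return relative paths that are missing from dst or differ from src.

    Note: dircmp compares files shallowly (type, size, mtime) and only reads
    contents when those differ, so it is a quick check, not a checksum.
    """
    problems = []

    def walk(cmp, rel):
        for name in cmp.left_only:  # present in src, absent in dst
            problems.append(str(rel / name))
        for name in cmp.diff_files + cmp.funny_files:  # differ or unreadable
            problems.append(str(rel / name))
        for name, sub in cmp.subdirs.items():  # recurse into common dirs
            walk(sub, rel / name)

    walk(filecmp.dircmp(src, dst), Path("."))
    return sorted(problems)


# Self-contained demo in a temporary directory (stand-ins for the two pools).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "old_cache"
    (src / "appdata").mkdir(parents=True)
    (src / "appdata" / "config.txt").write_text("hello")
    (src / "vm.img").write_text("disk")

    dst = Path(tmp) / "new_cache"
    shutil.copytree(src, dst)
    clean = tree_differences(src, dst)

    (dst / "vm.img").unlink()  # simulate a file that failed to copy
    after_delete = tree_differences(src, dst)

print("after full copy:", clean)
print("after removing one file:", after_delete)
```

An empty list means every file under the source was found in the destination with a matching signature. For large VM images a checksum comparison would be more thorough, at the cost of reading both copies in full.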
  13. Update, I added it to a new pool named testrestore and I can see my old data, thank you! Now can I just transfer this to the new drive in the other cache pool to restore my Docker/VMs?
  14. So, shut down the array and add it to a new pool? When both devices are in the pool what happens at that point, will the data be restored to the new device?
  15. Fortunately it doesn't appear to have been wiped: The mount button is no longer grayed out with Unassigned Devices but the partition size says 0B. Typing this from my phone but I'll take a closer look soon.
  16. I tried downloading Unassigned Devices to explore the old cache drive but the mount button is grayed out so I'm not sure what to do.
  17. Dang, I thought I was close! Results of btrfs fi show:
  18. I have a 500GB SATA SSD that was being utilized as a cache drive when I noticed an error stating that TRIM was not enabled. After some digging it sounds like it is because TRIM is not supported by the firmware version of my Dell H310 SAS card. Rather than bother flashing/downgrading the firmware, I ordered and installed a 1TB NVMe drive to replace my cache pool and followed the procedure here:

      After doing so, it doesn't appear that anything has been transferred from the old cache pool device to the new one. My Docker/VMs are all missing. Any advice on how to transfer these? Attached diagnostics.

      Currently the old cache drive is unassigned. I have not touched anything yet because I'm pretty new to working with Unraid, only having had this machine up and running for about two weeks. I found the Unassigned Devices plugin and thought that could be helpful for mounting the drive and copying the contents of the old cache, but please correct me if there's a better/more correct way. Thanks!

      Edit: doing some more digging, looks like my answer could be to use the command:

      btrfs replace start -f /dev/sdX1 /dev/sdY1 /mnt/cache

      Sourced from here:

      Can anyone confirm before I execute? I think at this point I just need to figure out the device IDs (edit: I'm blind). Will I need the UD plugin for that command to work properly? Looking at my devices it looks like I should be running:

      btrfs replace start -f /dev/sdb /dev/nvme0n1p1 /mnt/cache

      Are there any other steps I'm missing, like disabling Docker and the VM manager? Should I leave the NVMe in the cache pool while running this command?

      server-diagnostics-20221127-1852.zip