March 8, 2025

Backstory: I had recently been having issues with the WebUI locking up and, at times, with VMs not booting. This week I ran into two problems that finally made me shut down and do some physical reconfiguration:

1) One of my 4 NVMe drives was showing as missing.
2) One of my Windows VMs was no longer running properly.

I popped the chassis out of the rack, disconnecting all of the various peripherals, power and network cables. I removed all 4 NVMe drives to identify the missing one, but the numbers displayed in Unraid did not match any numbers shown on the sticks themselves. I guessed at one and replaced it with a new NVMe drive.

I reconnected everything and rebooted. The WebUI never came up. I opened my IPMI and confirmed that I was presented with a console logon screen. I also confirmed I was able to SSH into the system. I attempted a reboot with no change. I attempted a reboot in SAFE mode with no change. I attempted a GUI SAFE mode and was presented with a blank web page.

Suspecting I had screwed something up with the NVMe drive replacement, I reverted to the suspected bad drive. On reboot I had the same issues as before. I ran 3 passes of MEMTEST with no errors.

On digging a bit, I found that I could get something to happen by issuing the following commands:

`/etc/rc.d/rc.nginx` - would yield an NGINX 'no file found' type web page (like when you misconfigure your proxy in SWAG).
`/etc/rc.d/rc.php-fpm` - would bring up the WebUI login at the normal address.

Any attempt to log in yielded an HTTP 502 error, with messages like:

"nginx: 2025/03/07 20:25:09 [error] 10520#10520: *1 auth request unexpected status: 502 while sending to client, client: 192.168.1.89, server: , request: "GET /Main HTTP/1.1", host: "192.168.1.100""

Based on some AI-assisted research, it seems the culprit may be a missing /etc/rc.d/rc.emhttpd. I'm not sure, however, whether that is the issue or a symptom of something else.
Attached are diagnostics from when this started and the last one from tonight:

footeprint-diagnostics-20250304-0512.zip
tower-diagnostics-20250304-1923.zip
tower-diagnostics-20250307-2042.zip
March 8, 2025, Author

6 hours ago, JorgeB said:
Last diags has all array disks missing, is this expected?

Yes, it is expected. My array is a Dell MD1200 JBOD. In my last attempt to troubleshoot, I removed all excess peripherals, including my JBOD.
March 8, 2025, Community Expert, Solution

Nothing obvious that I can see that would cause that. You can try redoing the flash drive: back up the current one first, then recreate it using the USB tool and restore just the bare minimum, like the key, super.dat and the pools folder for the assignments. Also copy the docker user templates folder (\config\plugins\dockerMan\templates-user). If all works, you can then reconfigure the server, or try restoring a few config files at a time from the backup to see if you can find the culprit.
March 9, 2025, Author

On 3/8/2025 at 8:22 AM, JorgeB said:
...restore the bare minimum, like the key, super.dat and the pools folder for the assignments, also copy the docker user templates folder (\config\plugins\dockerMan\templates-user),

I did this and had the same issue.

I ran through a series of experiments to see if I could pin the issue down. I removed and replaced drives, PCIe NVMe cards, etc. It turns out that at least one of the problems I was facing was caused by my `cache_fast.cfg` file. If I removed it, I could boot into a WebUI and do some further investigation.

I went ahead and reinstalled all drives, swapping out the old, most likely dead drive for a new one. I then booted into a WebUI where those 4 drives do not form a cache. Hoping beyond hope to recover the data, I found the following, and I was able to access the data on those 4 drives in this way.

Please note that where it says:

Quote
v6.10-rc1 and newer use:
mount -o degraded,rescue=all,ro /dev/sdX1 /temp
Replace X with any of the remaining pool devices to mount the whole pool, don't forget the 1 in the end, e.g., /dev/sdf1, if all devices are present and it doesn't mount with the first device you tried use the other(s), filesystem on one of them may be more damaged then the other(s).

NVMe drives work a little differently with respect to adding the '1'. I ran an `lsblk` command and noticed that the partitions on my NVMe drives are named by adding 'p1' instead. That did the trick. I was able to access the contents and am in the process of rsync'ing them now, so at least I can restore them.

Thanks JorgeB

PS: I went to thank the creator of the other post and, lo and behold, it was JorgeB again. You are just like Roy Kent.

Edited March 9, 2025 by bambino53
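
For anyone following along, the partition-naming difference described above can be sketched as a small shell helper. This is only an illustration, not part of the original instructions: the function name `first_partition` and the device `/dev/nvme0n1` are hypothetical, and it assumes the usual Linux convention that a disk whose name ends in a digit gets a `p` before the partition number. Check `lsblk` on your own system for the real device names.

```shell
#!/bin/sh
# Hypothetical helper: derive the first-partition path from a whole-disk path.
# Assumption: disks whose names end in a digit (e.g. nvme0n1) use a "p"
# separator before the partition number; others (e.g. sdf) do not.
first_partition() {
    case "$1" in
        *[0-9]) printf '%s\n' "${1}p1" ;;  # /dev/nvme0n1 -> /dev/nvme0n1p1
        *)      printf '%s\n' "${1}1"  ;;  # /dev/sdf     -> /dev/sdf1
    esac
}

first_partition /dev/sdf       # prints /dev/sdf1
first_partition /dev/nvme0n1   # prints /dev/nvme0n1p1

# The read-only rescue mount from the quoted post, adapted for an NVMe pool
# member. Left commented out: it needs root and a real pool device.
# mount -o degraded,rescue=all,ro "$(first_partition /dev/nvme0n1)" /temp
```

The mount line is left commented out on purpose, so the snippet can be run safely just to see which partition path it would use.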