UNRA1DUser Posted April 18 (edited)
I ran a SMART extended self-test on both M.2 SSDs: 0 errors. Everything seems to work. I have 2x SK hynix Gold P31 2TB PCIe NVMe Gen3 M.2 2280 internal SSDs (up to 3500 MB/s, compact form factor, 128-layer NAND flash). I also added "pcie_aspm=off" to my config, but I don't think that would fix my freezes?
JorgeB Posted April 18
1 hour ago, UNRA1DUser said: "But I think that wouldn't fix my freezes?"
Probably not. Post a new persistent syslog if it happens again.
UNRA1DUser Posted April 18 (edited)
1 hour ago, JorgeB said: "Probably not, post a new persistent syslog if it happens again."
I just booted into the BIOS to check some settings (I didn't change anything). After 1-2 reboots the USB drive is no longer recognized. Maybe that's something I should analyse. Could it be that after some days the USB device is no longer recognized, and that's why Unraid freezes?
My USB device (https://www.amazon.de/gp/product/B07D1KCL2Z/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1) is connected directly to the mainboard via this adapter: https://www.amazon.de/gp/product/B08N4LQJJN/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&th=1 (The USB device is new and has also been checked for errors.)
BIOS USB settings:
Legacy USB Support: Disabled
XHCI Hand-off: Disabled
USB Mass Storage Driver Support: Enabled
Port 60/64 Emulation: Disabled
Fast boot: Disabled
CSM Support: Disabled
Windows 10 Features: Other OS
Secure Boot: Enabled (because of EFI?)
Power Loading: Auto (I have now set it to Disabled to test it)
I am booting Unraid with EFI. Are any other BIOS settings important? If I move the USB device with its adapter to another onboard port, it works again with the same settings.
itimpi Posted April 18
You want Secure Boot to be disabled. You do not need it just because you are using UEFI boot, and Unraid does not support it anyway. I would think you could also enable Legacy USB support and CSM support. This would give you the option of alternatively booting in legacy mode (but does not stop UEFI boot). Not sure if it helps in any other way.
UNRA1DUser Posted April 18 (edited)
Thanks for your answer. I disabled Secure Boot and enabled Legacy USB support. If I enable CSM support I get the following settings:
LAN PXE Boot Option ROM: Disabled (I don't think I need it)
Storage Boot Option Control: UEFI (I can choose UEFI, Legacy or Do not launch)
Other PCI devices: UEFI (I can choose UEFI, Legacy or Do not launch)
Should both be set to Legacy? I also found a setting for "Intel Platform Trust Technology (PTT)", which is currently Enabled. Should that be disabled as well? "Software Guard Extensions (SGX)" is set to Software Controlled. Should I disable it? Is it used?
JorgeB Posted April 18
1 hour ago, UNRA1DUser said: "Could it happen that after some Days the USB device is not recognized anymore and thats why Unraid freezes?"
It's a possibility.
UNRA1DUser Posted April 18
When I start up the system normally and then reboot, I get these messages. I also recorded the boot sequence. I don't think that's normal, is it? I hope you can download the recording and slow down the playback to read the messages.
JorgeB Posted April 18
Flash drive dropped offline, try recreating it or replacing it.
UNRA1DUser Posted April 18
27 minutes ago, JorgeB said: "Flash drive dropped offline, try recreating it or replacing it."
Can I recreate it with my old files? Or should I create it completely new and just move all my Docker XMLs and VMs over? I also bought a new one today, in case recreating it does not work.
JorgeB Posted April 18
Create a new one and restore the config folder.
UNRA1DUser Posted April 19
19 hours ago, JorgeB said: "Create a new one and restore the config folder."
I recreated the USB stick and am now able to reboot normally again. But I still get the same boot sequence as in the video. Is that the normal boot behavior, with "Not automatically fixing this" and this big block of numbers?
JorgeB Posted April 19
24 minutes ago, UNRA1DUser said: "Is that the normal boot behavior with "Not automatically fixing this" and this big block of numbers ?"
It does appear to happen with some flash drives; I see it with one of mine.
UNRA1DUser Posted April 20
I created the USB stick completely new and didn't copy any configs. The parity sync has also been running since yesterday. But at 00:40 AM the server got a message to terminate? I didn't shut down the server. What happened? I hadn't been on the PC/server since 09:00 PM, and this morning I just had to log in again and enter my encryption key for the HDDs. What's going on here? Can somebody check the log and help me, please?
tower-syslog-20240420-0730.zip
itimpi Posted April 20
Unfortunately the syslog you posted (and the version automatically included when getting diagnostics) is the RAM version that starts afresh every time the system is booted, so we do not know what happened prior to the reboot. You should enable the syslog server (probably with the option to Mirror to Flash set) to get a syslog that survives a reboot, so we can see what preceded the reboot. The Mirror to Flash option is the easiest to set up (and if used, the file is then automatically included in any diagnostics), but if you are worried about excessive wear on the flash drive you can put your server's address into the remote server field.
When you say the system got a 'message' to terminate, what do you mean? If you mean it started a tidy shutdown, is there any chance someone/something (e.g. a cat) could have pressed the power button to trigger a shutdown? If you simply mean it rebooted itself, then this is normally a hardware issue of some kind.
UNRA1DUser Posted April 20
I just activated the syslog server to mirror to the flash drive. I don't have a cat or any other animal, and nobody entered that room after 09:00 PM. But which hardware issue is it? I have checked almost everything: all drives with a SMART test, a CPU benchmark, a new PSU, Memtest over several hours, and a replaced and newly created USB stick. The server decided to reboot in the middle of the parity sync, but it has also rebooted at other times, not only during a parity sync. I mean these messages in the syslog:
"Received signal 15; terminating."
"BERT: [Hardware Error]: Skipped 1 error records"
"kernel reports TIME_ERROR: 0x41: Clock Unsynchronized" -> The time I set in the BIOS often flips back to something else.
ACPI Warning:
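The messages listed in the post above can be searched for in a saved syslog with a short script. What follows is only a minimal sketch in Python: the patterns are copied from the quoted messages, while the labels, the function names `scan_syslog` and `scan_file`, and the file path in the usage note are assumptions made for illustration. A log with no matches does not rule out a hardware fault, since a hard reset may leave no trace in the log at all.

```python
import re
from collections import Counter

# Patterns copied from the messages quoted in the post above.
# The short labels on the left are invented for this sketch.
PATTERNS = {
    "sigterm": re.compile(r"Received signal 15; terminating"),
    "bert": re.compile(r"BERT: \[Hardware Error\]"),
    "time_error": re.compile(r"kernel reports TIME_ERROR"),
    "acpi_warning": re.compile(r"ACPI Warning"),
}


def scan_syslog(lines):
    """Return (counts, hits); hits holds (line_number, label, text) tuples."""
    counts = Counter()
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                hits.append((number, label, line.rstrip("\n")))
    return counts, hits


def scan_file(path):
    """Scan a saved syslog file; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_syslog(handle)
```

Calling `scan_file("syslog-previous.txt")` (a hypothetical file name) returns the per-message counts plus the matching lines with their line numbers, which makes it easier to see what was logged shortly before a reboot.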
itimpi Posted April 20
If the server is simply rebooting rather than shutting itself down, then this almost invariably indicates a hardware issue, the most common being power or thermal type issues. If you get either of these, nothing will show up in the logs.
UNRA1DUser Posted April 20
But the interesting thing is that I switched to a new PSU and had the same issue in Unraid normal mode. I also ran a CPU benchmark in Unraid safe mode, without plugins or anything else (and tested with only plugins enabled as well), and everything ran without issues for a long time.
I also found a BIOS setting called Power Loading: "Enables or disables dummy load. When the power supply is at low load, a self-protection will activate causing it to shutdown or fail. If this occurs, please set to Enabled. Auto lets the BIOS automatically configure this setting." Maybe I should try that?
UNRA1DUser Posted May 2
Hi again, after roughly 10-20 days the server restarted by itself again. I attached my logs. Maybe someone can see something?
tower-syslog-20240502-1640.zip
tower-syslog-previous-20240502-1640.zip
JorgeB Posted May 2
1 hour ago, UNRA1DUser said: "Server restarts by itself again."
A server restarting on its own is almost always a hardware problem.
UNRA1DUser Posted May 3
16 hours ago, JorgeB said: "Server restarting on its own is almost always a hardware problem."
The interesting question is: which hardware? Can it be an HDD or an M.2 SSD? Or a lost connection to the USB stick?
JorgeB Posted May 3
27 minutes ago, UNRA1DUser said: "The interesting thing is what Hardware?"
It can be caused by different components. In my experience the most likely, roughly in this order, are: RAM, PSU, board or CPU.
28 minutes ago, UNRA1DUser said: "Can it be a HDD or M.2 SSD ? Or a connection lost to the USB Stick ?"
I won't say it's not possible, but it's unlikely.
UNRA1DUser Posted May 20 (edited)
Does anybody know something about this error message? It seems to be my second NVMe slot?
JorgeB Posted May 20
Try this first: https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009
UNRA1DUser Posted May 21
On 5/20/2024 at 10:53 AM, JorgeB said: "Try this first: https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009"
Thanks. I added it. It looks like this now. Let's see what happens.
kernel /bzimage
append initrd=/bzroot intel_pstate=passive pcie_aspm=off
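The change described in the post above amounts to adding kernel parameters to the append line of the boot configuration. A minimal sketch in Python of doing that without duplicating a parameter that is already present follows. `add_kernel_params` is a hypothetical helper, and the starting line `append initrd=/bzroot` in the usage note is an assumption, since the thread only shows the line after editing.

```python
def add_kernel_params(append_line, params):
    """Return append_line with every parameter whose key is missing added at the end.

    Leading indentation is preserved; a trailing newline is not.
    """
    indent = append_line[: len(append_line) - len(append_line.lstrip())]
    tokens = append_line.split()
    if not tokens or tokens[0] != "append":
        raise ValueError("expected a line that starts with 'append'")
    # Compare on the part before '=' so an existing setting is never duplicated.
    present = {token.split("=", 1)[0] for token in tokens[1:]}
    for param in params:
        key = param.split("=", 1)[0]
        if key not in present:
            tokens.append(param)
            present.add(key)
    return indent + " ".join(tokens)
```

For example, `add_kernel_params("append initrd=/bzroot", ["intel_pstate=passive", "pcie_aspm=off"])` yields the append line shown in the post, and running it a second time on that result leaves the line unchanged. An existing parameter with a different value is left as it is rather than overwritten, which is a deliberate choice in this sketch.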