Vyktrii Posted June 21, 2023

A few days ago I was encountering major crashing issues. I managed to temporarily solve it by using a USB Ethernet adapter, and then more or less solved it by buying a new router and replacing the Unraid USB stick. Those crashes stopped, but a different kind of crash has started.

Over the past few days Unraid sometimes crashes, mostly after I do something in Docker such as uninstalling or stopping a container. Before it crashes, a few CPU threads get stuck at 100% for around 3-4 minutes. After that, Unraid keeps crashing every time the array is started, and the only workaround is to delete the Docker image and reinstall all plugins/containers.

Yesterday Unraid crashed after I stopped a Docker container, and it kept crashing. I updated from 6.11.5 to 6.12 but it still kept crashing; deleting the Docker image fixed it. I have now had to delete the Docker image a couple of times.

About hardware troubleshooting: my motherboard has been replaced and is new, my RAM passes Memtest for 12 hours, and swapping in the RAM sticks from my main PC doesn't help with the crashes. I also tested my CPU by running Prime95 for 1 hour and then Cinebench 3 times. I don't know how to test the CPU on Unraid, so I transcoded three 4K streams to 1080p with Plex (this keeps the CPU at a constant 94-98%). The only thing that fixes this is deleting and reinstalling the Docker image.

syslog-192.168.1.10.log kassandra-diagnostics-20230621-1317.zip
JorgeB Posted June 21, 2023

The call traces I see are Nvidia related. Does it help if you uninstall the Nvidia driver?
Vyktrii Posted June 21, 2023

25 minutes ago, JorgeB said: Call traces I see are Nvidia related, does it help if you uninstall the Nvidia driver?

I'm unable to replicate the issue since I fixed it by nuking the Docker image, but I will try uninstalling the Nvidia driver. I also forgot to mention that when this issue occurs, it sometimes takes down even my new router, though not always (the last incident consistently killed my router). This post also describes symptoms similar to mine: https://forums.unraid.net/topic/135151-docker-unresponsive-unraid-at-100-cpu-eventual-system-crash/
Vyktrii Posted June 21, 2023

7 hours ago, JorgeB said: Call traces I see are Nvidia related, does it help if you uninstall the Nvidia driver?

I encountered another crash. This time reinstalling Docker did not help much, as it crashed again after a few minutes, and uninstalling the Nvidia driver did not help either. I also removed the GPU and tested it in another PC by running benchmarks on it for an hour, and it did not crash. So I'm guessing the only thing left to replace is the CPU itself, or could it still be the GPU?
JorgeB Posted June 21, 2023

9 minutes ago, Vyktrii said: so im guessing the only thing that i can now replace is CPU itself or could it still be GPU ?

Difficult to say for certain.
Vyktrii Posted June 28, 2023 (edited June 29, 2023 by Vyktrii)

On 6/21/2023 at 9:16 PM, JorgeB said: Difficult to say for certain.

I replaced the CPU and GPU (and also the PSU), and I still get crashes within ~10 minutes of starting the array on 6.12.1, although the errors are now different. Rolling back to 6.11.5 makes my system much more stable, but I still get crashes after a few hours (sometimes it crashes as soon as the array starts, sometimes it doesn't crash for hours). It crashes so badly that I can't find crash details in the log files, but they are still call trace errors. The crashes are very unpredictable, but they mostly occur when Docker containers start (sometimes a specific container triggers it, sometimes not).

error.txt syslog-192.168.1.10.log kassandra-diagnostics-20230629-0026.zip
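Several posts in this thread come down to reading the call traces in the attached syslogs (for example, spotting that they are Nvidia related). As a rough aid for comparing crashes, here is a minimal Python sketch that pulls call-trace blocks out of a saved syslog and counts which kernel modules appear in their frames. It assumes the common kernel format, where a block starts at a line containing "Call Trace:" and frames look like func+0x1a/0x40 [module]; the end-of-block markers ("</TASK>", "end trace") and the 200-line cap are heuristics. The script and function names are hypothetical, not part of Unraid.

```python
import re
import sys
from collections import Counter

# A frame looks like "func+0x1a/0x40 [module]"; built-in kernel code has no
# bracketed module name, so only frames from loadable modules are counted.
MODULE_RE = re.compile(r"\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")


def extract_call_traces(lines, max_len=200):
    """Split syslog lines into call-trace blocks (each a list of raw lines)."""
    traces, current = [], None
    for line in lines:
        if "Call Trace:" in line:
            if current:  # a new trace started before the old one ended
                traces.append(current)
            current = [line]
            continue
        if current is None:
            continue  # outside any trace
        current.append(line)
        # Heuristic end markers, with a length cap as a fallback.
        if "</TASK>" in line or "end trace" in line or len(current) >= max_len:
            traces.append(current)
            current = None
    if current:  # log ended mid-trace, e.g. cut off by the crash itself
        traces.append(current)
    return traces


def modules_in(traces):
    """Count how often each bracketed kernel module appears in trace frames."""
    counts = Counter()
    for block in traces:
        for line in block:
            counts.update(MODULE_RE.findall(line))
    return counts


def main(path):
    with open(path, errors="replace") as f:
        traces = extract_call_traces(f)
    print(f"{len(traces)} call trace(s) found")
    for module, n in modules_in(traces).most_common():
        print(f"  {module}: {n}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage, with a hypothetical file name: python3 call_traces.py syslog-192.168.1.10.log. A module that shows up in most traces is a lead, not proof; as this thread shows, faulty hardware elsewhere can produce traces in whatever code happens to be running.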
JorgeB Posted June 29, 2023

With all the call traces, it still looks like a hardware issue to me.
Vyktrii Posted June 29, 2023

2 minutes ago, JorgeB said: With all the call traces still looks like a hardware issue to me.

I now have a new motherboard and a new CPU, and I also swapped the RAM sticks from my main rig into the Unraid rig, so it's basically new hardware. Which other components can cause call traces?
JorgeB Posted June 29, 2023

Most often RAM, CPU and/or board.
Vyktrii Posted June 29, 2023

37 minutes ago, JorgeB said: Most often RAM, CPU and/or board.

Since I have swapped or replaced everything, is it possible that Linux is not compatible with my motherboard (B760)? Are there any essential BIOS settings that could cause these crashes? I have XMP disabled and did not change any other settings.
JorgeB Posted June 29, 2023

It's possible. Do you have another PC you could test Unraid on?
Vyktrii Posted June 29, 2023 (edited June 29, 2023 by Vyktrii)

At this point it's basically a new PC, lol. Since the start I have changed the CPU, GPU, motherboard and PSU, and even swapped the RAM sticks. Anyway, I found a temporary fix: I copied appdata, nuked the cache drive (precleared it), rebuilt Docker and copied appdata back. My rig has been stable for quite a few hours and only the Unraid API is giving me errors. Let's see how long it holds up.
Vyktrii Posted June 30, 2023

21 hours ago, JorgeB said: It's possible, do you have another PC you could test Unraid on?

I made some progress. Nuking and rebuilding the cache drive solves it, but it's only a temporary workaround: it works fine until I try shutting down Unraid, at which point it is unable to unmount /mnt/cache and gives me errors like these:

Jun 30 12:21:07 Kassandra root: umount: /mnt/cache: target is busy.
Jun 30 12:21:07 Kassandra emhttpd: shcmd (68357): exit status: 32
Jun 30 12:21:07 Kassandra emhttpd: Retry unmounting disk share(s)...
Jun 30 12:21:12 Kassandra emhttpd: Unmounting disks...
Jun 30 12:21:12 Kassandra emhttpd: shcmd (68358): umount

After the next few reboots I start getting call trace errors after some time. I tried 2 SSDs and still have the same issue. Running fuser -v /mnt/cache gives me this result:

USER PID ACCESS COMMAND
/mnt/cache: root kernel mount /mnt/cache

I think my crashes have something to do with my cache drive getting corrupted after a few reboots. I'm unable to find out why my cache drive won't unmount properly; my shutdown timeout is also set to 300 seconds instead of the default 90.

The Unraid API plugin was also giving me GPF errors, so I uninstalled it, and I think that helped a bit. The Unraid API gave me this error:

kernel: traps: unraid-api[5954] general protection fault ip:1bf921c sp:7ffcd57f2548 error:0 in unraid-api[91c000+167b000]

So as long as I don't reboot or shut down, it's working fine, lol.

kassandra-diagnostics-20230630-1232.zip
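The fuser output above only lists the kernel's mount entry for the directory itself, which doesn't say what is keeping the filesystem busy. As a rough cross-check, here is a minimal Python sketch (run it as root) that looks for user processes with open files, working directories or executables under a mount point by reading the /proc/<pid>/fd, cwd, root and exe symlinks, and for loop devices whose backing file lives on that mount via /sys/block/loop*/loop/backing_file. The loop check matters because a loop-mounted image file (for example a Docker or VM image stored on the cache) is held by the kernel and does not show up under any process. The function names are hypothetical; this is a diagnostic sketch, not an Unraid tool, and it does not cover every kernel-side user (nested mounts under the path, for instance).

```python
import glob
import os
import sys
from pathlib import Path


def _under(path, mount):
    """True if absolute `path` is `mount` itself or lies beneath it."""
    return os.path.commonpath([path, mount]) == mount


def find_holders(mount, proc_root="/proc"):
    """Map PID -> paths under `mount` held via open fds, cwd, root or exe."""
    mount = os.path.abspath(mount)
    holders = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        links = [entry / name for name in ("cwd", "root", "exe")]
        try:
            links += list((entry / "fd").iterdir())
        except OSError:  # process exited, or not permitted (run as root)
            pass
        hits = []
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            # Sockets, pipes etc. read as "socket:[1234]"; keep real paths only.
            if target.startswith("/") and _under(target, mount):
                hits.append(target)
        if hits:
            holders[int(entry.name)] = hits
    return holders


def find_loop_backing(mount, sys_root="/sys"):
    """Map loop device -> backing file, for backing files under `mount`."""
    mount = os.path.abspath(mount)
    found = {}
    pattern = os.path.join(sys_root, "block", "loop*", "loop", "backing_file")
    for bf in glob.glob(pattern):
        try:
            with open(bf) as f:
                target = f.read().strip()
        except OSError:
            continue
        if target.startswith("/") and _under(target, mount):
            loop_dev = os.path.basename(os.path.dirname(os.path.dirname(bf)))
            found[loop_dev] = target
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    mnt = sys.argv[1]
    for pid, paths in sorted(find_holders(mnt).items()):
        print(f"pid {pid}: {', '.join(paths)}")
    for dev, backing in sorted(find_loop_backing(mnt).items()):
        print(f"/dev/{dev} is backed by {backing}")
```

Usage, with a hypothetical file name: python3 mount_holders.py /mnt/cache, run just before stopping the array. The proc_root and sys_root parameters exist only so the functions can be pointed at a test directory.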
Vyktrii Posted July 5, 2023 (marked as solution)

Solved it: it was a brand new but faulty NVMe drive. My new SSD also sometimes doesn't do a clean shutdown, but at least it doesn't get corrupted.
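Since the culprit turned out to be a faulty NVMe drive, a SMART check of the cache device is a cheap early step in cases like this. Below is a minimal Python sketch that runs smartctl (from smartmontools, which must be installed; the -j JSON output needs smartmontools 7.0 or newer) and flags a few NVMe health fields. The JSON key names used here (smart_status, nvme_smart_health_information_log, critical_warning, media_errors, available_spare, available_spare_threshold, percentage_used, unsafe_shutdowns) are my reading of smartctl's output and should be checked against your version. A clean result does not prove the drive is healthy; a drive can misbehave without tripping any of these counters.

```python
import json
import subprocess
import sys


def load_report(device):
    """Run `smartctl -j -a DEVICE` and parse its JSON output.

    smartctl's exit status is a bitmask that can be non-zero even when it
    printed a usable report, so the return code is not treated as fatal here.
    """
    result = subprocess.run(["smartctl", "-j", "-a", device],
                            capture_output=True, text=True)
    return json.loads(result.stdout)


def assess(report):
    """Return warning strings for a few NVMe health fields in the report."""
    warnings = []
    if report.get("smart_status", {}).get("passed") is False:
        warnings.append("overall SMART status: FAILED")
    log = report.get("nvme_smart_health_information_log", {})
    if log.get("critical_warning", 0):
        warnings.append(f"critical_warning = {log['critical_warning']:#04x}")
    if log.get("media_errors", 0):
        warnings.append(f"media_errors = {log['media_errors']}")
    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    if spare is not None and threshold is not None and spare <= threshold:
        warnings.append(f"available_spare {spare}% <= threshold {threshold}%")
    if log.get("percentage_used", 0) >= 100:
        warnings.append(f"percentage_used = {log['percentage_used']}%")
    return warnings


if __name__ == "__main__" and len(sys.argv) > 1:
    report = load_report(sys.argv[1])
    problems = assess(report)
    for w in problems:
        print("WARNING:", w)
    if not problems:
        print("no obvious SMART warnings (this does not prove the drive is healthy)")
    # Informational only: counts power losses the controller was not warned
    # about, such as hard crashes or resets.
    unsafe = report.get("nvme_smart_health_information_log", {}).get("unsafe_shutdowns")
    if unsafe is not None:
        print("unsafe_shutdowns:", unsafe)
```

Usage, with a hypothetical file name and an assumed device path: python3 nvme_check.py /dev/nvme0, run as root. Comparing the output before and after a crash can show whether counters such as media_errors are climbing.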