BeardElk, posted January 24, 2022 (edited):
My server has been acting up lately, and I don't know what's wrong. The logs show a bunch of hardware errors, but I still can't tell exactly what is failing, so could someone please help me decipher my logs so I can fix it?
Sometimes one thread gets pinned at 100% utilization, and the load wanders around the whole CPU, moving to a different thread every now and then. htop shows something about smbd, and the only way to get it to stop is to run "fuser -km /mnt/user".
I also have a really hard time just rebooting it. The array takes 7-10 minutes to stop, and when the server boots up again it always runs a parity check, as if it had been an "unclean" shutdown, although it wasn't. I have never had to power cycle by holding down the power button yet, but it feels like whatever is wrong is getting worse.
It's also showing 70-90% CPU utilization on the dashboard, but only 5-12% in htop. Temps agree with htop as well: near idle.
Please help! I'm at the mercy of you guys right now. The only other option I have left is a clean install of everything, and that would suck majorly.
Attachment: belk-diagnostics-20220124-2251.zip
Edited January 24, 2022 by BeardElk (reason: dashboard CPU utilization)
Squid, posted January 24, 2022:
29 minutes ago, BeardElk said: "it shows a bunch of hardware errors"
Just so we're on the same page, what hardware errors are you seeing that you're concerned about? The only thing that really pops out to me in the diagnostics is that your Linux VM refused to shut down...
BeardElk (author), posted January 24, 2022 (edited):
6 minutes ago, Squid said: "Just so we're on the same page, what hardware errors are you seeing that you're concerned about? Only thing that really pops out to me on the diagnostics is that your Linux VM refused to shut down..."
What?! That VM wasn't even on when I tried to take the array offline. I just skimmed through the log, saw a bunch of "hardware error - fatal" entries, and didn't understand what they were. So nothing else is wrong?
(The Linux VM is an Android TV VM, and it does not respond to a normal "shut off this VM" request. I have to force it off unless I'm using it: a prompt pops up asking whether I'm sure I want to power off, and it will not power off unless "yes" is selected.)
But why am I getting a parity check at every boot?
Edited January 24, 2022 by BeardElk
Squid, posted January 24, 2022:
lol. Actually, I just noticed it. My bad, I'll get back to you. "deadbeef" rings a bell in the back of my mind and I have to think about it.
BeardElk (author), posted January 26, 2022:
On 1/25/2022 at 12:52 AM, Squid said: "lol. Actually, I just noticed it.. My bad I'll get back to you. "deadbeef" rings a bell in the back of my mind and have to think about it."
I don't understand. Are you being sarcastic?
Squid, posted January 26, 2022:
"deadbeef" was referenced in the syslog dump:
Jan 24 22:42:48 BelK kernel: [Hardware Error]: 000000d0: deadbeef deadbeef deadbeef 0006c004 ................
It's a "joke" by the programmers. In your case, if there's nothing in a system event log etc. in the BIOS, I'm not sure where to go or what it means.
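For anyone skimming a long syslog for entries like the one quoted above, here is a minimal Python sketch (not from the thread) that pulls out only the kernel "[Hardware Error]" lines. The function name `hardware_errors` and the second sample line are made up for illustration, and the pattern assumes the classic "Mon DD HH:MM:SS host kernel: ..." layout shown in the quoted line.

```python
import re

# Matches kernel "[Hardware Error]" lines in the classic syslog layout
# seen in the quoted line: "Mon DD HH:MM:SS host kernel: [Hardware Error]: ..."
HW_ERROR = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"\[Hardware Error\]:\s(?P<msg>.*)$"
)

def hardware_errors(lines):
    """Yield (timestamp, message) for each kernel hardware-error line."""
    for line in lines:
        match = HW_ERROR.match(line.rstrip("\n"))
        if match:
            yield match.group("ts"), match.group("msg")

# The first line is the one quoted in the thread; the second is a
# made-up unrelated line to show that non-matching lines are skipped.
sample = [
    "Jan 24 22:42:48 BelK kernel: [Hardware Error]: 000000d0: deadbeef deadbeef deadbeef 0006c004 ................",
    "Jan 24 22:42:49 BelK example: unrelated message",
]

for ts, msg in hardware_errors(sample):
    print(ts, msg)
```

To run it against a real file, pass an open file object instead of `sample`, for example `hardware_errors(open("syslog.txt"))`, where the file name is whatever your copy of the log is called.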
BeardElk (author), posted January 31, 2022:
On 1/26/2022 at 3:30 PM, Squid said: ""deadbeef" was referenced in the syslog dump Jan 24 22:42:48 BelK kernel: [Hardware Error]: 000000d0: deadbeef deadbeef deadbeef 0006c004 ................ It's a "joke" by the programmers In your case if there's nothing in a system event log etc in the BIOS, I'm not sure where to go or what it means."
I think I'm starting to home in on what's wrong: a bit of user error and a "perfect storm" situation. I've got Duplicati running backups of "important files" from Unraid to my Synology NAS (in case I ever have to rebuild the whole thing). It was running fine when I set it up, but it now had a bunch of errors in Duplicati, where it was trying to back up appdata while Docker was running. That resulted in a weird catch-22 where Docker processes got killed and tried to restart while Duplicati was in there messing with everything.
I removed the appdata folder from Duplicati and am running a test now. (Edit: it just finished with 0 errors.) So it was user error by me for even selecting the appdata folder (must have had a brainfart), with escalating problems as a result.
BeardElk (author), posted February 1, 2022:
Okay, that was not the cause. Now the server freezes pretty much daily. There is no connection over LAN, so a safe shutdown is not possible, and I have to hard reboot to get it working again. The logs only start AFTER the reboot, so I've got no clue what happened before the freeze.
ChatNoir, posted February 1, 2022:
You should set up a syslog server and attach the created log for analysis after the next crash.
BeardElk (author), posted February 1, 2022:
11 hours ago, ChatNoir said: "You should set up a syslog server and attach the created log for analysis after the next crash."
Done, I'll post the syslog when I've got it!
BeardElk (author), posted February 8, 2022 (edited):
Update: ever since I set up the syslog server, nothing has happened... I managed to induce an SMB lockup and had to use "fuser -km /mnt/user" to release it, then rebooted, and that was 5 days ago. It has been running 24/7 since, with the same daily backups to my NAS, the same weekly backups of appdata, and the same daily reboot of my network (routers, switches, etc.).
I'm starting to suspect some kind of voltage drop on my 230 V mains line. I don't have a UPS yet (it's at the top of my to-get list), but I've had other things freeze at the same time. My Synology NAS and main router have frozen at the same moment my Unraid server froze and stopped working, but that doesn't always happen either. It could just as easily be that the Unraid server is throwing network errors and crashing the other devices. I've been in this apartment for 6 years now and never had this problem before, but they recently started up a major battery factory next door.
Edited February 8, 2022 by BeardElk
BeardElk (author), posted February 16, 2022 (edited):
I had configured syslog wrong, but it's working now. This is the output between the 2 latest crashes:
For some reason I'm unable to upload my syslog. I'm getting "Sorry, an unknown server error occurred when uploading this file. (Error code: -200)". I tried changing the name and adding .txt, and I still get the same error.
Edited February 16, 2022 by BeardElk
BeardElk (author), posted February 17, 2022:
No idea why it didn't work; I had to copy-paste from the running syslog into a new file to get it to upload. Luckily I had read the syslog (mirrored to flash) from the flash drive (I turned off the server, put the flash drive into another PC, copied the syslog, and read it), so I knew where it stopped. So syslog.belk.txt is an exact copy up to the point where it crashed this morning. As usual, it took almost all of my LAN-connected devices down with it and locked them up until I rebooted the server...
Attachment: syslog.belk.txt
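Finding "where it stopped" in a syslog that persists across reboots can also be done mechanically by looking for the largest jump between consecutive timestamps. The following is a minimal Python sketch of that idea, not something used in the thread. The names `parse_ts` and `largest_gap` and the three demo lines are hypothetical, syslog timestamps carry no year so 2022 is assumed, and month names are assumed to be English.

```python
from datetime import datetime

def parse_ts(line, year=2022):
    """Parse the 'Mon DD HH:MM:SS' prefix of a syslog line (year is assumed)."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def largest_gap(lines, year=2022):
    """Return (gap, line_before, line_after) for the biggest timestamp jump.

    Lines without a parsable timestamp are skipped. Returns None when
    fewer than two timestamped lines are present.
    """
    stamped = []
    for line in lines:
        try:
            stamped.append((parse_ts(line, year), line.rstrip("\n")))
        except ValueError:
            continue
    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best

# Made-up demo lines: the log goes quiet after 06:00:05 and resumes at 07:30:00.
demo = [
    "Feb 17 06:00:01 BelK example: message a",
    "Feb 17 06:00:05 BelK example: message b",
    "Feb 17 07:30:00 BelK example: message c",
]

gap, before, after = largest_gap(demo)
print("silent for", gap)
print("last line before the gap:", before)
print("first line after the gap:", after)
```

A long gap is only a hint: a quiet but healthy server can also go a while without logging, so the line just before the gap still needs to be read in context.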