August 11, 2022
Good afternoon. I've reached the point where I've decided to ask for help or guidance here. I've had an Unraid server for approximately 3 years with little to no issues over that time, at least nothing I couldn't figure out myself.

Approximately 10-12 months ago, my server just started randomly not responding, essentially locking up. I would have to power cycle it to bring it back online, and of course a parity check would then start. The server would never last more than 24-30 hours before locking up again. I suffered through this for the last year until about 3 weeks ago, when I just couldn't take it any longer.

I decided to reinstall Unraid completely from scratch. I pulled my key off the flash drive and pulled all the data drives out. I took a deep breath and dove right in. The first thing I did was run MemTest: 4 passes on the RAM without any failures. I confirmed the BIOS was up to date. I then started building the array, beginning with 2 parity drives and planning to add my data drives one at a time. I did not want to keep the configuration of the old array, and decided it would be just as easy to mount a drive as an unassigned device, copy its data to the array, then add that drive to the array, essentially shuffling my data back in very slowly. Is this efficient? Probably not. Should it work without a problem? Yes.

But we come full circle back to the Unraid server not staying "active" and becoming unresponsive. While copying data or trying to pre-clear a drive, the server becomes unreachable. The GUI seems to be active at times, but if you try to make any changes or stop the array, it just spins. Attached to the post is my syslog from a reboot to the lockup. I've also captured a picture of the server monitor showing the last portion, where it flashes up some errors.
Done some basic research: I do have an AMD Ryzen CPU, and I have turned C-States on and off; the same results occur no matter the setting. I know this isn't an Unraid issue per se, and it most certainly is on my end, but I'm at the point where I just want to build a Windows box with JBODs via USB. If anyone has any insight or suggestions, I would be indebted to you.

Thank you
-Keelhaulers

****************************
UNRAID Version: 6.10.3
CPU: AMD Ryzen 7 3700X 8-Core @ 3600 MHz
Motherboard: Gigabyte Technology Co., Ltd. X570 AORUS MASTER
BIOS: American Megatrends International, LLC., Version F36e, dated Thursday, May 12, 2022
RAM: 64GB DDR4
GPU: NVIDIA GeForce GTX 1050 Ti

Attachment: syslog-192.168.1.18.log
Edited August 11, 2022 by Keelhaulers: Added Unraid version information. Picture fix.
August 11, 2022 (Author)
It just happened again after being up for 1 hour. Figured I'd add the latest syslog, just in case it shows something different.
-Keelhaulers
Attachment: syslog-192.168.1.18 (Take 2).log
August 11, 2022
It looks like a disk controller issue:
Aug 11 13:38:19 THOR kernel: mpt3sas 0000:05:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
Aug 11 13:38:19 THOR kernel: r8169 0000:08:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
Aug 11 13:38:21 THOR unassigned.devices: Mounting 'Auto Mount' Remote Shares...
Aug 11 13:38:30 THOR kernel: mdcmd (37): nocheck cancel
Aug 11 13:38:30 THOR kernel: md: recovery thread: exit status: -4
Aug 11 13:43:14 THOR kernel: smartctl[1871]: segfault at 146a1f38 ip 0000152cab3776ea sp 00007fffd8de7940 error 6 in libc-2.33.so[152cab2b0000+15e000]
Aug 11 13:43:14 THOR kernel: Code: 00 00 00 83 f8 0f 0f 84 11 18 00 00 48 8b 4b 70 8d 50 01 48 8b 7c 24 08 48 c1 e0 06 89 93 80 00 00 00 66 0f ef c0 48 8d 14 01 <0f> 11 44 01 08 48 8d 74 01 08 48 c7 42 18 00 00 00 00 f3 0f 6f 3f
Aug 11 13:45:46 THOR kernel: smartctl[12254]: segfault at 38 ip 00001475db5e6a1c sp 00007ffdaa770790 error 4 in libc-2.33.so[1475db51c000+15e000]
Aug 11 13:45:46 THOR kernel: Code: 83 c7 01 41 83 ff 40 75 d5 83 c3 40 49 83 c4 08 81 fb 00 01 00 00 75 c1 4c 8b 65 00 e9 8d f6 ff ff 0f 1f 44 00 00 49 8b 56 20 <8b> 72 38 48 8b 55 18 89 34 82 4d 8b 6e 08 4d 85 ed 74 14 4d 89 ee

smartctl is segfaulting. We will need a disk guru like @JorgeB to take a look.

You also have a Realtek NIC, and that may be causing some issues. Realtek drivers on Linux are troublesome because they are not updated for each release of Linux.
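For anyone digging through a similar syslog, a quick way to see whether the same process keeps segfaulting in the same library is to count the kernel "segfault" lines. This is a minimal sketch, assuming the kernel message format shown in the excerpt above (`name[pid]: segfault at ADDR ... error N in LIB[...]`); the function name `summarize_segfaults` and the regex are my own illustration, not part of Unraid or any existing tool.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt above:
# "smartctl[1871]: segfault at 146a1f38 ip ... error 6 in libc-2.33.so[152cab2b0000+15e000]"
SEGFAULT_RE = re.compile(
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+)"
    r".*? error (?P<err>\d+) in (?P<lib>[^\[\s]+)\["
)


def summarize_segfaults(lines):
    """Count segfaults per (process name, library) pair in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[(match.group("proc"), match.group("lib"))] += 1
    return counts


# Sample lines copied from the excerpt; the "Code:" and mpt3sas lines should not match.
sample = [
    "Aug 11 13:38:19 THOR kernel: mpt3sas 0000:05:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM",
    "Aug 11 13:43:14 THOR kernel: smartctl[1871]: segfault at 146a1f38 ip 0000152cab3776ea sp 00007fffd8de7940 error 6 in libc-2.33.so[152cab2b0000+15e000]",
    "Aug 11 13:43:14 THOR kernel: Code: 00 00 00 83 f8 0f 0f 84 11 18 00 00",
    "Aug 11 13:45:46 THOR kernel: smartctl[12254]: segfault at 38 ip 00001475db5e6a1c sp 00007ffdaa770790 error 4 in libc-2.33.so[1475db51c000+15e000]",
]

for (proc, lib), n in summarize_segfaults(sample).most_common():
    print(f"{proc} crashed in {lib}: {n} time(s)")
# prints "smartctl crashed in libc-2.33.so: 2 time(s)"
```

To scan a whole file, pass an open file object instead of `sample`. This only counts matching lines; it cannot tell you whether the underlying cause is hardware or software.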
August 12, 2022 (Community Expert)
The Unraid driver is crashing. This can sometimes be helped by using a different kernel; update to v6.11.0-rc3 to see if it helps.
August 12, 2022 (Author)
Well, I upgraded to 6.11.0-rc3, set out to add my next disk to the array, and began clearing it. It only lasted about 1 or 2 hours before the errors and unresponsiveness kicked in again. Attached are my latest diagnostics. Any further suggestions would be appreciated.
-Keelhaulers
Attachment: thor-diagnostics-20220812-0952.zip
August 12, 2022 (Community Expert)
smartctl segfaulting is strange; it could be a hardware issue. Do you remember if the issues started after an Unraid release upgrade? If there's a known working release, downgrade back to it and boot it in safe mode. If the issues continue, it's likely hardware.
August 12, 2022 (Author)
Thanks, I was thinking of trying to downgrade. It's been flaky for a while. I will try to pinpoint a date and release that I remember working "properly" and then install that version. I appreciate all the suggestions.
-Keelhaulers
August 12, 2022 (Author)
The oldest version I can find to download is: https://s3.amazonaws.com/dnld.lime-technology.com/next/unRAIDServer-6.9.0-rc1-x86_64.zip
Is there anywhere I can find Unraid OS 6.8.3? I would like to start there and work my way up. Thanks
August 12, 2022 (Community Expert)
22 hours ago, Keelhaulers said: "Approximately 10-12 months ago, my server just started randomly not responding."
v6.9.2 is over one year old now, but if you want to go back further: https://s3.amazonaws.com/dnld.lime-technology.com/stable/unRAIDServer-6.8.3-x86_64.zip
August 12, 2022 (Author)
Thank you, I will give anything a shot at this point. I've got it downgraded to 6.9.0-rc1 currently and I'm attempting to clear a disk. We'll see what happens from here.
-Keelhaulers
Edited August 12, 2022 by Keelhaulers: add info
August 12, 2022 (Author)
No go on this either. I keep noticing that the errors indicate a time sync issue, so I decided to check the BIOS, and the time is wrong. I reset it, but after about an hour it changes again. I'm going to change out the CMOS battery and see if I can at least get different results. I don't have high hopes, though.
Edited August 12, 2022 by Keelhaulers: typos
August 13, 2022 (Community Expert)
Time should not change with the server on, even with a bad CMOS battery; the board might be going bad.
August 13, 2022 (Author)
Yep, as you suspected, the new CMOS battery had no effect. I'm still getting kernel panics. I've wasted enough time trying to fix things, so I am just going to pull the trigger on a new motherboard, CPU, and memory. Gonna stay away from AMD Ryzen this time and look for a mid-range Intel CPU and board. I would like something with at least 3 M.2 slots; the search is on. Any suggestions would be greatly appreciated. I'm just hosting Plex; I don't do any gaming.
Thanks again to everyone who chimed in.
-Keelhaulers