Keelhaulers Posted August 11, 2022 (edited)
Good afternoon, I’ve reached the point where I’ve decided to ask for help or guidance here. I’ve had an Unraid server for approximately 3 years with little to no issues over that time, at least nothing I couldn’t figure out myself. Approximately 10-12 months ago, my server started randomly not responding, essentially just locking up. I would have to power cycle it to bring it back online, and of course a parity check would then ensue. The server would never last more than 24-30 hours before locking up again.

As I said, I suffered through this for the last year until about 3 weeks ago, when I just couldn’t take it any longer. I decided to completely install Unraid from scratch. I pulled my key off the flash drive and pulled all the data drives out. I took a deep breath and dove right in. The first thing I wanted to do was a MemTest, so I ran 4 passes on the RAM without any failures. The BIOS was confirmed up to date. I then started building the array. I began with 2 parity drives and then added my data drives one at a time. I did not want to keep the configuration of the old array, and decided that for me it would be just as easy to mount an unassigned drive, copy its data to the array, then add that drive to the array, essentially shuffling my data very slowly back in. Is this efficient? Probably not. Should it work without a problem? Yes.

But we come full circle, back to the Unraid server not staying “active” and becoming unresponsive. While copying data or trying to pre-clear a drive, the server becomes unreachable. The GUI seems to be active at times, but if you try to make any changes or stop the array, it just spins. Attached to the post is my syslog from a reboot to the lockup. In addition, I’ve captured a picture of the server monitor showing the last portion, where it flashes up some errors.

I’ve done some basic research. I do have an AMD Ryzen CPU, and I have turned C-States on and off; the same results occur no matter the setting. I know this isn’t an Unraid issue per se, and it most certainly is on my end, but I’m at a point where I just want to build a Windows box with JBODs via USB. If anyone has any insight or suggestions, I would be indebted to you.

Thank you
-Keelhaulers

****************************
UNRAID: Version: 6.10.3
CPU: AMD Ryzen 7 3700X 8-Core @ 3600 MHz
Motherboard: Gigabyte Technology Co., Ltd. X570 AORUS MASTER
BIOS: American Megatrends International, LLC., Version F36e, BIOS dated: Thursday, May 12, 2022
RAM: 64GB DDR4
GPU: NVIDIA GeForce GTX 1050 Ti

Attached: syslog-192.168.1.18.log
Edited August 11, 2022 by Keelhaulers: added Unraid version information, picture fix.
Keelhaulers (Author) Posted August 11, 2022
Just happened again after being up for 1 hour. Figured I'd add the latest syslog, just in case it shows something different.
-Keelhaulers
Attached: syslog-192.168.1.18 (Take 2).log
trurl Posted August 11, 2022
Attach diagnostics to your NEXT post in this thread.
Keelhaulers (Author) Posted August 11, 2022
Sorry, here are the diagnostics.
Attached: thor-diagnostics-20220811-1447.zip
dlandon Posted August 11, 2022
It looks like a disk controller issue:
Aug 11 13:38:19 THOR kernel: mpt3sas 0000:05:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
Aug 11 13:38:19 THOR kernel: r8169 0000:08:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
Aug 11 13:38:21 THOR unassigned.devices: Mounting 'Auto Mount' Remote Shares...
Aug 11 13:38:30 THOR kernel: mdcmd (37): nocheck cancel
Aug 11 13:38:30 THOR kernel: md: recovery thread: exit status: -4
Aug 11 13:43:14 THOR kernel: smartctl[1871]: segfault at 146a1f38 ip 0000152cab3776ea sp 00007fffd8de7940 error 6 in libc-2.33.so[152cab2b0000+15e000]
Aug 11 13:43:14 THOR kernel: Code: 00 00 00 83 f8 0f 0f 84 11 18 00 00 48 8b 4b 70 8d 50 01 48 8b 7c 24 08 48 c1 e0 06 89 93 80 00 00 00 66 0f ef c0 48 8d 14 01 <0f> 11 44 01 08 48 8d 74 01 08 48 c7 42 18 00 00 00 00 f3 0f 6f 3f
Aug 11 13:45:46 THOR kernel: smartctl[12254]: segfault at 38 ip 00001475db5e6a1c sp 00007ffdaa770790 error 4 in libc-2.33.so[1475db51c000+15e000]
Aug 11 13:45:46 THOR kernel: Code: 83 c7 01 41 83 ff 40 75 d5 83 c3 40 49 83 c4 08 81 fb 00 01 00 00 75 c1 4c 8b 65 00 e9 8d f6 ff ff 0f 1f 44 00 00 49 8b 56 20 <8b> 72 38 48 8b 55 18 89 34 82 4d 8b 6e 08 4d 85 ed 74 14 4d 89 ee
smartctl is segfaulting. We will need a disk guru like @JorgeB to take a look. You also have a Realtek NIC, and that may be causing some issues. Realtek drivers on Linux are troublesome because they are not updated for each release of Linux.
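For anyone wanting to pull the same kind of lines out of a full syslog rather than scanning by eye, here is a minimal Python sketch. It only assumes the segfault line shape seen in the excerpt above; the names `SEGFAULT_RE`, `find_segfaults`, and `count_by_program` are made up for illustration and are not part of Unraid or any tool mentioned in this thread.

```python
import re
from collections import Counter

# Matches kernel segfault lines of the shape seen in the excerpt above.
# The trailing " in <library>" part is treated as optional.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<prog>[^\[\s]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<lib>[^\[\s]+))?"
)


def find_segfaults(lines):
    """Return one dict of captured fields per segfault line in `lines`."""
    hits = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


def count_by_program(hits):
    """Count segfaults per program name."""
    return Counter(hit["prog"] for hit in hits)


# Three lines copied from the excerpt above; the last one is not a segfault.
SAMPLE = [
    "Aug 11 13:43:14 THOR kernel: smartctl[1871]: segfault at 146a1f38 "
    "ip 0000152cab3776ea sp 00007fffd8de7940 error 6 in "
    "libc-2.33.so[152cab2b0000+15e000]",
    "Aug 11 13:45:46 THOR kernel: smartctl[12254]: segfault at 38 "
    "ip 00001475db5e6a1c sp 00007ffdaa770790 error 4 in "
    "libc-2.33.so[1475db51c000+15e000]",
    "Aug 11 13:38:30 THOR kernel: mdcmd (37): nocheck cancel",
]

if __name__ == "__main__":
    for hit in find_segfaults(SAMPLE):
        print(hit["prog"], hit["pid"], "faulted at", hit["addr"], "in", hit["lib"])
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `find_segfaults(open("syslog-192.168.1.18.log"))`. If many unrelated programs show up in the count, that would point more toward hardware than toward a single faulty binary, but that reading is a rule of thumb, not a diagnosis.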
JorgeB Posted August 12, 2022
Unraid driver is crashing. This can sometimes be helped by using a different kernel, so update to v6.11.0-rc3 to see if it helps.
Keelhaulers (Author) Posted August 12, 2022
Thank you, will give it a try this morning and see what happens.
Keelhaulers (Author) Posted August 12, 2022
Well, I upgraded to 6.11.0-rc3, set out to add my next disk to the array, and began the clear on it. It only lasted about 1 or 2 hours before the errors and unresponsiveness kicked in again. Attached are my latest diagnostics. Any further suggestions would be appreciated.
-Keelhaulers
Attached: thor-diagnostics-20220812-0952.zip
JorgeB Posted August 12, 2022
Smartctl segfaulting is strange and could be a hardware issue. Do you remember if the issues started after an Unraid release upgrade? If there's a known working release, downgrade back to it and boot it in safe mode. If the issues continue, it's likely hardware.
Keelhaulers (Author) Posted August 12, 2022
Thanks, I was thinking of trying to downgrade. It's been flaky for a while. I will try to pinpoint a date and release that I remember it working "properly" and then try to install that version. I appreciate all the suggestions.
-Keelhaulers
Keelhaulers (Author) Posted August 12, 2022
The oldest version I can find to download is: https://s3.amazonaws.com/dnld.lime-technology.com/next/unRAIDServer-6.9.0-rc1-x86_64.zip
Is there anywhere I can find Unraid OS 6.8.3? I would like to start there and work my way up. Thanks
JorgeB Posted August 12, 2022
22 hours ago, Keelhaulers said: "Approximately 10-12 months ago, my server just started randomly not responding."
v6.9.2 is over one year old now, but if you want to go back further: https://s3.amazonaws.com/dnld.lime-technology.com/stable/unRAIDServer-6.8.3-x86_64.zip
Keelhaulers (Author) Posted August 12, 2022 (edited)
Thank you, will give anything a shot at this point. I've got it downgraded to 6.9.0-rc1 currently and I'm attempting to clear a disk. Will see what happens from here.
-Keelhaulers
Edited August 12, 2022 by Keelhaulers: add info.
Keelhaulers (Author) Posted August 12, 2022 (edited)
No go on this either. I do keep noticing that the errors state that there is a time sync issue, so I decided to check the BIOS, and the time is wrong. I reset it, but after about an hour it changes again. I'm going to change out the CMOS battery and see if I can at least get different results. I don't have high hopes though.
Edited August 12, 2022 by Keelhaulers: typos.
JorgeB Posted August 13, 2022
Time should not change with the server on, even with a bad CMOS battery; the board might be going bad.
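One rough way to watch for the clock changing while the server is running is to compare the wall clock against the monotonic clock over time. The sketch below is only an illustration under stated assumptions: `wall_clock_step` and `watch_clock` are made-up names, and the check sees only the operating system's clock, so it reports steps applied to system time (for example by a time sync correction or something resetting the time) and does not read the BIOS/RTC directly or prove a board fault.

```python
import time


def wall_clock_step(wall_start, mono_start, wall_now, mono_now):
    """Seconds the wall clock gained (+) or lost (-) relative to the monotonic clock."""
    return (wall_now - wall_start) - (mono_now - mono_start)


def watch_clock(interval_s=60.0, threshold_s=2.0, checks=10):
    """Sample both clocks `checks` times; print and return steps above threshold_s."""
    steps = []
    wall_prev, mono_prev = time.time(), time.monotonic()
    for _ in range(checks):
        time.sleep(interval_s)
        wall_now, mono_now = time.time(), time.monotonic()
        step = wall_clock_step(wall_prev, mono_prev, wall_now, mono_now)
        if abs(step) > threshold_s:
            print(f"wall clock stepped by {step:+.1f} s at {time.ctime(wall_now)}")
            steps.append(step)
        wall_prev, mono_prev = wall_now, mono_now
    return steps


if __name__ == "__main__":
    # Ten one-minute checks; the interval and threshold are arbitrary choices.
    watch_clock(interval_s=60.0, threshold_s=2.0, checks=10)
```

The monotonic clock is used as the reference because it is not adjusted when the system time is set, so any large difference between the two elapsed times means the wall clock was moved during that interval.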
Keelhaulers (Author) Posted August 13, 2022
Yep, as you suspected, the new CMOS battery had no effect. Still getting a kernel panic. I've wasted enough time trying to fix things, so I am just going to pull the trigger on a new MB, CPU, and memory. Gonna stay away from AMD Ryzen this time and look for a mid-range Intel CPU and board. I would like something with at least 3 M.2 slots, so the search is on. Any suggestions would be greatly appreciated. I'm just hosting Plex; I don't do any gaming...
Thanks again to everyone who chimed in.
-Keelhaulers