Frustration Has Set In - Help Please



Good Afternoon – I’ve reached the point where I’ve just decided to ask for help or guidance here.

 

I’ve had an Unraid server for approximately 3 years with little to no issues over that time. At least not anything I couldn’t figure out myself.

 

Approximately 10-12 months ago, my server just started randomly not responding. Essentially just locking up. I would have to power cycle to reboot and bring it back online. Of course, then a Parity check would ensue upon coming back online. The server would never last more than 24-30 hours before locking up again.

 

As I said, I suffered through this for the last year until about 3 weeks ago, when I just couldn’t take it any longer.

 

I decided to completely install Unraid from scratch.  I pulled my key off the flash drive, pulled all the data drives out. I took a deep breath and just dove right in.

 

The first thing I wanted to do was a MemTest, so I ran 4 passes on the RAM without any failures. The BIOS was confirmed up to date. I then started building the array. I began with 2 parity drives, planning to add my data drives one at a time.

 

I did not want to keep the configuration of the old array, and decided that for me it would be just as easy to mount an Unassigned drive, copy its data to the array, then add that Unassigned drive to the array. Essentially, shuffle my data very slowly back in. Is this efficient? Probably not. Should it work without a problem? Yes.

 

But we come full circle back to the Unraid server just not staying “Active” and becoming unresponsive. While copying data or trying to pre-clear a drive, the server just becomes unreachable. The GUI seems to be active at times, but if you try to make any changes or stop the array, it just spins.

 

Attached to the post is my SysLog covering a reboot through to the lock-up. In addition, I’ve captured a picture of the server monitor showing the last portion, where it flashes up some errors.

 

I’ve done some basic research. I do have an AMD Ryzen CPU, and I have turned C-States on and off; the same results occur no matter the setting.

 

I know this isn’t an Unraid issue per se, and it most certainly is on my end, but I’m at the point where I just want to build a Windows box with JBODs via USB.

 

If anyone has any insight or suggestions, I would be indebted to you.

 

Thank you

-Keelhaulers

****************************

UNRAID: Version: 6.10.3 

 

CPU: AMD Ryzen 7 3700X 8-Core @ 3600 MHz

 

Motherboard: Gigabyte Technology Co., Ltd. X570 AORUS MASTER

BIOS: American Megatrends International, LLC., Version F36e, BIOS dated: Thursday, May 12, 2022

 

RAM: 64GB DDR4

GPU: NVIDIA GeForce GTX 1050 Ti

 

 

 

 

Capture.JPG

syslog-192.168.1.18.log

IMG_1754.jpg

Edited by Keelhaulers
Added Unraid version information. Picture fix

It looks like a disk controller issue:

Aug 11 13:38:19 THOR kernel: mpt3sas 0000:05:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
Aug 11 13:38:19 THOR kernel: r8169 0000:08:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
Aug 11 13:38:21 THOR unassigned.devices: Mounting 'Auto Mount' Remote Shares...
Aug 11 13:38:30 THOR kernel: mdcmd (37): nocheck cancel
Aug 11 13:38:30 THOR kernel: md: recovery thread: exit status: -4
Aug 11 13:43:14 THOR kernel: smartctl[1871]: segfault at 146a1f38 ip 0000152cab3776ea sp 00007fffd8de7940 error 6 in libc-2.33.so[152cab2b0000+15e000]
Aug 11 13:43:14 THOR kernel: Code: 00 00 00 83 f8 0f 0f 84 11 18 00 00 48 8b 4b 70 8d 50 01 48 8b 7c 24 08 48 c1 e0 06 89 93 80 00 00 00 66 0f ef c0 48 8d 14 01 <0f> 11 44 01 08 48 8d 74 01 08 48 c7 42 18 00 00 00 00 f3 0f 6f 3f
Aug 11 13:45:46 THOR kernel: smartctl[12254]: segfault at 38 ip 00001475db5e6a1c sp 00007ffdaa770790 error 4 in libc-2.33.so[1475db51c000+15e000]
Aug 11 13:45:46 THOR kernel: Code: 83 c7 01 41 83 ff 40 75 d5 83 c3 40 49 83 c4 08 81 fb 00 01 00 00 75 c1 4c 8b 65 00 e9 8d f6 ff ff 0f 1f 44 00 00 49 8b 56 20 <8b> 72 38 48 8b 55 18 89 34 82 4d 8b 6e 08 4d 85 ed 74 14 4d 89 ee

smartctl is segfaulting.  We will need a disk guru like @JorgeB to take a look.
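
If it helps to pull every segfault out of the full syslog rather than eyeballing it, here is a rough Python sketch. It is only an illustration: the regex is based solely on the two segfault lines quoted above, and the names `parse_segfaults` and `decode_error` are made up for this example. It assumes the error value is printed in hex and uses the usual x86 page-fault bits (bit 0 = protection violation, bit 1 = write, bit 2 = user mode).

```python
import re

# Pattern modelled on the kernel segfault lines quoted above (an assumption:
# other kernels or architectures may format these lines differently).
SEGFAULT_RE = re.compile(
    r"kernel: (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfaults(lines):
    """Return one dict per segfault line found in an iterable of syslog lines."""
    results = []
    for line in lines:
        m = SEGFAULT_RE.search(line)
        if m is None:
            continue  # not a segfault line
        entry = {
            "process": m.group("proc"),
            "pid": int(m.group("pid")),
            "address": int(m.group("addr"), 16),
            "fault": decode_error(int(m.group("err"), 16)),
            "library": m.group("lib"),
            "offset": None,
        }
        if m.group("base") is not None:
            # Offset of the faulting instruction inside the mapped library.
            entry["offset"] = int(m.group("ip"), 16) - int(m.group("base"), 16)
        results.append(entry)
    return results
```

For example, `parse_segfaults(open("syslog-192.168.1.18.log"))` would return one entry per segfault line in the attached log. The `offset` is just `ip` minus the mapping base, so it shows where inside libc each crash happened.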

 

You also have a Realtek NIC, which may be causing some issues. Realtek drivers on Linux are troublesome because they are not updated for each release of Linux.


No go on this either. I do keep noticing that the errors mention a time sync issue, so I checked the BIOS and the time is wrong. I reset it, but after about an hour it changes again. I'm going to change out the CMOS battery and see if I can at least get different results. I don't have high hopes though.

Edited by Keelhaulers
typos

Yep, as you suspected. The new CMOS battery had no effect. Still getting a Kernel Panic.

 

I've wasted enough time trying to fix this. I am just going to pull the trigger on a new MB, CPU, and memory.

 

Gonna stay away from AMD Ryzen this time and look for a mid-range Intel CPU and board.

 

I would like something with at least 3 M.2 slots, so the search is on.

 

Any suggestions would be greatly appreciated.  Just hosting Plex, I don't do any gaming...

 

Thanks again to everyone who chimed in.

 

-Keelhaulers

