[SOLVED] Unraid crashing upon starting up array



 

@Squid @trurl @John_M

 

OK, an update on my never-ending saga.

 

Since my last update I have tried all the configurations that were suggested, and none of them fixed my issue. I had pretty much chalked it up to a hardware problem, so I bought a replacement motherboard and memory, both exactly the same models as before. After swapping them in, the issue actually got worse on first boot: the system would not finish booting after the Unraid boot screen. It would run for a bit, restart to the BIOS prompt, go through the Unraid boot again, and then restart once more.

Next I thought something might be wrong with my flash drive, so I rebuilt it as a brand-new install without restoring the config file. On startup it did the same thing as before. At that point I suspected other hardware. I may have forgotten to mention that I have other hardware installed, but it had all worked with Unraid without issues for over a year, so I thought nothing of it. With a new motherboard, though, it might cause problems, so I started removing the PCI cards. For reference, I had two graphics cards (Radeon 7850, GTX 1060) and two USB 3.0 controller cards (one Rosewill, the other a brand I don't remember). I removed all of them except one graphics card (the Radeon 7850), and... I was able to boot up! I then safely shut it down and booted it back up a couple of times to confirm it boots reliably. I also restored my config folder and confirmed that it still boots.

 

So you may be thinking, yay, it's over. Sadly no, but I now have some more details that didn't appear before. I started the array and started only Docker. I decided to delete the Docker image and rebuild my containers from their templates. I let the system run a bit, saw segfaults in the syslog just as before, and the system locked up to the point where I had to hard reboot. I'm still able to start up the system, but this time I've disabled Docker and kept the array running; it has now been up for 35 minutes and counting. However, in my notifications I got an error I had not seen before, which was very helpful: "Fix Common Problems: Error: Machine Check Events detected on your server". I checked Fix Common Problems, and the message now says: "Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged". The system is still running, but I'm not sure for how long; I will keep monitoring it.

 

So I've posted my diagnostics with this update in the hope that you can help me. I hope the mcelog output is included and that I did it properly.

 

Thank you all for any help you can provide.

supergrid-diagnostics-20210411-1940.zip


Apart from a bit of a wobble during the initialisation of the CPU cores, which seems to be fairly common on both AMD and Intel platforms these days, all seems well until:

 

Apr 11 19:23:51 SuperGrid kernel: python3[7232]: segfault at e40c5384 ip 000014dd57fa4bd4 sp 000014dd50c21610 error 6 in sabyenc3.cpython-39-x86_64-linux-gnu.so[14dd57fa4000+1000]
Apr 11 19:23:51 SuperGrid kernel: Code: 85 ed 75 8a 80 fa 3d 74 4d 80 fa 0d 74 a0 80 fa 0a 0f 84 d7 01 00 00 80 fa 2e 0f 84 a6 01 00 00 83 ea 2a 45 31 e4 49 83 c1 01 <41> 88 51 ff e9 69 ff ff ff 0f 1f 00 66 0f ef c0 f2 0f 2a 44 24 04
Apr 11 19:23:54 SuperGrid kernel: python3[9875]: segfault at 14cc05fed5f8 ip 000014cd297baff7 sp 000014cc05fed600 error 6 in libcrypto.so.1.1[14cd2964c000+1a5000]
Apr 11 19:23:54 SuperGrid kernel: Code: a7 2e 0f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 07 b9 ff ff ff 7f 29 c1 39 f1 0f 8c ff 00 00 00 55 01 c6 48 89 fd <53> bb 04 00 00 00 48 83 ec 08 48 8b 7f 08 83 fe 04 0f 4d de 48 85
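For anyone reading excerpts like the one above, here is a small Python sketch (not something posted in this thread) that pulls the segfault lines out of a syslog and decodes the `error` field. It assumes the x86 page-fault error-code bits (bit 0 set means a protection violation rather than a missing page, bit 1 means a write, bit 2 means user mode) and that the kernel prints that field in hex; the helper names `summarize` and `decode_error` are hypothetical. Both crashes above decode to user-mode writes to a page that was not present, in two unrelated libraries (`sabyenc3` and `libcrypto`). Random faults scattered across unrelated code are often treated as a hint of hardware trouble, which fits the question about memory testing below.

```python
import re
from collections import Counter

# Assumed shape of a kernel segfault line, based on the excerpt above, e.g.
# "python3[7232]: segfault at e40c5384 ip ... sp ... error 6 in libfoo.so[...]"
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)"
)


def decode_error(code: int) -> str:
    """Decode the low three bits of the x86 page-fault error code."""
    mode = "user-mode" if code & 0x4 else "kernel-mode"
    access = "write" if code & 0x2 else "read"
    cause = "protection violation" if code & 0x1 else "page not present"
    return f"{mode} {access}, {cause}"


def summarize(lines):
    """Return (process, object, decoded error) per segfault plus a per-object count."""
    hits = []
    for line in lines:
        m = SEGFAULT_RE.search(line)
        if m:
            # Assumption: the kernel prints the error field in hex, so parse base 16.
            hits.append((m["comm"], m["obj"], decode_error(int(m["err"], 16))))
    counts = Counter(obj for _, obj, _ in hits)
    return hits, counts
```

To use it, pass the lines of a saved syslog (for example, one taken from a diagnostics zip) to `summarize`; lines that are not segfault reports, such as the `Code:` dumps, are simply skipped.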

 

You said you have new memory. Did you test it?

 

You have a plugin called

 

statistics.sender.plg - 2017.09.22

 

I don't know what that does but it isn't essential to Unraid operation and hasn't been updated for a long time so I'd get rid of it. I'd also remove the Sleep plugin for the time being.

 


You have changed the main hardware and ruled out other components, and it's still not fixed.

I would suggest you try the new UEFI MemTest86 (I just tried it yesterday). Some of its tests can use all CPU cores (you need to enable this), whereas I doubt the legacy memtest uses all cores (I may be wrong, but I don't trust legacy memtest).

Anyway, it could be a CPU issue too.

 

https://www.memtest86.com/download.htm

 


Edited by Vr2Io

@Vr2Io Hi, thank you for pointing me to this memtest. I ran it and found many errors, which I hadn't seen with the memtest that Unraid runs by default. If I'm seeing errors with the multi-core memtest but not with the single-core one, could it be a CPU problem? Or do I still have a memory error? At the moment I'm running this memtest with my new 16GBx2 pair of RAM sticks. I will try again after removing those sticks and replacing them with my previous single 32GB stick.

 

For reference, this is the RAM I am using with my Ryzen 3800X CPU:

Set 1: 16GBx2 DDR4 2400MHz Corsair Vengeance LPX

Set 2: 32GBx1 DDR4 2400MHz Corsair Vengeance LPX

Edited by mankey54
1 hour ago, mankey54 said:

I guess

I guess the same.

 

1 hour ago, mankey54 said:

possible that it's a CPU problem

Could be, but the memory could still be the source of the problem; as I said, I don't trust legacy memtest. I have run into several genuine DDR4 memory problem cases, something I seldom saw with DDR1/2/3.

 

1 hour ago, mankey54 said:

I will try again to remove the ram stick and replace with my previous stick of 32GBx1 ram stick

Agreed.

Edited by Vr2Io

@Vr2Io and everyone, I want to give an update. My sad saga of trying to fix my issues has finally concluded. Thanks to you, Vr2Io, I was able to determine that the issue is with my CPU. I've since bought a replacement (Ryzen 5 3600X), and my system is operational again; it has now been running for 24 hours and counting. It made sense that it was a hardware issue, since nothing I changed in the configuration over the last 6 months suggested it was software, but I started with software anyway. Through trial and error I replaced my memory, then the motherboard, and the issue was still not resolved; if anything, it got worse. Once I replaced my CPU, it was back to operational. Since my CPU is under warranty, I've started the process to request a new one. Now that I have all this extra hardware, it's time to build a second system :).

 

Thank you everyone for your support in this. Much appreciated.

  • JorgeB changed the title to [SOLVED] Unraid crashing upon starting up array
