
Unraid hard-locks for an unknown reason



Hello,


I have been dealing with an instability whose exact cause I cannot put my finger on. The system seems to crash at random moments, and the time between crashes can be anything from hours to days. CPU load does not seem to be a factor, and changing CPU features does not seem to have any effect.

So far, to exclude the "usual suspects", I have removed the GPU (and disabled the VM manager) and tried far too many permutations of RAM configurations, going down from three 8 GB sticks to a single stick. In an attempt to reduce the number of crashes, I am currently running as bare-bones a configuration as possible. None of this seems to have much effect, as the time between crashes can still be days.


The result of the instability is a hard lock so severe that only a power cycle (which I can thankfully do via IPMI) brings the system back. The IPMI logs do not show any anomalies and have not recorded any power supply issues (the PSU is a high-end Seasonic).

Since moving hardware around has not changed the frequency of the crashes, I am forced to look at the remaining constant: Unraid. However, I do not have the skill or knowledge to find any problems in the logs. I have attached the mandatory diagnostics file and hope someone can take a look and find a reason why the system might crash.


I would love to be able to troubleshoot in a more targeted way, either by ruling out Unraid as a cause or by finding a specific issue.


PS: Could a dying USB stick cause this? I have a brand-new SLC-based industrial USB stick waiting to be used, but a year has not yet passed since my last licence transfer, so I can't replace the stick yet.


nasi-diagnostics-20220222-0025.zip

Edited by thedutchguy
4 hours ago, JorgeB said:


That is a nice find, but it does not explain why this issue has only sprung up recently and has been getting worse over the last ~2 months, given that I had been running 6.9 and later 6.10 without issue.


So far I have (again) not ruled out a USB stick fault, but it is impossible to test. Transferring the flash data invalidates the registration key and prevents the array from starting, which makes the server useless (upsetting the family). Starting with a blank stick and a trial key means losing all the Docker and share data, which also upsets the family.


I did run a "surface" test on the USB stick (a SanDisk Extreme something-or-other), and there are some speed hiccups when reading, which suggests the NAND on the stick might be on its way out. However, I cannot actually replace the stick without contacting support to (again) prematurely replace the GUID on my licence within the year, so it is not a cause I can diagnose.

2 weeks later...

Update:

I have found the core issue after not seeing anything in the logs. It appears the USB controller on the motherboard was slowly dying. I figured this out because the system would not respond to the virtual keyboard over IPMI, nor on most of the motherboard USB ports, but worked fine on a separate 4-port USB PCIe card.
