Unstable, Dockers and VMs


Hi everyone,

 

So I'm properly at a loss here.

I built my Unraid box when Boris locked us down, following SpaceInvader One's guides (absolutely brilliant). I'm a tech guy anyway so I love this, but Linux, containers, Docker and the like are quite new to me. I've worked with VMs before, but I'm a Windows guy.

 

With that said, let's get down to the problem and how it started. This machine ran perfectly for around 2 years with essentially zero downtime. I've been running:

  • SWAG
  • Nextcloud
  • MariaDB
  • Plex
  • Emby
  • Sonarr
  • Radarr
  • SABnzbd
  • Shinobi
  • youtube-dl

and probably a few more I can't remember now, as well as a Windows 10 VM.

 

The issues I'm having are:

  • Within the VM, Chrome shows the "Aw, Snap!" page reporting a memory access violation.
  • My Docker containers crash. For example, Emby will usually stay logged in until it crashes; then it asks me for a username and password and will not accept my details until I restart the container. Having said that, I get a code 403 when trying to restart the container, and this is only resolved by a system reboot.

 

The system is:

  • Ryzen 3600X
  • 24 GB RAM
  • 4x 6 TB HDDs (1 of them for parity)
  • 1x 2 TB WD Blue NVMe SSD (houses my containers / App Data and VM / Domains folders). I was also using it as a cache drive. I did wonder if this was an SSD failure, so the 2 TB NVMe is new; I was running a 1 TB before.

 

Anyway, here's when the issue started. As stated above, I've been following SpaceInvader One's guides (what a bloke for these guides). I installed Tdarr and started to transcode my sizable Plex library into H.265. All was going well and I managed to convert about half, if not 2/3, of my library. Then I added a GPU to speed things up, as I had just been letting the CPU chug through it when I wasn't at the helm of the machine.

This is where the problem started: I filled up my cache drive and thus crashed my machine. I got back, rebooted, realised what had happened, thought "oh, silly me", ran the mover and thought all was good. My machine has never been stable since.

 

I've now removed the GPU (for reference, it was a GTX 1080 3 GB), and I'm still using the RTX 3070 I have in for the Windows VM.

 

I've attached the diagnostics for you to take a look through. I have 2 unhappy-ish drives, but they were already thumbs-down when I started the transcode. One of them has simply hit its lifetime timer, but I can't see these giving me all this trouble when the stuff I'm running mainly sits on the SSD.

Any help would be greatly appreciated, and if I've forgotten anything, give me a shout and I'll add details as we need them. I'd love to get this stable again as it's so useful.

Thank you everyone in advance, can't wait to see what you all think.

voyager-diagnostics-20220710-1249.zip

Link to comment

Assuming that you've already handled this: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/#comment-819173

 

 

You definitely want to run a memtest from the Boot Menu (if you're booting via UEFI, you will need to temporarily switch to Legacy mode for Memtest to work, or alternatively create a new boot stick via https://www.memtest86.com/).

 

Additionally, if you're going to mix and match memory (which you are), the most stable system is one which

  • Has all matching sticks
  • Has matching sticks installed in pairs (this also has a massive performance boost)
  • If you're unable to match sticks, at least has identical CL timing between the sticks

 

Link to comment

Make sure you run a very intensive memtest (build your own stick; the Unraid version is outdated, is not UEFI and does not recognize modern Ryzen CPUs).

Many Ryzen boards have memory problems. Even if you have bought "3200 approved" memory, this does not mean that you will also be able to use this speed.

The more slots are occupied, the slower the speed has to be. Pushing the voltage above 1.35 V is dangerous too; it will wear out and kill your memory modules within a few years.

I, for instance, get "only" 3000 at 1.35 V out of my 128 GB of "3200 certified" memory. It took me a week of testing to arrive at this value (start with 3200, cancel if an error shows up, go down to 2400, all well, raise to 2800 and so on...). One run can take the whole night, depending on the amount of memory and the number of cores of your processor.

But stay cool! Do it until you have an absolutely rock-solid combination. Otherwise, random errors like those you have now will drive you crazy. ANYTHING may happen with bad RAM...

 

Link to comment

Hi Guys,

 

And thank you so much for your replies.

Due to family issues I've only just been able to run memtest on my server, and I'm ashamed to say (as I used to be a computer engineer) that it looks very much like it was memory. Squid, thanks so much. It looks like both of my Corsair Vengeance sticks have died. The first stick instantly threw errors, hundreds of them. So I popped that out and ran the second stick in the same slot; that only threw 1 or 2 errors, but it's still dead in my mind.

 

Now I'm down to 1x 8 GB ADATA stick, which is not ideal when I need to run a VM, but at least I know the issue now. Can't thank you enough.

 

I may come back to you shortly with some other questions, as I now need to repair my Plex Docker container's database, which it keeps telling me is corrupt.
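For reference, the Plex library database is a SQLite file, so a first look at the damage might be an integrity check along these lines. This is only a sketch: the function name `check_sqlite_integrity` is made up for illustration, it should be pointed at a copy of the database with the container stopped, and because Plex ships its own SQLite build, the stock Python module may refuse or misreport parts of a real Plex database.

```python
import sqlite3
from pathlib import Path


def check_sqlite_integrity(db_path):
    """Return the rows reported by SQLite's integrity check.

    A healthy database yields ["ok"]; anything else is a list of
    problem descriptions. The file is opened read-only.
    """
    # Build a file: URI so the database can be opened in read-only mode.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Anything other than `["ok"]` would suggest restoring from a backup or rebuilding the database rather than carrying on with it.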

 

And is there a way within Unraid that I can scan for corrupted data?
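One generic approach, sketched here under assumptions rather than as anything built into Unraid, is to record a checksum for every file and compare against that record later. The helper names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical. Note the limitation: this can only flag files that change after the first manifest is taken, so it cannot tell whether data was already corrupted while the bad RAM was installed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files do not fill RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old, new):
    """List files present in both manifests whose hashes differ."""
    return sorted(
        name for name in old if name in new and old[name] != new[name]
    )
```

A file that shows up in `changed_files` without having been edited on purpose is a candidate for restoring from backup.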

 

Thank you all so much.

Link to comment
