
Multiple Issues


Solved by JorgeB


Hey, I am having multiple issues. My system freezes. Dockers are disappearing. Downloads start out fast and then slow to nothing. A parity check started running at normal speed and then slowed to nearly nothing. I have deleted the docker.img and redownloaded my dockers. I have tried a different NIC and then a new motherboard. Last night I replaced one LSI mini-SAS card. My case uses a mini-SAS backplane. I'm not sure for how long, but it ran with no issues until recently. I am uploading two diagnostics: one from when it was running well doing a parity check, and the other from this morning when parity had slowed to a crawl.

Thanks in advance for any help provided:

Chris

Parity at a Crawl.png

Pool Disks.png

Main Disks.png

kraken-diagnostics-20240614-0204.zip kraken-diagnostics-20240614-1117.zip

20 minutes ago, JorgeB said:

The Unraid driver is crashing; this is almost always a hardware issue. Start by running memtest, and also update to the latest stable release.

Hey, thanks for your help. I forgot to say on the last go-round: I have memtest running now (I don't know why it didn't work before).

It is already showing errors, so I am attaching a photo. I am unfamiliar with how the results are read. Does it show which stick is bad, or give any diagnosis? Will I need to rule out the individual sticks by running one stick at a time?

memtest.jpg

1 hour ago, JorgeB said:

It doesn't show the affected stick, but you can remove one at time and retest to see if you can find the culprit.

I have pulled my hair out over this. I even re-seated the RAM last night. It seems so obvious I can't believe I didn't consider it. 

The test is still running. I have other RAM I can put in there, and then I can test the original RAM to see what's bad. Is there any reason to continue the test?

40 minutes ago, ctsdad said:

is there any reason to continue the test

No. You can power off and change sticks or whatever as soon as any errors show up.

 

Don't attempt to run Unraid on a machine that has any memtest errors, you will likely corrupt your data.

 

After you can complete many (preferably at least 12) hours of memtest with no errors shown then you can try to run Unraid.

 

However, not all memory errors are caught by memtest, but it's a good start to weed out obvious problems.

22 minutes ago, JonathanM said:

No. You can power off and change sticks or whatever as soon as any errors show up.

After you can complete many (preferably at least 12) hours of memtest with no errors shown then you can try to run Unraid.

Thanks. I have other RAM that I can install. I will end up getting some new RAM for the Unraid server, but I will let it run for a while. I think you guys have helped me find the problem, though.


I just wanted to mention that I am on a second parity check with no errors. I am using two sticks of RAM that I had sitting around. My new 4-stick set will be here tomorrow. Do you think it's necessary to test new RAM in a spare PC before committing it to my Unraid server? When this has completed, I will mark my original question/issue as solved.

3 hours ago, ctsdad said:

Do you think it's necessary to test new RAM in a spare PC before committing it to my Unraid server?

No, but it is necessary to test it once it's in the Unraid server. Good memory can have errors if the BIOS settings are wrong or out of spec, or the motherboard isn't fully compatible.

 

Your Unraid rig, complete as you want to run it, must pass a memtest with zero errors for as long as you can stand, preferably 24+ hours. Since Unraid installs and runs in RAM, it's extremely critical that RAM operation be flawless.

 

4 hours ago, ctsdad said:

I am on a second parity check with no errors.

A parity check is not a reliable test for good RAM. You could have a bad stick, or some fault that doesn't affect the parity check, that still causes other issues.

2 hours ago, JonathanM said:

No, but it is necessary to test it once it's in the Unraid server. Good memory can have errors if the BIOS settings are wrong or out of spec, or the motherboard isn't fully compatible.

A parity check is not a reliable test for good RAM. You could have a bad stick, or some fault that doesn't affect the parity check, that still causes other issues.

Yeah, I must have misrepresented myself. I was just running the parity check to check parity and look for errors. Sorry about that. I had thought I might run Memtest86+ on the new RAM in a different PC, just to check it for a few hours.

Still, thank you!


Hey guys, I had a good parity check. I am using different RAM and still see some issues popping up in the logs. I am attaching them here in the hope that you can tell what you see. The system has stayed up and running, but it goes through periods of major slowdowns in both downloads and access to the system. Looking at the logs, it seems to be read/write errors again.

 

Thanks, 

Chris

kraken-syslog-20240618-0259.zip kraken-diagnostics-20240617-2259.zip


Hey, just to go along with the logs, I wanted to add a screenshot. Throughput can go from 76 MB/s to pretty much nothing. At the same time, I may or may not be able to access the rest of the system, say the Main page. Then all of a sudden it goes back to normal and the system responds as quickly as ever. I'm just using this as an example. I am sorry to be such trouble, but this is killing me. Also, if it seems I am taking a long time to respond, it's because I'm having a bad day and can't get around well. I have CMT II disease, and sometimes I have to let my wife be my arms and legs.

Screenshot 2024-06-18 12.00.34 AM.png


Btrfs is detecting data corruption on all devices, and multiple apps are segfaulting. This looks like a RAM problem; memtest is only definitive when it finds errors, so a clean pass doesn't rule RAM out. If you have multiple sticks, try using the server with just one; if the problem persists, try a different one. That will basically rule out bad RAM.


Hey, this is a completely different set of RAM that came from a working PC. My original set is what started this discussion, because we thought some of those original RAM sticks were the issue. What are the chances that I took a second set of RAM out and it went bad? It just seems like a strange coincidence. Also, what can I do (if anything) to address the Btrfs corruption?


(Correction) What are the chances that I took a second set of RAM out of a working PC and they went bad?

Just now, JorgeB said:

See my post above

Yeah, I have read it, thank you. I just wanted the sentence I corrected to make sense; I got in too big of a hurry. I will go through all of those steps as I can. Again, thank you.


I had ordered new RAM so I can put back the original 64GB and return the RAM I was using temporarily to its PC. It didn't arrive until today because the mail didn't run yesterday. Should I run a memtest on the new RAM in a test unit before putting it in the Unraid server, and then run the scrub and check for errors?

Also, all of your help is appreciated.

On 6/18/2024 at 12:54 PM, JorgeB said:

Reset the stats, run a scrub, confirm no errors, monitor for a few days to see if they come, if they do, there's still a problem.

Do I need to do this separately for each device in the pool, or is the pool treated as one device? I have a scrub running at this moment on the first device, since it was the one that showed errors. I have attached the errors that were shown before clearing.

pre scrub.txt

3 minutes ago, JorgeB said:

Just the pool, it scrubs all devices from that pool.

OK, so it completed with three uncorrectable errors (see attached). It seems to have started with three and ended with three.

Post scrub on first cache-ssd device.png
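For anyone following along: the reset, scrub, and monitor routine JorgeB describes can be checked from the command line with `btrfs device stats <mountpoint>`, which prints one error counter per device (`write_io_errs`, `read_io_errs`, `flush_io_errs`, `corruption_errs`, `generation_errs`). Its `-z` option prints the counters and then resets them to zero, and `-c` makes the command exit non-zero if any counter is non-zero. The Python below is only a rough sketch, not an Unraid tool: it assumes the one-counter-per-line output format shown in the comments, and the pool path `/mnt/cache` and the helper names `read_stats` and `nonzero_counters` are hypothetical. It flags any counter that is no longer zero.

```python
import re
import subprocess
from typing import Dict, Tuple

# Assumed line format of `btrfs device stats`, e.g. "[/dev/sdb1].corruption_errs    3"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def read_stats(mountpoint: str) -> str:
    """Run `btrfs device stats` on a mounted pool (needs root and btrfs-progs)."""
    result = subprocess.run(
        ["btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def nonzero_counters(stats_output: str) -> Dict[Tuple[str, str], int]:
    """Return {(device, counter): value} for every counter that is above zero."""
    problems: Dict[Tuple[str, str], int] = {}
    for line in stats_output.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match.group("value")) > 0:
            problems[(match.group("dev"), match.group("counter"))] = int(match.group("value"))
    return problems


if __name__ == "__main__":
    # Made-up sample text in the assumed format; on the server you would
    # use read_stats("/mnt/cache") instead (the pool path is hypothetical).
    sample = (
        "[/dev/sdb1].write_io_errs    0\n"
        "[/dev/sdb1].read_io_errs     0\n"
        "[/dev/sdb1].flush_io_errs    0\n"
        "[/dev/sdb1].corruption_errs  3\n"
        "[/dev/sdb1].generation_errs  0\n"
    )
    for (dev, counter), value in nonzero_counters(sample).items():
        print(f"{dev}: {counter} = {value}")
```

Checked right after resetting the counters and again every day or so, an empty result means nothing new has been recorded since the reset; a climbing `corruption_errs` value after a clean scrub would suggest the underlying problem is still there.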
