
Array stopping (Crashing?)


vmax5000


I am new to Unraid. I have an issue where the array just suddenly stops. It usually happens after several days, and usually (at least it seems that way) while I am using the mover to move data from the cache to the array. I looked at the mce logs and to me it looks like it might be a CPU issue, but I'm not that good at Linux. I am attaching the logs as outlined in the need help post. I also have one drive that seems to disappear when I shut down the system, and it takes several starts to get the system to see it again. The drive is brand new and it passes all tests, so I'm not sure whether that drive is causing an issue or not, but I don't see any drive issues in the logs. Any help would be appreciated.

This is what I see in the system logs, in case they don't show in the download:

Dec 13 02:18:51 Ultimate kernel: mce: [Hardware Error]: Machine check events logged
Dec 13 02:18:51 Ultimate kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 1: bf80000000000124
Dec 13 02:18:51 Ultimate kernel: mce: [Hardware Error]: TSC 0 ADDR 42f883e40 MISC 86
Dec 13 02:18:51 Ultimate kernel: mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1576232296 SOCKET 0 APIC 4 microcode 43
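
For anyone who wants to pick the status word apart, here is a minimal Python sketch that decodes the flag bits of the Bank 1 value from the log above. The bit positions follow the Intel IA32_MCi_STATUS register layout. The function name `decode_mci_status` is made up for this example, and the output only describes the flags; it does not say which component is actually at fault.

```python
# Illustrative only: decode the top flag bits of an Intel MCi_STATUS value.
# Bit positions are from the IA32_MCi_STATUS layout; names are the usual
# abbreviations for those bits.
FLAG_BITS = {
    "VAL": 63,    # register contains valid information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled for this bank
    "MISCV": 59,  # the MISC register holds additional information
    "ADDRV": 58,  # the ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupted
}


def decode_mci_status(status_hex):
    """Return the flag bits and the low 16-bit MCA error code as a dict."""
    status = int(status_hex, 16)
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    flags["mca_error_code"] = status & 0xFFFF
    return flags


if __name__ == "__main__":
    # Status value copied from the "Bank 1" log line above.
    for name, value in decode_mci_status("bf80000000000124").items():
        print(name, hex(value) if name == "mca_error_code" else value)
```
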

 

Thanks in advance..

Randy

 

ultimate-diagnostics-20191213-1456.zip

Link to comment

I can now confirm that the system will die when I run the mover. I am trying to move 2TB of data at a time to my Unraid Plex server, and I think my system is running out of memory and just dies. I also noticed that even without the mover running, the system stats show that almost all 16 GB of memory is in use (see attachment). Is it normal for Unraid to use all that memory while essentially idling? I read a tips and tweaks post that discusses changing vm.dirty_background_ratio and vm.dirty_ratio, but I have no idea what to set them to, or even whether I should be playing with these settings, since it says not to change them if you don't understand them.

If I don't use the mover, I don't have any problems.

Any suggestions? 

I tried to set up the syslog server as per the post above but I don't seem to be getting any data using the first option to write to the flash drive.

Thanks again,

Randy

Unraid.jpg

Link to comment
9 hours ago, vmax5000 said:

I am trying to move 2TB of data at a time to my unraid plex

Why are you trying to cache so much?

 

I always recommend NOT caching the initial data load. Cache and Mover just get in the way since cache won't have the capacity. There is no way Mover can move to the slower array as fast as you can write to the faster cache. And if you are trying to do both at once they will just be competing for the disks. Mover is intended for idle time.

 

Many of us including me don't even bother to cache user share writes. Most of my writes are from scheduled backups or queued downloads, so I am not waiting for them to complete anyway. I only use cache for my dockers (no VMs for me), for temporary storage for DVR performance, and for copies of some frequently accessed files so my array won't have to spin for those.

 

Unraid is Linux, and Linux will use free RAM for I/O buffering. All that space on the graph described as "Cached" in the legend, between "Used" and "Free", is just that I/O buffering, and it is completely normal.
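
To check this from a terminal instead of the graph, a rough Python sketch like the following reads `/proc/meminfo` and compares what is held as cache with what the kernel reports as available. The helper names `parse_meminfo` and `summarize` are hypothetical, and the `SAMPLE` numbers are invented for illustration; they are not taken from the attached screenshot.

```python
# Sketch: show how much RAM is I/O cache versus actually available.
# SAMPLE is made-up data, used only when /proc/meminfo cannot be read.
SAMPLE = """MemTotal:       16384000 kB
MemFree:          300000 kB
MemAvailable:   12000000 kB
Buffers:          100000 kB
Cached:         11900000 kB
"""


def parse_meminfo(text):
    """Return a dict of field name -> size in kB from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Express cache and available memory as whole percentages of MemTotal."""
    total = fields["MemTotal"]
    cached = fields.get("Cached", 0) + fields.get("Buffers", 0)
    available = fields.get("MemAvailable", fields.get("MemFree", 0))
    return {
        "cache_pct": round(100 * cached / total),
        "available_pct": round(100 * available / total),
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the invented sample
    print(summarize(parse_meminfo(text)))
```

If `available_pct` is high while the graph looks nearly full, the memory is being used as cache and can be reclaimed when programs need it.
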

Link to comment

If you're moving a ton of data from another system to Unraid, I recommend setting all drives to never spin down and setting md_write_method to reconstruct write. Also disable cache for the share you are moving the data to.

 

This goes much quicker and smoother than having to fill the cache, then initiate the mover, then fill the cache again and repeat the process.

 

Once you are done, you can re-enable spin-down and set md_write_method back to auto.
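
As a rough illustration of why the two write methods end up with the same parity, here is a small Python sketch using plain XOR parity over a few toy "disks". This is a simplified model of single parity only, not Unraid's driver code, and every name in it is made up for the example. As I understand it, the default method reads the old data and old parity before writing, while reconstruct write reads all the other data disks instead, which is why every drive has to be spun up.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_modify_write(old_parity, old_data, new_data):
    """Default style: needs only the target disk and the parity disk."""
    return xor_blocks(old_parity, old_data, new_data)


def reconstruct_write(new_data, other_disks):
    """Reconstruct style: needs every other data disk, but no old parity."""
    return xor_blocks(new_data, *other_disks)


# Three toy data disks holding one 4-byte block each (invented values).
disks = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
parity = xor_blocks(*disks)

# Overwrite the block on the second disk and compute parity both ways.
new_block = bytes([42, 42, 42, 42])
rmw_parity = read_modify_write(parity, disks[1], new_block)
rcw_parity = reconstruct_write(new_block, [disks[0], disks[2]])

print(rmw_parity == rcw_parity)
```

Both paths give identical parity; the difference is only in which disks have to be read before the write, and that is where the speed gain for a large initial copy comes from.
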

Edited by je82
Link to comment
2 hours ago, vmax5000 said:

Thanks for your help!  I will give it a try, but I will have to wait until the parity check finishes.

You were trying to do a parity check and load all of that data at the same time? Parity check or sync uses all of the disks in the array so that would also be in competition with moving or writing to the array.

 

Some people even wait until after the initial data load to install parity so that big write won't be slowed down by parity.

 

Good advice from @je82

Link to comment

Actually, I was not doing the parity check at the time. When the array stopped on its own and I restarted it, that's when the parity check started. So now I am waiting for the parity check to finish before I send more data to the array. I also noticed that the cache drive still had data on it, so I can't turn it off until the parity check is finished; then I will use the mover to finish what's left. After that I will turn off the cache option and just move the data directly to the array. I really appreciate all the help. I'm new to Unraid and I'm not very familiar with Linux, so thanks again for all the help and suggestions. I will get there eventually!

 

Link to comment
On 12/15/2019 at 8:28 AM, je82 said:

If you're moving a ton of data from another system to Unraid, I recommend setting all drives to never spin down and setting md_write_method to reconstruct write. Also disable cache for the share you are moving the data to.

 

This goes much quicker and smoother than having to fill the cache, then initiate the mover, then fill the cache again and repeat the process.

 

Once you are done, you can re-enable spin-down and set md_write_method back to auto.

 

On 12/15/2019 at 11:21 AM, trurl said:

You were trying to do a parity check and load all of that data at the same time? Parity check or sync uses all of the disks in the array so that would also be in competition with moving or writing to the array.

 

Some people even wait until after the initial data load to install parity so that big write won't be slowed down by parity.

 

Good advice from @je82

 


Hey, I just wanted to say thanks for your suggestion. It really makes a difference, and I get full speed copying to the array. I haven't had any more issues! Is it a good idea to change md_write_method back after copying everything, or can I just leave that setting?

Thanks again

Randy

Link to comment

I added more drives to the system and changed the CPU cooler to a low-profile cooler so I could close the case. Since those changes, the server has been randomly rebooting. I turned on the syslog and downloaded it after the system came back up. I changed a drive in the array from a 3TB to a 6TB, and the parity check has not been able to finish because the server keeps rebooting. I was also trying to preclear 3 drives. Does anyone have any ideas as to what to check? I also checked whether I could use ECC RAM, and as far as I can tell my motherboard doesn't support it. Any help would be appreciated...

 

Thanks.

Randy

ultimate-syslog-20191217-1623.zip

Link to comment
31 minutes ago, vmax5000 said:

I changed a drive in the array from a 3TB to a 6TB and the parity check has not been able to finish because it keeps re-booting.

It isn't doing a parity check; it is rebuilding the replaced disk4. You should do a non-correcting parity check after the rebuild completes to make sure everything is good. If you can't get that far, run a memtest (on the boot menu).

 

34 minutes ago, vmax5000 said:

I was also trying to preclear 3 drives.

Stop preclearing until the rebuild is complete and things are stable. In fact, you might even remove the preclear plugin.

 

28 minutes ago, vmax5000 said:

Does anyone have any ideas

Quit changing and adding until you get stable. You are trying to do too much at once. Slow down, take things one at a time, make sure that one thing is working well before trying anything else.

Link to comment

I found and fixed the issue I was having. To make a long story short, I found a bad SATA cable on one of my parity drives. Even though the cables were all new, that one would intermittently lose its connection to the drive, causing the array to crash or the server to reboot. I replaced the faulty cable and now the system is perfectly stable.

Thanks for the help.

Randy

Link to comment
