Machine restarted while copying 350GB of files


Trylo


Hi!

 

Last night my Unraid machine restarted and "Fix Common Problems" gave me this message:

Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged.

 

Please let me know if it's something serious.

 

Diagnostics attached.

 

Thank you in advance!

 

nas-diagnostics-20180918-1154.zip


One thing that stands out is that you have a 250GB SSD as your cache drive, and you then attempted to copy 350GB of files to the array. I am guessing that the User Share that was the target for this copy is set to use the cache drive. GUESS WHAT!!!! It filled up. While this should not have caused a restart, you should probably not use the cache drive for this User Share until this copy is finished.

Edited by Frank1940

There is also this series of events during the boot process:

 

Sep 18 00:14:47 NAS kernel: mce: [Hardware Error]: Machine check events logged
Sep 18 00:14:47 NAS kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: fa000010000b0c0f
Sep 18 00:14:47 NAS kernel: mce: [Hardware Error]: TSC 0 MISC d012000001000000 
Sep 18 00:14:47 NAS kernel: mce: [Hardware Error]: PROCESSOR 2:660f01 TIME 1537222472 SOCKET 0 APIC 0 microcode 600610e
Sep 18 00:14:47 NAS kernel: Performance Events: Fam15h core perfctr, AMD PMU driver.
Sep 18 00:14:47 NAS kernel: ... version:                0
Sep 18 00:14:47 NAS kernel: ... bit width:              48
Sep 18 00:14:47 NAS kernel: ... generic registers:      6
Sep 18 00:14:47 NAS kernel: ... value mask:             0000ffffffffffff
Sep 18 00:14:47 NAS kernel: ... max period:             00007fffffffffff
Sep 18 00:14:47 NAS kernel: ... fixed-purpose events:   0
Sep 18 00:14:47 NAS kernel: ... event mask:             000000000000003f
Sep 18 00:14:47 NAS kernel: Hierarchical SRCU implementation.
Sep 18 00:14:47 NAS kernel: smp: Bringing up secondary CPUs ...
Sep 18 00:14:47 NAS kernel: x86: Booting SMP configuration:
Sep 18 00:14:47 NAS kernel: .... node  #0, CPUs:      #1 #2 #3

I am not sure what is causing it.  Hopefully, someone else will have a better feel for what is happening here...

 

One thing you can do is Google "mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: fa000010000b0c0f" and look at the results. You could also try Googling the other mce errors and see if there is some commonality.
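If you want a first look at that status value before searching, here is a rough Python sketch that pulls the bank number and status word out of a syslog line and splits out the generic flag bits. It only assumes the standard x86 MCi_STATUS layout (flag bits 63 down to 57, MCA error code in the low 16 bits). The function names are made up for this example, and it does not try to interpret the model-specific part, so it will not tell you which component is at fault.

```python
import re

# Architectural flag bits of an x86 MCi_STATUS register (bit position -> name).
FLAGS = {
    63: "VAL",    # register holds a valid error record
    62: "OVER",   # another error arrived before this one was cleared
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # the MISC register holds extra information
    58: "ADDRV",  # the ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}

# Matches lines such as "... Machine Check: 0 Bank 4: fa000010000b0c0f"
LINE_RE = re.compile(r"Machine Check: \d+ Bank (\d+): ([0-9a-fA-F]{16})")


def decode_status(status):
    """Split a 64-bit status word into flag names and the two low code fields."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,
        # Bits 31:16 are vendor/model specific; left undecoded here.
        "model_specific_code": (status >> 16) & 0xFFFF,
    }


def parse_line(line):
    """Return (bank, decoded_status) for an mce syslog line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), decode_status(int(match.group(2), 16))


if __name__ == "__main__":
    line = ("Sep 18 00:14:47 NAS kernel: mce: [Hardware Error]: "
            "CPU 0: Machine Check: 0 Bank 4: fa000010000b0c0f")
    print(parse_line(line))
```

Running it on the line from the syslog above shows which flags are set. What bank 4 and the remaining bits mean on this particular CPU still has to come from the vendor documentation or from mcelog.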

2 hours ago, Trylo said:

Yeah, it filled up, that doesn't surprise me :) After it fills up, it copies the rest of the files straight to the HDD.

I have done this a few times and it never resulted in a restart.

You should set Minimum Free on the cache drive to larger than the largest file you expect to write to cache. Then if you are writing to a cached user share, it will go ahead and overflow to the array before you actually fill up cache and get an error. Minimum Free for cache is in Global Share Settings.

 

And of course, each user share has its own Minimum Free setting which should be set to larger than the largest file you expect to write to the user share. Unraid has no way to know how large a file will become when it chooses a disk to write it to. If there is less than minimum free on a disk, it will choose another.
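To make the Minimum Free idea concrete, here is a toy Python sketch of the rule described above. It is not Unraid's actual code: the function names, the 10 GB file size, and the "cache first, then array" ordering are assumptions for illustration, and the real allocation methods are not modeled. The point it shows is that the check happens before the file is written and looks only at free space, never at the size of the incoming file.

```python
GB = 1000 ** 3


def pick_disk(disks, minimum_free):
    """Return the first disk name with at least minimum_free bytes free.

    disks is a list of (name, free_bytes) pairs in order of preference.
    The size of the incoming file is deliberately not an input.
    """
    for name, free in disks:
        if free >= minimum_free:
            return name
    return None


def simulate_copy(file_sizes, cache_free, array_free, minimum_free):
    """Place each file on cache or array and return the list of choices."""
    placements = []
    for size in file_sizes:
        target = pick_disk([("cache", cache_free), ("array", array_free)],
                           minimum_free)
        if target is None:
            raise RuntimeError("no disk has minimum free space left")
        if target == "cache":
            cache_free -= size
        else:
            array_free -= size
        placements.append(target)
    return placements


if __name__ == "__main__":
    # 350 GB copy as 35 files of 10 GB onto a 250 GB cache, Minimum Free 20 GB.
    result = simulate_copy([10 * GB] * 35, 250 * GB, 4000 * GB, 20 * GB)
    print(result.count("cache"), "files to cache,",
          result.count("array"), "files to array")
```

With Minimum Free larger than the largest file, the cache stops accepting new files while it still has room to finish the one in progress, and the rest of the copy goes to the array.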

 

None of this should have anything to do with your restart though. Have you done a memtest?

 

 

25 minutes ago, trurl said:

Maybe or maybe not if you have IPMI

Just for reference, anything you can do with a monitor/keyboard you can do with IPMI. IMO it is one of the best things ever for servers/NAS. Once I got my first IPMI board and got used to it, it took very little time until I replaced all my other servers with IPMI-enabled boards.

