Server reboots during parity check


jcool

Hello, my server reboots during a parity check. This started while I was on vacation: because of the unclean shutdown, my server has been running parity checks unsuccessfully for about 35 days now. Not good for my drives.

I had some problems with the file system, but after an XFS check and repair everything seems fine.

However, I still cannot find the problem that causes the server to reboot.

This is the syslog from yesterday, when I started a parity check again:

Sep 26 04:12:39 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 04:18:40 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 04:24:36 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 04:30:31 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 04:36:44 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 04:42:28 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 04:48:27 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 04:54:38 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 05:00:28 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 05:06:44 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 05:12:16 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 05:18:46 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 05:24:21 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 05:30:42 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 05:36:37 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 05:42:33 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 05:48:38 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 05:54:23 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 06:00:19 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 06:06:42 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 06:12:25 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 06:18:30 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 06:24:23 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 06:30:21 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 06:36:18 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 06:42:37 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 06:48:27 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 06:54:21 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 07:00:26 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 07:06:35 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 07:12:43 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 07:18:28 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 07:24:40 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 07:30:33 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 07:36:31 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 07:42:44 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 07:48:40 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 07:54:18 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 08:00:40 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 08:06:46 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 08:12:44 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 08:18:39 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 08:24:37 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 08:30:30 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 08:36:39 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 08:42:35 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 08:48:40 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 08:54:39 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 09:00:24 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 09:06:32 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 09:12:21 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 09:18:44 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 09:24:28 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 09:30:29 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 09:36:21 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 09:42:32 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 09:48:28 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 09:54:23 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 10:00:01 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Sep 26 10:00:01 Tower Parity Check Tuning: DEBUG:   detected that mdcmd had been called from sh with command mdcmd nocheck pause 
Sep 26 10:00:06 Tower Parity Check Tuning: DEBUG:   ...Pause Manual Correcting Parity-Check
Sep 26 10:00:06 Tower Parity Check Tuning: DEBUG:   Created cron entry for 6 minute interval monitoring
Sep 26 10:00:06 Tower Parity Check Tuning: DEBUG:   Updated cron settings are in /boot/config/plugins/parity.check.tuning/parity.check.tuning.cron
Sep 26 10:00:06 Tower kernel: mdcmd (39): nocheck pause
Sep 26 10:00:06 Tower kernel: 
Sep 26 10:00:06 Tower kernel: md: recovery thread: exit status: -4
Sep 26 10:00:32 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 10:06:46 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 10:12:21 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 10:18:45 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 10:24:21 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 10:30:27 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 10:36:16 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 10:42:38 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 10:48:40 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 10:54:29 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 11:00:19 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 11:06:31 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 11:12:22 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 11:18:46 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 11:24:37 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 11:30:26 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 11:36:40 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 11:42:44 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 11:48:27 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 11:54:36 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 12:00:32 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 12:06:37 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 12:12:37 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 12:18:42 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 12:24:45 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 12:30:39 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 12:36:28 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 12:42:26 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 12:48:19 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 12:54:40 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 13:00:16 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 13:06:37 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 13:12:35 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 13:18:40 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 13:24:35 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 13:30:34 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 13:36:47 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused
Sep 26 13:40:28 Tower kernel: Linux version 5.19.17-Unraid (root@Develop) (gcc (GCC) 12.2.0, GNU ld version 2.39-slack151) #2 SMP PREEMPT_DYNAMIC Wed Nov 2 11:54:15 PDT 2022
Sep 26 13:40:28 Tower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
Sep 26 13:40:28 Tower kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Sep 26 13:40:28 Tower kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Sep 26 13:40:28 Tower kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'

Could anyone help me with this problem? I am at a loss, haha.

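That log snippet suggests that the reboot happened without anything being logged: the last entry is at 13:36:47 and the next one is the kernel boot banner at 13:40:28, with no shutdown messages in between. This points towards some sort of hardware issue. The obvious candidates that spring to mind would be a power/PSU-related issue or perhaps a cooling problem.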
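A minimal sketch of how to spot this kind of silent reboot in a longer syslog: for every kernel boot banner ("kernel: Linux version"), print the entry logged immediately before it and the gap between the two. It assumes the classic syslog format shown above, which omits the year, so the year (2023, taken from the diagnostics file name) must be supplied. The names `parse`, `find_boots` and `BOOT_MARKER` are hypothetical. The sketch does not decide what counts as an orderly shutdown; you still read the last message yourself.

```python
import re
from datetime import datetime

# Classic syslog prefix, e.g. "Sep 26 13:40:28 Tower kernel: ...".
# Group 1 = timestamp (no year), group 2 = everything after the host name.
LINE_RE = re.compile(r"^([A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) \S+ (.*)$")
# First kernel message after a boot, as seen in the posted log (assumption).
BOOT_MARKER = "kernel: Linux version"


def parse(line, year):
    """Return (datetime, message) for a syslog line, or None if it doesn't match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    # Classic syslog omits the year, so it has to be supplied by the caller.
    ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return ts, m.group(2)


def find_boots(lines, year=2023):
    """For every boot marker, return (last_ts, last_message, boot_ts) for the
    entry logged immediately before it, so it can be inspected by hand."""
    boots = []
    previous = None
    for line in lines:
        parsed = parse(line, year)
        if parsed is None:
            continue  # skip blank or non-syslog lines
        ts, message = parsed
        if BOOT_MARKER in message and previous is not None:
            boots.append((previous[0], previous[1], ts))
        previous = parsed
    return boots


sample = [
    "Sep 26 13:36:47 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check paused",
    "Sep 26 13:40:28 Tower kernel: Linux version 5.19.17-Unraid (root@Develop)",
]
for last_ts, last_msg, boot_ts in find_boots(sample):
    print(f"last entry {last_ts} ({boot_ts - last_ts} before boot): {last_msg}")
```

For the posted log this reports a 3 minute 41 second gap whose last entry is an ordinary plugin status line, i.e. nothing resembling a shutdown was logged.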

BTW: I can see that you have the Parity Check Tuning plugin installed. It would be less verbose if you set the logging level in the plugin to the basic level (although the debug level can be useful if you want more detail on the status of a running parity check).

I'm sorry, you are right, I forgot. This is the diagnostics file: tower-diagnostics-20230926-1732.zip

I set it to debug because of this problem, but you are right that it gives no relevant information.

As for the syslog, indeed I see nothing that indicates a software problem. However, it is really strange that the server suddenly started rebooting right after the file system errors began. Now that the file system errors are fixed, I still have the problem.

Cooling seems fine: CPU temperatures do not go beyond 60 °C during a parity check. Hard drive temperature was a concern of mine, as one drive would reach 47 °C max. However, with an extra fan the drive temperature does not go beyond 28 °C, and the server still rebooted, as the last log file shows.

It also happens after about 50% of the parity check. As the parity drive is 18 TB, I find it hard to believe that the PSU works fine and the server does not get too hot for the first 12 hours or so, and then the PSU suddenly fails during the second half.

This syslog is written to USB, so it should contain all the information. I also logged to an external syslog server, and that log did not contain any additional information either.

SOLVED: Alright, I will give an update. I have been kind of stupid. What was happening: I moved my server to another place 2 months ago and accidentally plugged it into one of my power outlets that has a built-in relay. That relay has a power-cycling function that power cycles every 23 hours, but because the relay did not always have Wi-Fi, it only did so sometimes.
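A periodic cause like this can be checked from boot timestamps alone. Below is a minimal sketch, assuming you have already collected the boot times (for example from the "Linux version" lines in the syslog): because the relay sometimes skipped a cycle while offline, each interval between boots should sit close to a whole multiple of 23 hours rather than exactly 23 hours. The function name `check_period` and the 10-minute tolerance are hypothetical choices, and all boot times except the last one (the reboot in the posted log) are made up for illustration.

```python
from datetime import datetime, timedelta


def check_period(boot_times, period=timedelta(hours=23), tolerance=timedelta(minutes=10)):
    """For each pair of consecutive boots, return (interval, cycles, near_multiple).

    A relay that skips some cycles (e.g. while offline) still produces
    intervals close to a whole multiple of its period.
    """
    report = []
    for earlier, later in zip(boot_times, boot_times[1:]):
        interval = later - earlier
        cycles = round(interval / period)  # timedelta / timedelta -> float
        near = cycles >= 1 and abs(interval - cycles * period) <= tolerance
        report.append((interval, cycles, near))
    return report


# Hypothetical boot times, except the last one, which is the reboot in the posted log.
boots = [
    datetime(2023, 9, 23, 12, 0, 0),
    datetime(2023, 9, 24, 15, 40, 28),
    datetime(2023, 9, 26, 13, 40, 28),
]
for interval, cycles, near in check_period(boots):
    print(f"{interval} ~ {cycles} x 23 h: {'match' if near else 'no match'}")
```

With these sample values the first interval (27 h 40 min) does not fit the pattern, while the second (exactly 46 h, two skipped-free cycles) does. Several matching intervals in a real log would point at an external timer rather than the hardware.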

Topic can be closed, haha.
