Starting Parity Check instantly reboots the system


Solved by itimpi


That was my first idea as well. But in normal mode the machine properly powers on with all Dockers (tried it in both maintenance and normal mode).

Is parity check drawing more power than starting the array and initializing all the Dockers?

Certainly worth testing. Will report back with update asap.

Link to comment

I video recorded the display while booting in normal mode and in safe mode, and this is what I get. Sorry for the poor quality, but the output scrolls really fast and this is the best the camera could capture.

 

It has been running parity checks, and even a rebuild recently, with the exact same configuration for over a year. Here are the last two, from not long ago.
image.thumb.png.08f2faa560a19e8ba648cb201c9019d1.png
 

Any clues?

 

image.thumb.png.6e816ae5d0c63bb6516153c8966de553.png

image.thumb.png.024f16e6e9df511e9daeb89bfc884142.png

Edited by sdfyjert
Link to comment

I have checked the syslog (I was already recording it) and there's nothing out of the ordinary there. The messages visible on the display in the pictures posted earlier do not appear in the syslog, which suggests syslog has not started yet at that point.

 

As of today things have taken a turn for the worse. The machine now reboots randomly, and one of the drives got disabled and marked for errors (it is currently being emulated).

Taking the machine offline is not much of an option right now.

Running check/fix is impossible as it just reboots the system.

Short SMART tests show all drives as fine (extended SMART tests cannot complete because the system reboots before they finish).

 

Currently waiting for the Easter holidays to pass so I can get a new PSU delivered and test it. If things continue down this path, I am considering installing TrueNAS on the same hardware (just with different drives) to verify whether the problem is hardware or software related.

 

 

In the meantime, any ideas are welcome.

fingers crossed

Link to comment

I have already tried that, since day 0, the machine would reboot the moment disk check was starting.

I have also tried switching the power cables for the drives to different PSU outputs with no positive results.

 

I was going through the logs, and in only one of them I found 2 entries that look suspicious:

Apr 10 13:37:53 nas kernel: mpt3sas 0000:01:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM

 

But it only appears in a log from earlier this morning (twice). It is not in the logs from any previous day, and it hasn't appeared again since then.
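To check how often a message like this shows up across saved logs, a small script can count matches per day. This is only a minimal sketch: the helper name `count_matches_by_day` is made up for illustration, it assumes the traditional syslog line layout starting with `Mon DD HH:MM:SS`, and the second sample line is a placeholder rather than real log output.

```python
import re
from collections import Counter


def count_matches_by_day(lines, pattern):
    """Count lines matching `pattern`, grouped by their 'Mon DD' date prefix.

    Hypothetical helper; assumes each line starts with 'Mon DD HH:MM:SS'.
    """
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        if regex.search(line):
            parts = line.split()
            if len(parts) >= 2:
                # First two whitespace-separated fields are month and day
                counts[f"{parts[0]} {parts[1]}"] += 1
    return counts


sample = [
    # Real line quoted in the post above
    "Apr 10 13:37:53 nas kernel: mpt3sas 0000:01:00.0: invalid VPD tag 0x00 "
    "(size 0) at offset 0; assume missing optional EEPROM",
    # Placeholder line, made up for this example only
    "Apr 11 09:00:00 nas placeholder: unrelated line",
]

for day, n in count_matches_by_day(sample, r"mpt3sas .*invalid VPD tag").items():
    print(day, n)
```

In practice the `lines` argument could be an open file handle for each saved syslog, since iterating over a text file yields its lines.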

The latest thing I did before the issues started appearing was an update on the filemanager (dynamix). I would like to believe it is not related in any way.

 

Edited by sdfyjert
Link to comment

After a lot of further investigation today I am getting more and more convinced that the issue is Unraid crashing due to some bad data on a disk. Here's how it went:

  1. I loaded a usb-stick backup from a few days ago (before all hell broke loose)
  2. I booted Unraid (array off). The disk that was marked earlier today (but not when the issues started) as dirty and in need of a parity fix was now green
    (this sounds like a bug)
  3. I started the array.... all hell broke loose (reboot)
  4. Safe mode... the same...
  5. I disable everything (docker, VM manager)
  6. I manually disable the drive that needed rebuild (that was marked as dirty days after the reboot issues started).
  7. Start the array - no reboot.

 

Reboot in safe mode

  1. Keep array offline
  2. Start an extended SMART test on the dirty drive (now marked as green just by loading an older backup on the USB stick)
  3. Let it run for 10 minutes... no reboot.
    (stopped it there, as with the array running it would have rebooted as all the previous times)

Unfortunately I am leaning more and more towards a software issue... this is extremely discouraging so far.

Link to comment
5 minutes ago, sdfyjert said:

Finally the new PSU arrived. The system now works as expected again. Whatever happened that night, the 550W PSU could no longer properly run a 105W on-demand system (that's how much power it draws during a rebuild).

 

When a parity check starts, all of the hard drives will spin up simultaneously.  Each drive will draw between two and three amperes for a few milliseconds.  If the total current draw on the +12V bus or the peak power rating of the PS is exceeded, the PS will shut down!  Or the PS overload detection circuitry could have a problem and 'think' an overload condition existed when it did not.   (BTW, PS failures are not really that uncommon these days.  They are no longer simply a power brick but a complex device that interacts with the MB to perform power-up, power-down, sleep operation, wake-on-LAN, or any number of other power states that modern PCs want to be in.)
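The arithmetic behind this can be sketched quickly. The snippet below is only an illustration: the two to three amperes per drive comes from the explanation above, while the drive count (8) and the +12V rail limit (20 A) are made-up values, not figures from this system. It shows why a steady-state reading such as 105W says little about the momentary peak when every drive spins up at once.

```python
def spinup_peak(num_drives, amps_per_drive, rail_volts=12.0):
    """Return (total amps, total watts) on the +12V rail if all drives spin up together."""
    total_amps = num_drives * amps_per_drive
    return total_amps, total_amps * rail_volts


def exceeds_rail(num_drives, amps_per_drive, rail_limit_amps):
    """True if the combined spin-up current is above the rail's current limit."""
    total_amps, _ = spinup_peak(num_drives, amps_per_drive)
    return total_amps > rail_limit_amps


# Hypothetical example: 8 drives at 3 A each (upper end of the quoted range)
amps, watts = spinup_peak(8, 3.0)
print(amps, watts)  # 24.0 288.0

# Hypothetical 20 A limit on the +12V rail
print(exceeds_rail(8, 3.0, 20.0))  # True
```

A healthy PSU normally has headroom for this kind of peak; a degraded one, or one with faulty overload detection, may trip well below its rated limit.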

Link to comment