sdfyjert Posted April 6, 2023
Hi guys, all of a sudden, every time I try to start a parity check (or the system tries to), the server goes into a hard reboot. This happens in both normal and safe mode. I checked the logs but couldn't spot anything. I am at a loss here, and Unraid is an integral part of my infrastructure (lab, home, work). I am open to suggestions and ideas.
nas-diagnostics-20230406-1253.zip
itimpi Posted April 6, 2023 (Solution)
I would think the most likely culprit is the power supply, as that is when the current draw is likely to be at its max.
sdfyjert Posted April 6, 2023 (Author)
That was my first idea as well. But in normal mode the machine powers on properly with all Dockers running (I tried it in both maintenance and normal mode). Does a parity check draw more power than starting the array and initializing all the Dockers? Certainly worth testing. Will report back with an update asap.
sdfyjert Posted April 6, 2023 (Author, edited April 6, 2023)
I video-recorded the display while booting in normal mode and safe mode and got the following. Sorry for the poor quality, but it's moving really fast; it's the best the camera could capture.
It has been running parity checks, and even a rebuild, with the exact same configuration for over a year. Here are the last two, not long ago.
Any clues?
Frank1940 Posted April 6, 2023
Set up the Syslog Server to capture what is going on. See here for instructions:
Use the Mirror to Flash setting, as you can reproduce the problem in a few minutes.
sdfyjert Posted April 10, 2023 (Author)
I have checked the syslog (I was already recording it) and there's nothing out of the ordinary there. The messages I see on the display in the pictures posted earlier do not appear in the syslog, which means the syslog has not started yet at that point.
As of today, things have taken a turn for the worse. Now it reboots randomly, and one of the drives was disabled and marked as having errors (it is currently being emulated). Taking the machine offline is not really an option right now. Running a check/fix is impossible, as it just reboots the system. Short SMART tests show all drives as fine (extended SMART tests cannot complete because the system reboots before they finish).
I am currently waiting for the Easter holidays to pass so I can get a new PSU delivered to test it out. If things continue down that path, I am considering installing TrueNAS on the same hardware (just with different drives), out of curiosity, to verify whether the problem is hardware or software related.
In the meantime, any ideas are welcome. Fingers crossed.
Frank1940 Posted April 10, 2023
Start in Safe Mode (a boot option at startup).
sdfyjert Posted April 10, 2023 (Author, edited April 10, 2023)
I have already tried that since day 0; the machine reboots the moment the disk check starts. I have also tried moving the drives' power cables to different PSU outputs, with no positive result.
Going through the logs, I found two entries, in only one of them, that would normally look suspicious:
Apr 10 13:37:53 nas kernel: mpt3sas 0000:01:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
But it only appears (twice) in a log from earlier this morning. It does not appear in any log from the previous days, and it hasn't appeared again since.
The last thing I did before the issues started appearing was an update of the file manager plugin (Dynamix). I would like to believe it is not related in any way.
sdfyjert Posted April 10, 2023 (Author)
After a lot of further investigation today, I am more and more convinced the issue is Unraid crashing due to some bad data on a disk. Here's how it went:
I loaded a USB stick backup from a few days ago (before all hell broke loose).
I booted Unraid (array stopped). The disk that had been marked earlier today (but not when the issues started) as dirty and in need of a parity fix is now green (this sounds like a bug).
I started the array... all hell broke loose (reboot).
Safe mode... the same.
I disabled everything (Docker, VM Manager).
I manually disabled the drive that needed a rebuild (the one marked as dirty days after the reboot issues started).
Started the array: no reboot.
Rebooted in safe mode and kept the array offline.
Started an extended SMART test on the dirty drive (now marked green just by loading an older backup on the USB stick).
Let it run for 10 minutes... no reboot. (I stopped it there, as with the array running it would have rebooted like all the previous times.)
I am unfortunately leaning more and more towards a software issue... this is extremely discouraging so far.
sdfyjert Posted April 14, 2023 (Author)
Finally, the new PSU arrived. The system now works as expected again. Whatever happened that night, the 550W PSU could no longer properly run a system that draws about 105W on demand (that's how much power it draws during a rebuild).
Thank you all for your help and advice.
Frank1940 Posted April 14, 2023
When a parity check starts, all of the hard drives spin up simultaneously. Each drive will draw between two and three amperes for a few milliseconds. If the total current draw on the +12V bus or the peak power rating of the power supply (PS) is exceeded, the PS will shut down! Alternatively, the PS overload-detection circuitry could have a fault, so that the circuit 'thought' an overload condition existed when it did not. (BTW, PS failures are not really that uncommon these days. A PS is no longer simply a power brick but a complex device that interacts with the motherboard (MB) to handle power-up, power-down, sleep, wake-on-LAN, and any number of other power states that modern PCs want to be in.)
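To make the +12V arithmetic above concrete, here is a rough budget-check sketch. It simply multiplies the drive count by a per-drive spin-up current (the 2 to 3 A range from the post above), adds whatever else is on the +12V rail, and compares the total against the rail's rating. The thread does not give the actual drive count, the other +12V load, or this PSU's +12V rating, so every number below (and the function names) is hypothetical. It also ignores PSU aging and faulty protection circuitry, which the post names as another possible cause. A steady wattage reading taken during a rebuild, when the drives are already spinning, can sit well below this brief spin-up peak.

```python
# Rough +12V spin-up budget check. Illustrative sketch only: drive count,
# per-drive spin-up current, other 12V load and rail rating are all assumptions.

RAIL_VOLTS = 12.0


def peak_12v_amps(num_drives: int, spinup_amps_per_drive: float,
                  other_12v_amps: float = 0.0) -> float:
    """Worst-case +12V current if every drive spins up at the same moment."""
    return num_drives * spinup_amps_per_drive + other_12v_amps


def rail_headroom(num_drives: int, spinup_amps_per_drive: float,
                  rail_rating_amps: float, other_12v_amps: float = 0.0):
    """Return (peak_amps, peak_watts, headroom_amps) for the +12V rail.

    Negative headroom means the simultaneous spin-up exceeds the rail's
    rating, so the PSU's over-current protection may shut it down.
    """
    peak = peak_12v_amps(num_drives, spinup_amps_per_drive, other_12v_amps)
    return peak, peak * RAIL_VOLTS, rail_rating_amps - peak


if __name__ == "__main__":
    # Hypothetical: 8 drives at 2.5 A each, 10 A of other 12V load, 40 A rail.
    print(rail_headroom(8, 2.5, 40.0, other_12v_amps=10.0))
    # Hypothetical: 12 drives at 3.0 A each on the same rail.
    print(rail_headroom(12, 3.0, 40.0, other_12v_amps=10.0))
```

With these made-up figures, the first case peaks at 30 A (360 W) with 10 A to spare, while the second peaks at 46 A (552 W) and overshoots the rail by 6 A, even though neither array would come close to that wattage once the drives are spinning.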
bucky2076 Posted August 24, 2023
Thanks for this topic and solution... it helped me deal with similar problems. I bought a new power supply... but then threw in a new MB, CPU, memory, etc., just for kicks.