
Crashes during parity check - HELP!



Hi All,

I'm at my wits' end trying to figure this out. I need some help diagnosing random crashes that seem to occur during or after a parity check, which takes roughly 12 hours. There were one or two power outages recently, and the problem seems to have started after that.

 

Once it crashes there is no video output from the iGPU (I'm running in command-line mode), so I don't get an error message. If I run the Unraid GUI, the screen shows garbled output. SSH and the web UI are not accessible. I thought it might be the CPU entering an idle state, so I disabled the C1 and C3 power states in the BIOS.

 

When this happens I have to power down or restart, which triggers the automatic parity check each time. If I DON'T run the parity check, it stays up seemingly indefinitely. After 12 hours of uptime I started the parity check, only to find later that the server had gone down sometime during the night. The same thing has happened twice during the day since then.

 

I've got an LGA 1150 Haswell Xeon quad-core server running the following:

 

OS: Unraid 6.9.2

Mobo: ASRock Z97M

Memory: 16GB DDR3 Mushkin @ 1600MHz (not running XMP)

Drives:

  • (1) 8TB WD Red (parity) < 2 years old
  • (1) 8TB WD Red (data) < 2 years old
  • (1) 250GB Samsung 870 SSD (cache) 3+ years old

Boot drive: Samsung 32GB flash drive in a USB 3.0 port (a USB 2.0 port made no difference).

Dockers:

  • Plex Media Server (stopping it doesn't prevent the crash)

VMs: (VM manager currently disabled for troubleshooting)

  • Ubuntu with GPU passthrough (ATI 5850) plus keyboard and mouse

 

What I've tried:

  1. I ran the Fix Common Problems plugin and it found some "illegal" filenames containing ">" or leading/trailing spaces. They weren't important, so I deleted them. The extended test then came back with no errors.
  2. Tried NOT running in GUI mode (photo of a GUI-mode crash attached).
  3. SMART reported some UDMA CRC errors on the data and parity drives; the cache drive reports none. These errors had shown up before while the system was stable. I've since moved the drives to different SATA ports and no longer get them.
  4. Removed the unassigned drives. I was going to add them to the array but have held off because of the recent crashes.
  5. Moved everything to another computer. For some reason that machine can't see the HDDs, even in the BIOS; they won't even spin up, although the cache drive shows up just fine. I thought I'd lost both the data and parity drives after a crash! Side note: does Unraid do something to HDDs that stops them spinning up or being recognized by other machines? Nothing is encrypted.
  6. Moved everything back to the Xeon server, cleaned the dust out of the SATA ports and checked all connections. On different ports, Unraid sees the drives. PHEW! No more UDMA CRC errors.
  7. Tried BIOS defaults.
  8. Tried different BIOS settings, including disabling the C1 and C3 sleep states.
  9. Tried a dGPU. No change.
  10. Configured the parity check to pause for 15 minutes every 3 hours. No change.
  11. Ran 4 passes of Memtest86+ with no errors, both with and without XMP.
  12. Downgraded Unraid to the previously installed 6.8.3. No change.
  13. Upgraded back to 6.9.2. No change.
  14. Mirrored the syslog to flash in case an error gets written out before the crash and gives me a clue (attached!). I didn't see any errors near the end indicating that anything went wrong during the parity check before it crashed.
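For anyone going through the mirrored syslog, one rough approach is to search the last few hundred lines before the crash for the kinds of kernel messages that often precede a hard lock: machine-check (MCE) reports, oopses/call traces, and ATA link errors like the UDMA CRC problems above. The Python sketch below does that. The pattern list, the 500-line window, and the function name `scan_tail` are my own assumptions, not anything Unraid provides, and a clean result doesn't rule out hardware, since a hard lockup often leaves nothing in the log at all.

```python
import re
import sys
from pathlib import Path

# Heuristic patterns (an assumption, not an exhaustive list) for kernel
# messages that often show up shortly before a hard lockup.
SUSPECT_PATTERNS = [
    r"mce:|machine check",                                     # CPU/memory machine-check reports
    r"kernel panic|call trace|general protection fault|\bBUG:",  # oopses and panics
    r"ata\d+(\.\d+)?: .*(exception|hard resetting link|failed command)",  # SATA link trouble
    r"\berror\b",                                              # generic catch-all
]


def scan_tail(lines, tail=500, patterns=SUSPECT_PATTERNS):
    """Return (1-based line number, text) for suspect lines among the last `tail` lines."""
    regex = re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)
    start = max(0, len(lines) - tail)
    return [
        (i + 1, line.rstrip("\n"))
        for i, line in enumerate(lines[start:], start=start)
        if regex.search(line)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of the syslog copy mirrored to flash.
    text = Path(sys.argv[1]).read_text(errors="replace")
    for num, line in scan_tail(text.splitlines()):
        print(f"{num}: {line}")
```

Run it as `python scan_syslog.py /path/to/syslog` (both the script name and the path are placeholders). Matching on a broad regex keeps it simple; expect some false positives from the generic `error` pattern.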

 

What to try next:

  • Copy the boot USB to a new flash drive and re-register.
  • Register the new USB, start fresh, and RE-BUILD the array and cache.
  • Migrate everything off the cache, remove it from the array configuration, and possibly replace it.
  • Try another computer and see if it recognizes the drives.
  • Replace the power supply (it's in an ATX server rack case and should be more than adequate).
  • Wipe the parity drive and rebuild parity from the data1 drive (eh, sounds risky).
  • Boot in safe mode (no plugins) and just run a parity check.

 

What I've ruled out:

- It's not overheating.

- It doesn't SEEM to be the RAM.

- SATA connections and power connections are good.

 

TL;DR: It crashes sometime during a parity check, and ONLY during a parity check.

 

Syslog attached. Maybe someone here can find something I can't?

 

Thank you in advance!

 

 

IMG_4673.jpg

syslog


I was doing some more troubleshooting last night and the system managed to crash in the BIOS. I'm afraid this CPU/mobo may have met its end. See the attached video of a crash in command-line mode. Yeah, it looks borked.

 

I guess I'll try moving the USB, cache, data and parity drives to another machine. However, that machine's BIOS won't detect the 8TB HDDs, only the SSD cache and the USB flash drive. Why would this be?

IMG_4675.MOV.jpg

