System crashing/freezing during mover (after reformatting cache drive)

acosmichippo · September 5, 2020

Hello all,

This week I finally tried to resolve the excessive writes to my cache drive that a lot of people have been experiencing. I decided to reformat my cache as XFS following this guide:

https://wiki.unraid.net/Replace_A_Cache_Drive

Docker and VMs are all disabled. Everything went fine until the point of running mover the 2nd time to get the data back on the newly formatted XFS cache.

On the first attempt I noticed Mover seemed to be running VERY slowly (around 30MB/s), but I figured that was just because most of the 400GB is Plex metadata. So I let it go overnight and found the server unresponsive in the morning, no ping or anything.

I don't keep the server in a place convenient to hook up a monitor, so I had to force a shutdown and hooked it up at my desk to check it out. For some reason it would not boot up, it just got stuck on a blinking cursor. This has never been an issue before and I did not change anything in the BIOS, but I decided to check out the boot settings. I put the USB stick first and disabled UEFI, and luckily that worked and I got it back up and running. Not sure why this suddenly was an issue, so I figured it was just a fluke.

After I got it back up I saw it had only transferred about 40GB of data (about 10% of the total 400GB cache data). I let the automatic parity check run for over 30 hours (12TB parity), which repaired a few errors, and this morning I started Mover again to resume the data transfer to cache, and it has frozen AGAIN. This time it does ping, but the Web UI is unresponsive and I can't log in via SSH. I hooked up a monitor before shutting it off, and there was no video output, so I had to hard reboot again.

Luckily a hard reboot worked without getting hung on a blinking cursor like last time. Turns out Mover has now done 300GB of data, so only about 75% of the total 400GB. Gonna let the parity check finish again before doing anything else.

Pretty sure it's not an overheating issue or anything. Mover is going quite slow, CPU, RAM, and HDDs are barely utilized at all. I've put the system under MUCH more abuse over the last 4-5 months running all kinds of dockers and a VM. Has anyone seen anything like this before? Any suggestions? I've attached diagnostic files in case anyone has time to take a look.

Thanks for reading and thanks for any help!

Edited January 2, 2022 by acosmichippo
adding more info

JorgeB · September 6, 2020

Syslog starts over after every reboot, so not much to see, you can try this and post that one if it happens again.

acosmichippo · September 6, 2020

whelp, it hung up again after 99% finishing parity check (like 24 hours later). Didn't even get to resume Mover this time. So frustrating.

I have it running memtest now, so we'll see what happens. I took a look at the syslog messages and didn't see much other than an apparent bad sector on sdd. I'm not sure what timezone the syslog timestamps are in, but I'm in eastern US and the server crashed sometime around 14:00. But the syslog entries don't go past 08:16.

If someone has time to take a look that would be greatly appreciated. Thanks!

Sep  5 16:50:08 unraid rsyslogd: [origin software="rsyslogd" swVersion="8.1908.0" x-pid="31634" x-info="https://www.rsyslog.com"] start
Sep  5 19:54:24 unraid webGUI: Unsuccessful login user acosmichippo from 10.0.0.185
Sep  5 19:54:34 unraid webGUI: Successful login user root from 10.0.0.185
Sep  6 00:00:01 unraid Plugin Auto Update: Checking for available plugin updates
Sep  6 00:00:02 unraid Plugin Auto Update: Community Applications Plugin Auto Update finished
Sep  6 00:40:45 unraid webGUI: Successful login user root from 10.0.0.185
Sep  6 03:00:01 unraid Recycle Bin: Scheduled: Files older than 30 days have been removed
Sep  6 03:07:07 unraid kernel: mdcmd (46): spindown 7
Sep  6 04:00:01 unraid Docker Auto Update: Community Applications Docker Autoupdate running
Sep  6 04:00:01 unraid Docker Auto Update: Docker not running.  Exiting
Sep  6 04:00:02 unraid kernel: sd 1:0:2:0: [sdd] tag#818 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Sep  6 04:00:02 unraid kernel: sd 1:0:2:0: [sdd] tag#818 Sense Key : 0x5 [current] 
Sep  6 04:00:02 unraid kernel: sd 1:0:2:0: [sdd] tag#818 ASC=0x21 ASCQ=0x0 
Sep  6 04:00:02 unraid kernel: sd 1:0:2:0: [sdd] tag#818 CDB: opcode=0x42 42 00 00 00 00 00 00 00 18 00
Sep  6 04:00:02 unraid kernel: print_req_error: critical target error, dev sdd, sector 974765264
Sep  6 04:00:07 unraid sSMTP[13320]: Creating SSL connection to host
Sep  6 04:00:07 unraid sSMTP[13320]: SSL connection using TLS_AES_256_GCM_SHA384
Sep  6 04:00:07 unraid sSMTP[13320]: Authorization failed (535 5.7.8  https://support.google.com/mail/?p=BadCredentials q142sm8074581qke.48 - gsmtp)
Sep  6 06:00:03 unraid emhttpd: shcmd (1986): /usr/sbin/hdparm -y /dev/sdd
Sep  6 06:00:04 unraid root: 
Sep  6 06:00:04 unraid root: /dev/sdd:
Sep  6 06:00:04 unraid root:  issuing standby command
Sep  6 08:16:33 unraid kernel: mdcmd (47): spindown 5
Sep  6 08:16:36 unraid kernel: mdcmd (48): spindown 6
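
For anyone skimming a longer syslog for the same two things (the last entry written before a hang, and any kernel I/O errors), here is a minimal Python sketch. It is only an illustration, not an Unraid tool: the function names `parse_line` and `summarize` are made up, the year is assumed to be 2020 because classic syslog lines do not carry one, and the month abbreviations are assumed to be English. The sample lines are copied from the excerpt above.

```python
import re
from datetime import datetime

# Classic syslog prefix: "Sep  6 04:00:02 host tag: message"
LINE_RE = re.compile(r"^(\w{3})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2}) (\S+) ([^:]+): ?(.*)$")


def parse_line(line, year=2020):
    """Return (timestamp, tag, message), or None if the line does not match."""
    m = LINE_RE.match(line.rstrip())
    if m is None:
        return None
    month, day, clock, _host, tag, message = m.groups()
    # Assumption: English month names; the year is supplied by the caller.
    stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    return stamp, tag, message


def summarize(lines, year=2020):
    """Return the last timestamp seen and every kernel line mentioning 'error'."""
    last_seen = None
    errors = []
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        stamp, tag, message = parsed
        last_seen = stamp
        if tag == "kernel" and "error" in message.lower():
            errors.append((stamp, message))
    return last_seen, errors


# Lines copied from the syslog excerpt above.
SAMPLE = [
    "Sep  6 03:07:07 unraid kernel: mdcmd (46): spindown 7",
    "Sep  6 04:00:02 unraid kernel: sd 1:0:2:0: [sdd] tag#818 Sense Key : 0x5 [current] ",
    "Sep  6 04:00:02 unraid kernel: print_req_error: critical target error, dev sdd, sector 974765264",
    "Sep  6 06:00:04 unraid root:  issuing standby command",
    "Sep  6 08:16:36 unraid kernel: mdcmd (48): spindown 6",
]

last_seen, errors = summarize(SAMPLE)
print("last entry:", last_seen)
for stamp, message in errors:
    print(stamp, message)
```

To run it against a saved log instead of the sample, pass an open file object to `summarize`, since it accepts any iterable of lines.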

edit: memtest completed 2 passes with no failures, rebooted into safe mode and letting parity check run again.

Edited September 6, 2020 by acosmichippo

JorgeB · September 7, 2020

Nothing being logged suggests a hardware problem, but try booting in safe mode and leave all dockers/VMs disabled. If it still crashes like that, it's most likely hardware related; if not, start turning the services back on one at a time.
