SOLVED: Strange Parity Check/Rebuild Error



I'm new to unRAID but have read documentation and am continually scratching my head about this issue. Just bought a pro license yesterday but am very concerned about what's been happening.

 

This all started when I ran a parity check one evening and woke up the next morning to find my Parity 1 drive moved to "Unassigned Devices", yet still listed as a spun-up Parity drive showing 22 trillion read and write errors. That drive is brand new, with 32 days of uptime. Given the information I had at the time, I took this to mean that my 2nd Parity drive was bad, since its SMART status was already questionable, so I ordered a new drive to replace it. I stopped everything, powered off, checked connections, powered on, and found that my unRAID flash drive had become corrupted and/or its filesystem was totally wrecked. I replaced it with a brand new flash drive, recovered my config and copied it over, and booted back into unRAID to find that 2 data drives were now labeled "missing" and did not show up in the terminal with lshw -c disk. Parity 1 was also unassigned but visible, so I reassigned it. I powered off again, checked connections again, and rebooted to find that one data disk still did not work. Easy enough to deal with, I thought, since that disk didn't have any data on it. I replaced that data disk, and now I'm back to pretty much the same scenario as at the start:

 

I am doing a parity rebuild/check with a NEW 2nd Parity drive and a NEW data drive. The rebuild/check ran all day without issue, but as it approached 1.91TB (where I expected it to stop, since the data drives in my array all have a writable volume of 1.81TB), Parity 1 suddenly listed 22 trillion read and write errors again. It now shows no SMART temperature and, when you drill down, is listed as a SPUN DOWN drive. Mind you, this parity check is still RUNNING as we speak and reports no issues, and the system logs show absolutely nothing. It's completely silent.

 

I've checked the SMART status of all drives, checked my drive configuration, and am completely at a loss. Something is really messed up and I don't know what it is. This is running on a pre-owned but well-conditioned SuperMicro X9SRH-7F and E5-2630Lv1 with 96GB of Samsung ECC DDR3-1066MHz RDIMMs. The problem keeps occurring at the same point rather than intermittently, which leads me to believe it's not a hardware issue. See photos for more info on how my array is laid out and its current status.

 

Screen Shot 2018-04-18 at 2.23.51 PM.png

Screen Shot 2018-04-18 at 2.25.03 PM.png

Edited by Distinguished

Your parity disk was originally given the identifier /dev/sdm, but it dropped offline and reconnected as /dev/sdp. It looks like it has an intermittent connection, so shut down and check both its SATA and power cables. If you're using some sort of drive cage, put it in a different bay.
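If it helps to see which physical drive sits behind a letter like sdm or sdp at any given moment, here is a minimal Python sketch. It assumes a Linux system that exposes the usual /dev/disk/by-id symlinks, and the function name is just made up for illustration. The by-id names contain the drive model and serial, so they stay the same even when the kernel hands out a new sdX letter after a reconnect.

```python
import os

def map_ids_to_devices(by_id_dir="/dev/disk/by-id"):
    """Return {persistent id name: device node it currently points to}."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        # Each entry is normally a symlink such as ata-<model>_<serial> -> ../../sdX
        if os.path.islink(link):
            mapping[name] = os.path.realpath(link)
    return mapping

if __name__ == "__main__":
    # Only try to list real devices if the directory exists on this machine.
    if os.path.isdir("/dev/disk/by-id"):
        for name, dev in map_ids_to_devices().items():
            print(f"{name} -> {dev}")
```

Running it before and after a dropout should show the same id pointing at a different device node.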

 

I'm not sure why you think your syslog is not showing any errors when it's full of this:

Apr 18 07:30:02 jpus001-pgh1 kernel: md: disk0 read error, sector=27516368
Apr 18 07:30:02 jpus001-pgh1 kernel: md: disk0 read error, sector=27516376
Apr 18 07:30:02 jpus001-pgh1 kernel: md: disk0 read error, sector=27516384
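For anyone who wants to tally these lines instead of scrolling through them, a rough Python sketch follows. The regular expression is only modelled on the three lines quoted above, so treat the pattern and the function name as assumptions rather than a complete description of what the md driver can log.

```python
import re
from collections import Counter

# Pattern modelled on the quoted lines, e.g.
# "... kernel: md: disk0 read error, sector=27516368"
MD_READ_ERROR = re.compile(r"kernel: md: (disk\d+) read error, sector=(\d+)")

def count_md_read_errors(lines):
    """Count md read-error lines per disk label (disk0, disk1, ...)."""
    counts = Counter()
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "Apr 18 07:30:02 jpus001-pgh1 kernel: md: disk0 read error, sector=27516368",
    "Apr 18 07:30:02 jpus001-pgh1 kernel: md: disk0 read error, sector=27516376",
    "Apr 18 07:30:02 jpus001-pgh1 kernel: md: disk0 read error, sector=27516384",
]
print(dict(count_md_read_errors(sample)))  # {'disk0': 3}
```

To run it against a real log, pass an open file object (for example the syslog from your diagnostics zip) instead of the sample list.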

 

8 hours ago, Distinguished said:

22 trillion read and write errors

 

3 hours ago, Distinguished said:

I've been seeing a lot of that lately

 

The two are the same thing. (Actually, it's "only" 40 million errors but even a single one is unacceptable.) Disk0 refers to the (first) Parity disk, BTW. Do you have notifications enabled (Settings -> Notification Settings)?

 

A couple of other pointers: your server was building Parity2 (not checking parity) and rebuilding Disk2. That's fine, it can do both simultaneously, but in order to do so it needs to be able to read all the other disks, so when Parity1 (just labelled Parity) fell offline both processes failed. The building of Parity2 won't stop when it reaches the 2 TB mark but will continue (filling the remainder of the disk with zeros) all the way to 3 TB. The other disks will no longer be involved in the calculation at that point and, depending on your settings, will probably spin down. I suggest you uninstall the Preclear plugin, at least until you've fixed this problem, because it spams the syslog and makes it difficult to read.
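To illustrate why a rebuild needs every other disk to be readable, here is a toy Python sketch. It assumes plain XOR single parity on three tiny made-up "disks"; the Parity2 calculation is different and is not modelled here, and the helper name is hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three tiny stand-in data disks (made-up contents).
disk1 = b"\x10\x20\x30"
disk2 = b"\x0a\x0b\x0c"
disk3 = b"\xff\x00\x55"

# Parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# Rebuilding disk2 needs parity AND every other data disk.
rebuilt_disk2 = xor_blocks([parity, disk1, disk3])
assert rebuilt_disk2 == disk2
```

If parity (or any one of the surviving disks) cannot be read, there is nothing left to XOR against, which is why losing Parity1 mid-rebuild stops the reconstruction of Disk2.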

Edited by John_M

Not to hijack the thread, but I replaced a failing drive (1.5TB or so) with a new 8TB drive. It is rebuilding the new drive, and in the process it gave read errors on the first disk (disk 1, 8TB) and on a different disk (disk 5, 1TB, 5+ years old). I'm not sure whether these read errors were corrected?

Edited by daze
1 hour ago, daze said:

Not to hijack the thread, but I replaced a failing drive (1.5TB or so) with a new 8TB drive. It is rebuilding the new drive, and in the process it gave read errors on the first disk (disk 1, 8TB) and on a different disk (disk 5, 1TB, 5+ years old). I'm not sure whether these read errors were corrected?

 

Start your own thread and post your diagnostics.

