Unraid crashed during parity check - disk failed?

pickthenimp · March 1, 2017

I ran my monthly parity check last night and woke up to an unresponsive unRAID. Around 1 AM I got an alert, "Disk 4 in error state", and another, "Array has 2 disks with read errors".

I rebooted and was greeted with a red X over disk 4: "Disk is disabled, Contents emulated".

I ran a SMART test on disk 4 and it showed no errors. What are my next steps? Diagnostics attached.

Thanks in advance.

nas-diagnostics-20170301-0929.zip

trurl · March 1, 2017

You should have taken the diagnostic before rebooting. Do you know whether it was a correcting parity check?

Also, the version of unRAID you are running has a bug that prevents us from getting SMART data for any of your disks in the diagnostics. Go to Settings > Disk Settings and turn off Autostart. Then upgrade and get another diagnostic.

pickthenimp · March 1, 2017

Thanks for your response. Unfortunately my machine was completely locked up, so I couldn't grab diagnostics.

It was NOT a correcting parity check, just the default monthly check.

I am running an extended SMART test on disk 4 right now. It has been running for 4 hours and is at 50%. Should I let it finish, or cancel it, upgrade, and run diagnostics again?

Thanks

pickthenimp · March 2, 2017

FYI, I let the extended test finish and it passed. Attached is the SMART test result for that drive.

Was this just a fluke, and should I rebuild the drive?

nas-smart-20170301-1940.zip

pickthenimp · March 2, 2017

So, I got anxious and went out and purchased a new 2TB drive to replace disk 4. Things locked up after I started the rebuild, but I copied off the syslog. See attached.

Any guidance would be appreciated.

Edit: unRAID became responsive again and the drive seems to be rebuilding; however, all of a sudden I am missing my TV and Movies shares. Not sure what is going on.

About 2 minutes into the rebuild I got a few alerts:

Event: unRAID array errors
Subject: Warning [NAS] - array has errors
Description: Array has 1 disk with read errors
Importance: warning

Disk 3 - WDC_WD20EARS-00MVWB0_WD-WMAZA3690347 (sdj) (errors 48)

Event: unRAID array errors
Subject: Warning [NAS] - array has errors
Description: Array has 2 disks with read errors
Importance: warning

Disk 1 - WDC_WD20EARS-00MVWB0_WD-WMAZA3638502 (sdh) (errors 1)
Disk 3 - WDC_WD20EARS-00MVWB0_WD-WMAZA3690347 (sdj) (errors 52)

syslog.txt

Edited March 2, 2017 by pickthenimp
Some shares are now missing

SSD · March 2, 2017

Sounds like bad cabling

pickthenimp · March 2, 2017

1 hour ago, bjp999 said:

Sounds like bad cabling

I double-checked all my SATA cables and they are snug.

Here is the latest: I upgraded to 6.3.2 and rebuilt disk 4 onto a new drive. After a reboot, the missing shares finally came back.

I started a parity sync (without writing corrections), and it was painfully slow from the start. My error count kept going up, and I finally stopped it after it had found 666670 errors.

Attached are the latest diagnostics, taken after stopping the parity sync.

Bad mobo?

nas-diagnostics-20170302-0652.zip

trurl · March 2, 2017

No SMART for disks 1, 3, and 4. Definitely some sort of hardware problem making these disconnect. Are they on the same controller?
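One way to answer that question from the console, sketched under assumptions: on Linux, `/sys/block/<dev>/device` is a symlink into the SCSI device tree, and the resolved path passes through the controller's PCI address and a `hostN` directory. The Python below extracts those two components; disks that report the same pair hang off the same controller. The function names `controller_of` and `controller_for_disk` are hypothetical (not part of unRAID), and the device names `sdh`, `sdi`, `sdj` are just the ones seen in this thread's logs.

```python
import os
import re

# PCI function address as it appears in sysfs, e.g. "0000:02:00.0".
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")
# SCSI host directory created by the controller driver, e.g. "host10".
SCSI_HOST = re.compile(r"^host\d+$")


def controller_of(device_path):
    """Return (pci_address, scsi_host) parsed from a resolved sysfs path.

    The last PCI address seen before the "hostN" component is taken to be
    the controller (HBA) that owns the disk. Missing parts come back as None.
    """
    pci = host = None
    for part in device_path.split("/"):
        if SCSI_HOST.match(part):
            host = part
            break
        if PCI_ADDR.match(part):
            pci = part
    return pci, host


def controller_for_disk(name, sys_block="/sys/block"):
    """Resolve /sys/block/<name>/device and return its controller (Linux only)."""
    path = os.path.realpath(os.path.join(sys_block, name, "device"))
    return controller_of(path)


if __name__ == "__main__":
    # Hypothetical device names taken from this thread; adjust as needed.
    for disk in ("sdh", "sdi", "sdj"):
        if os.path.exists(os.path.join("/sys/block", disk)):
            print(disk, controller_for_disk(disk))
```

If all three disks print the same PCI address and `hostN`, they share one controller, which makes that controller (or what feeds it) the common suspect.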

pickthenimp · March 2, 2017

Yes these are all on the same onboard controller.

But doesn't it seem odd that I can access all of those disks fine?

trurl · March 2, 2017

Your syslog is full of read errors on these disks. Maybe they are working intermittently, but something needs to be fixed, and I'm not sure I would trust the data on the rebuilt disk 4 either. Best if you avoid correcting parity until this is cleared up.

9 minutes ago, pickthenimp said:

seems odd I can access all of those disks fine?

How are you accessing the disks?

pickthenimp · March 2, 2017

Via a UNC path directly to the disk from another machine.

SSD · March 2, 2017

It looks like the last three drives (attached to the LSISAS1068 controller) have dropped offline.

Quote

Mar 2 06:34:21 nas kernel: scsi host10: ioc1: LSISAS1068 B1, FwRev=01160100h, Ports=1, MaxQ=286, IRQ=36
Mar 2 06:34:21 nas kernel: mptsas: ioc1: attaching sata device: fw_channel 0, fw_id 0, phy 0, sas_addr 0x455f3d3ee1bfa990
Mar 2 06:34:21 nas kernel: scsi 10:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M AB51 PQ: 0 ANSI: 5
Mar 2 06:34:21 nas kernel: sd 10:0:0:0: Attached scsi generic sg7 type 0
Mar 2 06:34:21 nas kernel: sd 10:0:0:0: [sdh] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Mar 2 06:34:21 nas kernel: mptsas: ioc1: attaching sata device: fw_channel 0, fw_id 2, phy 2, sas_addr 0x3c296c3680865a58
Mar 2 06:34:21 nas kernel: scsi 10:0:1:0: Direct-Access     ATA      ST2000DM006-2DM1 CC26 PQ: 0 ANSI: 5
Mar 2 06:34:21 nas kernel: sd 10:0:0:0: [sdh] Write Protect is off
Mar 2 06:34:21 nas kernel: sd 10:0:0:0: [sdh] Mode Sense: 73 00 00 08
Mar 2 06:34:21 nas kernel: sd 10:0:1:0: Attached scsi generic sg8 type 0
Mar 2 06:34:21 nas kernel: sd 10:0:1:0: [sdi] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Mar 2 06:34:21 nas kernel: mptsas: ioc1: attaching sata device: fw_channel 0, fw_id 3, phy 3, sas_addr 0x455f3d44d9bdad95
Mar 2 06:34:21 nas kernel: sd 10:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 2 06:34:21 nas kernel: scsi 10:0:2:0: Direct-Access     ATA      WDC WD20EARS-00M AB51 PQ: 0 ANSI: 5

These are disks 1, 3, and 4. All of those disks are reporting problems.

Quote

...

Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30376
Mar 2 06:51:06 nas kernel: md: disk3 read error, sector=30376
Mar 2 06:51:06 nas kernel: md: disk4 read error, sector=30376
Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30384
Mar 2 06:51:06 nas kernel: md: disk3 read error, sector=30384
Mar 2 06:51:06 nas kernel: md: disk4 read error, sector=30384
Mar 2 06:51:06 nas kernel: sd 10:0:0:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:0:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:0:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:1:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:1:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:1:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:2:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:2:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:2:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30392
Mar 2 06:51:06 nas kernel: md: disk3 read error, sector=30392
Mar 2 06:51:06 nas kernel: md: disk4 read error, sector=30392
Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30400
Mar 2 06:51:06 nas kernel: md: disk3 read error, sector=30400
Mar 2 06:51:06 nas kernel: md: disk4 read error, sector=30400
Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30408

...

My guess is that the controller either dropped offline or is failing.

It may be good for Johnnie_Black to weigh in here.
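The reasoning in the post above (every disk with read errors sits behind the same SCSI host, host10, which the attach messages identify as the LSISAS1068) can be checked mechanically. Below is a minimal Python sketch that tallies `md: diskN read error` lines per array disk and groups `rejecting I/O to offline device` lines by SCSI host number. It is not an unRAID tool: the name `summarize` is hypothetical, and the regexes assume the message format shown in the excerpts above.

```python
import re
from collections import Counter, defaultdict

# Patterns for the two kinds of kernel messages quoted above.
# Assumption: syslog lines are formatted exactly like those excerpts.
MD_READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")
OFFLINE_IO = re.compile(
    r"sd (\d+):(\d+):(\d+):(\d+): rejecting I/O to offline device"
)


def summarize(lines):
    """Count md read errors per array disk, and collect the SCSI addresses
    (host:channel:target:lun) that rejected I/O, grouped by SCSI host."""
    read_errors = Counter()
    offline_by_host = defaultdict(set)
    for line in lines:
        m = MD_READ_ERROR.search(line)
        if m:
            read_errors[m.group(1)] += 1
            continue
        m = OFFLINE_IO.search(line)
        if m:
            host = int(m.group(1))
            offline_by_host[host].add(":".join(m.groups()))
    return read_errors, dict(offline_by_host)


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
Mar 2 06:51:06 nas kernel: md: disk1 read error, sector=30376
Mar 2 06:51:06 nas kernel: md: disk3 read error, sector=30376
Mar 2 06:51:06 nas kernel: md: disk4 read error, sector=30376
Mar 2 06:51:06 nas kernel: sd 10:0:0:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:1:0: rejecting I/O to offline device
Mar 2 06:51:06 nas kernel: sd 10:0:2:0: rejecting I/O to offline device
""".splitlines()

if __name__ == "__main__":
    errors, offline = summarize(SAMPLE)
    for disk, count in sorted(errors.items()):
        print(f"{disk}: {count} read error(s)")
    for host, addrs in sorted(offline.items()):
        print(f"host{host}: offline devices {sorted(addrs)}")
```

To run it against a saved log, pass `open("syslog.txt", errors="replace")` to `summarize` in place of `SAMPLE`. When every offline address shares one host number, the common factor is that controller rather than any individual disk.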

pickthenimp · March 2, 2017

Thanks for digging in.

As a quick solution, would purchasing a 4-port SATA controller and moving these 3 disks off the onboard controller solve my problem?

Looking at buying this: https://www.amazon.com/gp/product/B00AZ9T3OU

JorgeB · March 2, 2017

Agreed, it looks like a controller issue. Whether it's failing or there's some other issue, I can't say. Replacing it should solve your issues, but try to avoid Marvell-based controllers, as these are known to have issues with unRAID for some users.

pickthenimp · March 2, 2017

1 minute ago, johnnie.black said:

Agree, looks like a controller issue, if it's failing or there's some other issue can't say, replacing it should solve your issues but try to avoid marvell based controllers, as these are know to have issues with unRAID for some users.

Thanks for the reply. Is there a better controller you would recommend?

JorgeB · March 2, 2017

ASMedia for 2 ports, LSI for 4 or more.

tgggd86 · March 2, 2017

Hopefully you haven't run into the same problem I did, as explained here:

Even after buying a new SATA controller card, my problems still occurred. I had to upgrade my entire motherboard/CPU/RAM to resolve the issue. Of course, that hardware had no issues outside of unRAID.

pickthenimp · March 4, 2017

I ended up getting that Marvell SATA card anyway, since I just need a quick-and-dirty fix while I wait on a new build I am purchasing.

I rebuilt drive 4 successfully per @trurl's recommendation. I'm running a parity sync now (without correction), with zero errors so far.

Thanks everyone for the help.
