
Parity-check taking days, maybe weeks



Posted

Last week I accidentally killed the power to my unRAID server; in fact, it happened twice. This has never been an issue in the past: it would show errors, so I would run a parity check, and less than a day later all was good. This time, however, it has taken well over 2 days and is only at 18%, with nearly 30,000 minutes to go. What would cause this? I am running version 4.7. I will attach my syslog in a reply to this thread.

Posted

I'm no expert, but take a line from the log that contains the word "error" and google it. You should get results and can read how others solved similar issues. It doesn't look good, BUT I am new to unRAID, so I'm not familiar with parity checks and whatnot.

 

 


Posted

I have the same errors as you...

 

I spent considerable time rebuilding the array and preclearing the parity drive.

 

There were no errors before I re-installed my parity drive (which was precleared successfully).

 

I am on Beta - 13 and since I put my parity drive in I get this:

 

Nov 29 20:29:48 HTPCRAID kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x14 (ATA bus error)

Nov 29 20:29:48 HTPCRAID kernel: ata3.01: status: { DRDY }

Nov 29 20:29:48 HTPCRAID kernel: ata3.00: hard resetting link

Nov 29 20:29:49 HTPCRAID kernel: ata3.01: hard resetting link

Nov 29 20:29:50 HTPCRAID kernel: ata3.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Nov 29 20:29:50 HTPCRAID kernel: ata3.01: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Nov 29 20:29:50 HTPCRAID kernel: ata3.00: configured for UDMA/133

Nov 29 20:29:50 HTPCRAID kernel: ata3.01: configured for UDMA/100

Nov 29 20:29:50 HTPCRAID kernel: ata3.01: device reported invalid CHS sector 0

Nov 29 20:29:50 HTPCRAID kernel: ata3: EH complete

Nov 29 20:30:22 HTPCRAID kernel: ata3: lost interrupt (Status 0x50)

Nov 29 20:30:22 HTPCRAID kernel: ata3.01: exception Emask 0x10 SAct 0x0 SErr 0x48d0002 action 0x0 frozen

Nov 29 20:30:22 HTPCRAID kernel: ata3.01: SError: { RecovComm PHYRdyChg CommWake 10B8B LinkSeq DevExch }

Nov 29 20:30:22 HTPCRAID kernel: ata3.01: failed command: READ DMA EXT

Nov 29 20:30:22 HTPCRAID kernel: ata3.01: cmd 25/00:00:1f:8d:06/00:04:00:00:00/f0 tag 0 dma 524288 in

Nov 29 20:30:22 HTPCRAID kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x14 (ATA bus error)

Nov 29 20:30:22 HTPCRAID kernel: ata3.01: status: { DRDY }

Nov 29 20:30:22 HTPCRAID kernel: ata3.00: hard resetting link

Nov 29 20:30:23 HTPCRAID kernel: ata3.01: hard resetting link

Nov 29 20:30:24 HTPCRAID kernel: ata3.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Nov 29 20:30:24 HTPCRAID kernel: ata3.01: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Nov 29 20:30:24 HTPCRAID kernel: ata3.00: configured for UDMA/133

Nov 29 20:30:24 HTPCRAID kernel: ata3.01: configured for UDMA/100

Nov 29 20:30:24 HTPCRAID kernel: ata3.01: device reported invalid CHS sector 0

Nov 29 20:30:24 HTPCRAID kernel: ata3: EH complete

 

 

How can I tell which drive is ata3? Is it the same as disk3? (that would not make sense - I had no errors on any of the drives until I put my parity drive back in.)

 

I have been playing with this for the past 8 days - I am getting really really sick of this.

Posted

I am seeing this in the log though, which keeps basically repeating:

 

Nov 29 03:52:30 PhenixHomeServ kernel: md: recovery thread woken up ...

Nov 29 03:52:30 PhenixHomeServ kernel: md: recovery thread checking parity...

Nov 29 03:52:30 PhenixHomeServ kernel: md: using 1152k window, over a total of 1953514552 blocks.

Nov 29 03:52:41 PhenixHomeServ kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x4810000 action 0xe frozen

Nov 29 03:52:41 PhenixHomeServ kernel: ata2.00: irq_stat 0x08400040, interface fatal error, connection status changed

Nov 29 03:52:41 PhenixHomeServ kernel: ata2: SError: { PHYRdyChg LinkSeq DevExch }

Nov 29 03:52:41 PhenixHomeServ kernel: ata2.00: failed command: READ DMA EXT

Nov 29 03:52:41 PhenixHomeServ kernel: ata2.00: cmd 25/00:00:c7:16:00/00:04:00:00:00/e0 tag 0 dma 524288 in

Nov 29 03:52:41 PhenixHomeServ kernel:          res 50/00:00:c6:16:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)

Nov 29 03:52:41 PhenixHomeServ kernel: ata2.00: status: { DRDY }

Nov 29 03:52:41 PhenixHomeServ kernel: ata2: hard resetting link

 

I am running SeaTools on my drive right now. The short test passed, so I am running the long test now. I am running it via USB in Windows, so I couldn't get a SMART report, but another tool showed SMART was fine: some errors, but nothing critical. I am hoping the long test at least finds something so I can just RMA the drive to Seagate; it is still under warranty until 2013. Or maybe a loose cable inside the box is causing this.

 

I'm really not sure what these errors in the log mean, and I am only assuming ata2 is my parity drive since all the other drives seem to read/write without a problem.

Posted

A "long" test is an internal test performed by the disk where it attempts to read all the sectors.

 

The errors you were getting were in communication between the disk controller and the disk drive. They could indicate a bad disk drive, a loose or intermittent cable (power OR data), or a power supply unable to keep up with the demands of the server (combined with a disk or disk controller sensitive to poor voltage regulation).

 

It could also be caused by a bad drive tray or backplane. It is unlikely a "long" test will find anything. (Oh yes, be sure to disable any spin-down on the disk, as it will abort the "long" test and prevent it from completing.)

 

Joe L.
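To see which port these communication errors are piling up on, something like the following rough Python sketch could tally them from a saved syslog. It only assumes the message wording seen in the excerpts in this thread ("exception", "failed command", "hard resetting link"); the function name `summarize_ata_errors` is made up for illustration, and other kernel versions may word these lines differently.

```python
import re
from collections import Counter

# Matches kernel libata messages such as "ata2.00: exception Emask ..." or
# "ata3: hard resetting link"; the port is the number right after "ata".
ATA_LINE = re.compile(r"kernel:\s+ata(\d+)(?:\.(\d+))?: (.*)")

def summarize_ata_errors(lines):
    """Count exceptions, failed commands and link resets per ATA port."""
    counts = {}
    for line in lines:
        m = ATA_LINE.search(line)
        if not m:
            continue  # not an ataN message (e.g. md: lines or the "res ..." line)
        port = "ata" + m.group(1)
        msg = m.group(3)
        per_port = counts.setdefault(port, Counter())
        if msg.startswith("exception"):
            per_port["exceptions"] += 1
        elif msg.startswith("failed command"):
            per_port["failed_commands"] += 1
        elif "hard resetting link" in msg:
            per_port["link_resets"] += 1
    return counts

# Sample lines copied from the syslog excerpt posted earlier in this thread.
sample = [
    "Nov 29 03:52:41 PhenixHomeServ kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x4810000 action 0xe frozen",
    "Nov 29 03:52:41 PhenixHomeServ kernel: ata2.00: failed command: READ DMA EXT",
    "Nov 29 03:52:41 PhenixHomeServ kernel:          res 50/00:00:c6:16:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)",
    "Nov 29 03:52:41 PhenixHomeServ kernel: ata2: hard resetting link",
]

for port, per_port in summarize_ata_errors(sample).items():
    print(port, dict(per_port))
```

To run it against a whole log, pass an open file instead of the sample list, for example `summarize_ata_errors(open("syslog.txt"))`, where `syslog.txt` is whatever name you saved your syslog under. A port that keeps resetting while the others stay quiet points at that port's drive, cable, or power connection, but the count alone cannot tell you which of those it is.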

Posted

Looks like my drive is bad. I tried running the long test in Windows about 10 times; each time it would show 16 errors and then a bunch of LBA errors, and it wouldn't complete. There were no error codes and no way to attempt a repair, even though Seagate's website says you can.

 

So I tried it through DOS and again got 16 errors, plus an actual error code that doesn't show up on their site or even on Google; zero pages come up. This time I could try a repair, which I did. I was running SeaTools while at work; I just got home and popped the drive back in, and it's the same issue. I'm going to try another long test when I get into work tonight, but I think this is a bad drive, so I am going to RMA it.

 

But just to be sure, how can I verify which drive is on ata2?

Posted

Does unRAID have a way to get to a command line with "root" capabilities? If so, try this command:

dmesg | grep ata2

It should spit out something like mine below. (NOTE: mine says ata4 because my ata2 has a DVD writer and wouldn't be helpful.)

[    0.851022] ata4: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xfc08 irq 15

[    1.581078] ata4.01: ATA-6: ST340810A, 3.39, max UDMA/100

[    1.581146] ata4.01: 78165360 sectors, multi 16: LBA

[    1.596459] ata4.01: configured for UDMA/100

So you can clearly see the model number of the hard drive accessing data over the ata4 controller, BUT I'm not sure how you'd do this IF all your hard drives had the same model number. Hmm.
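If you want every port listed at once rather than grepping one at a time, a small Python sketch along these lines could pull the model out of the dmesg text. It assumes the identify lines look like the "ata4.01: ATA-6: ST340810A, 3.39, max UDMA/100" line above; the function name `ata_port_models` is just something I made up, and it does not solve the identical-model case, since these lines carry no serial number.

```python
import re

# Matches libata identify lines such as
# "[    1.581078] ata4.01: ATA-6: ST340810A, 3.39, max UDMA/100"
# and captures port, device, model and firmware revision.
IDENT = re.compile(r"ata(\d+)\.(\d+): ATA-\d+: ([^,]+), ([^,]+),")

def ata_port_models(dmesg_text):
    """Return {"ataN.MM": (model, firmware)} for each identify line found."""
    found = {}
    for line in dmesg_text.splitlines():
        m = IDENT.search(line)
        if m:
            key = f"ata{m.group(1)}.{m.group(2)}"
            found[key] = (m.group(3).strip(), m.group(4).strip())
    return found

# Sample text copied from the dmesg output shown above.
sample = """\
[    0.851022] ata4: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xfc08 irq 15
[    1.581078] ata4.01: ATA-6: ST340810A, 3.39, max UDMA/100
[    1.581146] ata4.01: 78165360 sectors, multi 16: LBA
[    1.596459] ata4.01: configured for UDMA/100
"""

for port, (model, firmware) in ata_port_models(sample).items():
    print(port, model, firmware)
```

You would feed it the full output of `dmesg` saved to a file or captured from the command, then look up the port named in the error lines.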

Posted

Received my new drive from Seagate today and noticed it's a remanufactured drive. Not ideal, but whatever. I was having a problem with my system seeing the new drive and could hear click, click, click, as if the drive kept trying to spin up and resetting. I noticed that the log while booting showed that drive hard link resetting, or something like that. Reseating the SATA cable brought the drive online, so now unRAID sees it. So I am thinking maybe the issue really was just the SATA cable, and even though I was seeing errors when running SeaTools, the old drive may be okay. SeaTools did show it repaired the errors, so I will see what happens. I am going to see if I can preclear it, then do a parity sync, and run SeaTools. I guess in about 24+ hours I should have the preclear done.

 

All looks promising, and if the drive tests okay, I am going to keep the drive Seagate sent me and just pay the $100. $100 for a 2TB drive is a great price now that prices are 2-3 times what they were a few months ago. I am getting low on space anyway and figured I would need to add a new drive within the next month.

Posted

So after testing this more over the past week, I have found the cause. The new remanufactured drive had the exact same problem, with the hard link resetting error and the parity check taking forever. The problem? The Molex-to-SATA power cable is bad! I need to run SeaTools on both drives just to make sure there are no errors and, if there are, to repair them. I am also seeing DMA errors on another one of my drives that I tested that power cable on, so once I get a parity drive in, I plan on swapping out that drive and running SeaTools on it too. I can't believe that this was the cause of the problem. But I guess at least now I've got another 2 TB drive for $110 ($100 for not returning it and $10 shipping).

Archived

This topic is now archived and is closed to further replies.
