
Creating first Parity - Big Errors! - REVISED-new syslog



Hi All,

I just got a Pro key, added a third drive, and ported the bulk of my data over.  So last night I added a 4th drive to use as the parity drive.  Things started fine, it's a new 2TB Hitachi drive.  unRAID predicted approx. 654 min. at a speed of approx. 58,000KB/sec, which looked reasonable, so I went to bed.  I checked in this morning and hit refresh; it took about a minute for the new page to load, and it was now predicting something like 80,000 min. to finish at a speed of 240KB/s, with the current position at only 37%.  After a few minutes, the speed was back up to what it was previously and it was predicting 350 min. or so.  I thought this might just be a function of waking something up.  But now, as I'm writing this, I just checked again and it's back down to 24KB/s and 108,177 min. to completion.  I also just noticed an alarming 65,797 errors on one of the drives, a Samsung 1.5TB!  This is not looking normal.  What should I do?

 

I tried to post the syslog, but I couldn't get the message to post to the board - it's 11MB!  I tried zipping it and it's still over 400K.  So I broke it into 4 parts and I'm uploading the first and last parts, as I think the middle is just repetitive.

 

Oh yeah, damn.  Just checked again, and the errors on the Samsung are climbing, over 70,000 now, and I can't access it any more from my Mac.  I get "an unexpected error occurred (error code -43)".  Should I stop the system?

 

Thanks!

syslog-pt1.rtf.zip

syslog-pt4.rtf.zip

Link to comment

I had a similar thing happen when I started moving cables out of the way of the case fan while it was calculating parity - in my case one of the SATA cables had been jostled which caused the errors.  Judging from your syslog, your system is definitely having problems reading from that one drive.  I'd stop the system, confirm all cabling is tight then bring it up and review the log for errors.  Before kicking off another parity calculation, I'd suggest following the instructions here http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems and check the filesystem on each of the data drives.  Doesn't take too long and no point in doing the parity calculation if there's corruption.

Link to comment

The syslog is showing a Sense DMA error. Is the drive SATA or PATA? I ask because I have seen similar problems on PATA drives when two drives on one cable have conflicting master, slave, or cable-select settings. Make sure one drive is set as master and the other as slave on the same cable.

 

If these are SATA and the cables are plugged in correctly, check the BIOS for how the drives are being reported. They might be detected incorrectly.

Link to comment

Hey Jazzy, I checked all the cabling, went to your link, and ran the S.M.A.R.T. check on all the disks as recommended on that page; they came out fine.  Then I ran "reiserfsck" on disk 1.  It took 2 hours for a 2TB drive, longer than I thought it would, but it came out clean.  Now I'm trying it on disk 2, the problem disk, and we'll see what happens.

 

Link to comment

Ahh!  I think I know what the problem is.  I totally forgot all about it: as I was moving my data over to this drive I discovered 3 odd folders.  They must be corrupted, because I can't open them and I can't delete them from the Mac.  Every time I try, the Mac freezes.  I'll bet that's what's giving the parity process the problem.  Can someone tell me how to delete them through the console or telnet?  THANKS!

Link to comment

Parity works on the individual bytes on the drive.  It has absolutely no concept of files, or data, or even formatting.
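To make that concrete, here is a toy sketch, assuming simple XOR-style single parity (this is an illustration, not unRAID's actual code). The `disk1`/`disk2`/`disk3` byte strings and the function names are made up for the example; each "disk" is just a run of bytes, and the parity routine never looks at what those bytes mean.

```python
from functools import reduce
from operator import xor

def compute_parity(disks):
    """XOR the bytes found at the same position on every disk."""
    return bytes(reduce(xor, column) for column in zip(*disks))

def rebuild_missing(surviving_disks, parity):
    """Recover one missing disk by XOR-ing the survivors with parity."""
    return compute_parity(list(surviving_disks) + [parity])

# Three pretend data disks of equal size; the contents are arbitrary bytes.
disk1 = b"any bytes at all"
disk2 = bytes([0x00, 0xFF, 0x10, 0x20] * 4)
disk3 = bytes(range(16))

parity = compute_parity([disk1, disk2, disk3])

# Pretend disk2 died: rebuild it from the other two disks plus parity.
rebuilt = rebuild_missing([disk1, disk3], parity)
print(rebuilt == disk2)
```

Note that computing `parity` needs the byte at every position on every data disk. In this model, a single unreadable sector on one data disk is enough to stall the calculation, whether or not a file lives there.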

 

If you are getting errors when initially adding a parity drive, then you have hardware problems you must resolve.

 

Basically, according to your syslog, disk2 has un-readable sectors (uncorrectable media errors):

May 25 04:40:01 unRAID kernel: ata3.00: cmd 25/00:00:0f:d4:8b/00:04:51:00:00/e0 tag 0 dma 524288 in
May 25 04:40:01 unRAID kernel:          res 51/40:00:0f:d4:8b/00:00:51:00:00/e0 Emask 0x9 (media error)
May 25 04:40:01 unRAID kernel: ata3.00: status: { DRDY ERR }
May 25 04:40:01 unRAID kernel: ata3.00: error: { UNC }
May 25 04:40:01 unRAID kernel: ata3.00: configured for UDMA/133
May 25 04:40:01 unRAID kernel: ata3: EH complete
May 25 04:40:04 unRAID kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 25 04:40:04 unRAID kernel: ata3.00: irq_stat 0x40000001
May 25 04:40:04 unRAID kernel: ata3.00: failed command: READ DMA EXT
May 25 04:40:04 unRAID kernel: ata3.00: cmd 25/00:00:0f:d4:8b/00:04:51:00:00/e0 tag 0 dma 524288 in
May 25 04:40:04 unRAID kernel:          res 51/40:00:0f:d4:8b/00:00:51:00:00/e0 Emask 0x9 (media error)
May 25 04:40:04 unRAID kernel: ata3.00: status: { DRDY ERR }
May 25 04:40:04 unRAID kernel: ata3.00: error: { UNC }
May 25 04:40:04 unRAID kernel: ata3.00: configured for UDMA/133
May 25 04:40:04 unRAID kernel: sd 3:0:0:0: [sdc] Unhandled sense code
May 25 04:40:04 unRAID kernel: sd 3:0:0:0: [sdc] Result: hostbyte=0x00 driverbyte=0x08
May 25 04:40:04 unRAID kernel: sd 3:0:0:0: [sdc] Sense Key : 0x3 [current] [descriptor]
May 25 04:40:04 unRAID kernel: Descriptor sense data with sense descriptors (in hex):
May 25 04:40:04 unRAID kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
May 25 04:40:04 unRAID kernel:         51 8b d4 0f
May 25 04:40:04 unRAID kernel: sd 3:0:0:0: [sdc] ASC=0x11 ASCQ=0x4
May 25 04:40:04 unRAID kernel: sd 3:0:0:0: [sdc] CDB: cdb[0]=0x28: 28 00 51 8b d4 0f 00 04 00 00
May 25 04:40:04 unRAID kernel: end_request: I/O error, dev sdc, sector 1368118287
May 25 04:40:04 unRAID kernel: md: disk2 read error
May 25 04:40:04 unRAID kernel: handle_stripe read error: 1368118224/2, count: 1
May 25 04:40:04 unRAID kernel: md: disk2 read error
May 25 04:40:04 unRAID kernel: handle_stripe read error: 1368118232/2, count: 1
May 25 04:40:04 unRAID kernel: md: disk2 read error
May 25 04:40:04 unRAID kernel: handle_stripe read error: 1368118240/2, count: 1
May 25 04:40:04 unRAID kernel: md: disk2 read error
May 25 04:40:04 unRAID kernel: handle_stripe read error: 1368118248/2, count: 1
May 25 04:40:04 unRAID kernel: md: disk2 read error
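With an 11MB syslog it is easier to let a script tally these lines than to read them by eye. The following is a rough sketch only: the regular expressions are written against the handful of lines quoted above, and the names `summarize` and `lba_from_hex_bytes` are made up for the example. It also shows that the bytes `51 8b d4 0f` from the sense data and CDB lines, read as one hexadecimal number, are the same sector the kernel reports in the `end_request` line.

```python
import re
from collections import Counter

# A few lines copied from the excerpt above; a real run would read the syslog file.
SAMPLE = """\
May 25 04:40:04 unRAID kernel: end_request: I/O error, dev sdc, sector 1368118287
May 25 04:40:04 unRAID kernel: md: disk2 read error
May 25 04:40:04 unRAID kernel: handle_stripe read error: 1368118224/2, count: 1
May 25 04:40:04 unRAID kernel: md: disk2 read error
May 25 04:40:04 unRAID kernel: handle_stripe read error: 1368118232/2, count: 1
"""

IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")
MD_ERROR = re.compile(r"md: (disk\d+) read error")

def summarize(lines):
    """Collect unique failing sectors per device and read-error counts per array disk."""
    bad_sectors = {}
    md_errors = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            bad_sectors.setdefault(match.group(1), set()).add(int(match.group(2)))
        match = MD_ERROR.search(line)
        if match:
            md_errors[match.group(1)] += 1
    return bad_sectors, md_errors

def lba_from_hex_bytes(hex_bytes):
    """Read space-separated hex bytes (most significant first) as one sector number."""
    return int(hex_bytes.replace(" ", ""), 16)

bad_sectors, md_errors = summarize(SAMPLE.splitlines())
print(bad_sectors, dict(md_errors))
print(lba_from_hex_bytes("51 8b d4 0f"))
```

If the set of failing sectors stays small and clustered, that points at a localized bad patch on the platter rather than a cabling problem affecting the whole drive.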

 

If you had already added a parity drive, you would have been able to recover from this.  But if you have not yet fully calculated parity on your set of data drives, and have no other copy of what was on the defective disk, then you are out of luck.  The data on the un-readable sectors is probably gone for good.

 

If, as you say, the file-system check came out clean, then perhaps the media errors are on a part of the disk with no files (but parity still needs to be able to read all the sectors).  In that case, stop the array, un-assign the parity disk so it does not try to read those sectors, copy the files off of disk2 onto other disks, and replace disk2 with a new disk.

 

You'll be able to see how bad disk2 is if you get a SMART report on it:

smartctl -d ata -a /dev/sdc

Post the output. I know you said you ran the report, but clearly, unless the sectors just failed, you did not know what to look for.  

You can have the disk perform a "long" test of itself, asking it to read all its own sectors looking for ones it cannot read, by disabling any spin-down timer and then typing

smartctl -d ata -t long /dev/sdc

(If unRAID spins down the disk, the test will abort, so you need to set the spin-down to never)

 

Then, wait 4 or 5  hours for the test to complete and get another status report using

smartctl -d ata -a /dev/sdc

 

Towards the bottom of the report will be the results of the "long" test.

You also want to look at the lines describing Re-Allocated sectors, or sectors Pending Re-Allocation.  The last column on the right on those parameters is the actual count of un-readable sectors.
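If you save the report to a file, those two counts can be pulled out with a few lines of script. This is only a sketch: the report text below is an illustrative made-up sample in the usual smartctl attribute-table layout, NOT the output from this drive, and the helper name `raw_counts` is invented for the example. It assumes the raw value is the tenth whitespace-separated field and is a plain integer, which is typical for these two attributes.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

def raw_counts(report_text):
    """Return the raw (last-column) value for each watched SMART attribute."""
    counts = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[1] in WATCHED:
            counts[fields[1]] = int(fields[9])
    return counts

# Illustrative sample only; the numbers are invented for the example.
ILLUSTRATIVE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       12
"""

counts = raw_counts(ILLUSTRATIVE_REPORT)
print(counts)
```

A non-zero value for sectors pending re-allocation means the drive itself has found sectors it could not read, which would line up with the media errors in the syslog.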

 

Joe L.

Link to comment

OK Joe, so I started offloading the files from disk 2 and the computer froze again, so obviously, there's something wrong with this drive.  I'm not at home now, so I can't restart it, but I'll go thru your instructions tonight.  I have the smart reports saved at home, I can upload them then.

 

Once I get this drive as cleared off as possible and assuming there is a problem with the drive, is there a possibility of repairing the drive or is it junk?  Also, once I add a new drive, would it be wise to preclear it (I've been reading a little about that here and there)? 

 

And finally, is there an easy way to determine if the files I copy over to the unRAID from my Mac are uncorrupted?  I have read about comparing md5 checksums, and I downloaded HashTab for the mac, but it looks like I have to compare each file separately.  That would take months.  There must be a more automated approach that I'm not understanding.

 

THANKS for all the help!

 

Adam
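On the checksum question, one automated approach is to hash every file under the source folder and under the copy, then compare the two sets by relative path. The following is a minimal sketch, not a tested backup tool; the function names are made up, and the demo at the bottom builds two throwaway folders rather than touching real data. On a real system you would pass the folder on the Mac and the mounted unRAID share to `compare_trees` instead.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_tree(root):
    """Map each file's path (relative to root) to its MD5 checksum."""
    hashes = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            hashes[os.path.relpath(full, root)] = md5_of_file(full)
    return hashes

def compare_trees(source_root, copy_root):
    """Return (files missing from the copy, files whose contents differ)."""
    source = hash_tree(source_root)
    copy = hash_tree(copy_root)
    missing = sorted(set(source) - set(copy))
    different = sorted(p for p in source if p in copy and source[p] != copy[p])
    return missing, different

def write_file(root, name, data):
    with open(os.path.join(root, name), "wb") as handle:
        handle.write(data)

# Demo with two temporary folders standing in for the Mac and the server.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    write_file(src, "a.txt", b"same")
    write_file(dst, "a.txt", b"same")
    write_file(src, "b.txt", b"original")
    write_file(dst, "b.txt", b"corrupted")
    write_file(src, "c.txt", b"only on source")
    missing, different = compare_trees(src, dst)

print("missing:", missing)
print("different:", different)
```

Hidden files that the Mac creates on its own (for example `.DS_Store`) may show up as differences, so a real run would probably want to skip those.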

Link to comment

Archived

This topic is now archived and is closed to further replies.
