Errors showing on disk2

October 6, 2009

I'm getting errors on my disk2. In the /var/log/syslog they look like:

Oct 6 11:34:23 Tower kernel: md: disk2 read error

Oct 6 11:34:23 Tower kernel: handle_stripe read error: 172134200/1, count: 1

Oct 6 11:35:38 Tower kernel: hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }

Oct 6 11:35:38 Tower kernel: hdc: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=93585544, sector=93585544

Oct 6 11:35:38 Tower kernel: ide: failed opcode was: unknown

Oct 6 11:35:38 Tower kernel: end_request: I/O error, dev hdc, sector 93585544
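As a rough answer to the "is there a check I can run" question, here is a minimal Python sketch that scans a syslog for the kernel's `end_request: I/O error` lines and groups the failing sector numbers by device. It assumes the message format shown in the excerpt above; other kernel versions may word these lines differently.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the excerpt above:
#   "... kernel: end_request: I/O error, dev hdc, sector 93585544"
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def io_errors_by_device(lines):
    """Map each device name to the sorted, de-duplicated sectors that failed."""
    errors = defaultdict(set)
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            errors[match.group(1)].add(int(match.group(2)))
    return {dev: sorted(sectors) for dev, sectors in errors.items()}
```

Usage: `with open("/var/log/syslog", errors="replace") as log: print(io_errors_by_device(log))`. Running it on successive days shows whether the list of bad sectors on one device is growing.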

So what does it take for unRAID to mark it as a failed drive and take it offline? Is this something that should prompt me to replace the drive, or is there a check that I can run to see what's happening?

thanks,

dave

October 6, 2009

Author

Looking at the SMART data I see:

Disk 2: *ERROR* - ATA_Error_Count has increased from (no data) to 112 since 2009-09-18

Disk 2: WARNING - ATA_Error_Count it is now 112 (warning threshold is 20)

Disk 2: *ERROR* - Reallocated_Sector_Ct has increased from 309 to 1261 since 2009-09-18

Disk 2: *ERROR* - Reallocated_Sector_Ct it is now 1261 (error threshold is 30)

Disk 2: *ERROR* - Current_Pending_Sector has increased from 7 to 182 since 2009-09-18

Disk 2: *ERROR* - Current_Pending_Sector it is now 182 (error threshold is 5)

Disk 2: *ERROR* - Offline_Uncorrectable has increased from 7 to 182 since 2009-09-18

Disk 2: *ERROR* - Offline_Uncorrectable it is now 182 (error threshold is 5)
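The report above boils down to two checks per attribute: has the raw value grown since the last snapshot, and is it over a fixed limit? A minimal Python sketch of that logic, using the limits printed in the report. How the raw values are collected (for example with smartctl) is left out, and treating "over the limit" as strictly greater is an assumption of this sketch.

```python
# Limits as printed in the report above: attribute -> (severity, limit).
THRESHOLDS = {
    "ATA_Error_Count": ("WARNING", 20),
    "Reallocated_Sector_Ct": ("ERROR", 30),
    "Current_Pending_Sector": ("ERROR", 5),
    "Offline_Uncorrectable": ("ERROR", 5),
}


def check_smart(disk, current, previous=None):
    """Compare raw SMART values with an earlier snapshot and the limits above."""
    previous = previous or {}
    report = []
    for attr, value in current.items():
        if attr not in THRESHOLDS:
            continue
        old = previous.get(attr)
        # Growth since the last snapshot (a missing snapshot counts as zero).
        if value > (old or 0):
            before = "(no data)" if old is None else old
            report.append(f"{disk}: {attr} has increased from {before} to {value}")
        # Absolute limit, independent of history.
        severity, limit = THRESHOLDS[attr]
        if value > limit:
            report.append(f"{disk}: {severity} - {attr} is now {value} (limit {limit})")
    return report
```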

Looks like it is time for a new drive.

dave

October 6, 2009

When unRAID gets a "read" error, it will re-create the missing contents (the block it could not read) from parity and the other data disks. It will then "write" the missing contents back to the disk where it could not be read.

If the SMART firmware on the disk drive marked a sector as unreadable, it will also track it as pending re-allocation, so when it is subsequently written to, it will have new and correct data, in a re-allocated sector.

Looks like your drive has been doing just that, re-allocating as it goes. Yes... it appears as if you need a new disk. Time to forget about the best sale price and get a replacement as soon as you can.
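The reconstruction described above works because unRAID's single parity disk holds the XOR of the data disks at each offset, so any one missing block is the XOR of parity and all the other blocks. A small Python illustration with short byte strings standing in for one stripe of blocks; it shows the arithmetic only and is not unRAID's driver code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (one block per disk, same offset)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block for one stripe: XOR of every data disk's block."""
    return xor_blocks(data_blocks)


def reconstruct(missing_index, data_blocks, parity):
    """Recreate an unreadable block from parity plus the other data disks."""
    survivors = [b for i, b in enumerate(data_blocks) if i != missing_index]
    return xor_blocks(survivors + [parity])
```

For example, with `disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]`, `reconstruct(1, disks, compute_parity(disks))` gives back `b"\x10\x20"`. That rebuilt block is what gets written back to the failing disk, which is when the drive's firmware can re-allocate the pending sector.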

Joe L.

October 6, 2009

Author

Thanks Joe,

Just ordered a new 1TB drive to replace my 120GB one. Now I get to do the swap parity and replace drive in one shot thingy to support the larger parity drive.

Hopefully going from IDE to SATA won't be a big deal with the original Intel MB.

dave
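The swap is needed because unRAID requires the parity disk to be at least as large as the largest data disk, so a new 1TB drive cannot simply go into a data slot while a smaller drive holds parity. A small Python sketch of that check, using sector counts copied from the syslog posted later in this thread (1953525168 sectors for the new Samsung, 625142448 for each 320GB IDE drive) and the 512-byte sector size that log reports. The constant names are illustrative.

```python
SECTOR_BYTES = 512  # "512-byte hardware sectors" in the kernel log


def size_gb(sectors):
    """Capacity in decimal gigabytes (10**9 bytes), the unit on the drive label."""
    return sectors * SECTOR_BYTES / 10**9


def parity_fits(parity_sectors, data_sectors):
    """True if the parity disk is at least as large as every data disk."""
    return parity_sectors >= max(data_sectors)


NEW_SAMSUNG = 1953525168  # sdb in the syslog
IDE_320GB = 625142448     # hda, hde, hdg in the syslog
```

`parity_fits(IDE_320GB, [IDE_320GB, NEW_SAMSUNG])` is `False`, while `parity_fits(NEW_SAMSUNG, [IDE_320GB, IDE_320GB])` is `True`. Putting the big drive in the parity slot is what the swap procedure accomplishes.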

October 6, 2009

I did the same thing a while back. The two SATA ports on the original Intel MB work just fine.

Since the process of swapping the parity disks carries some risks, I'd make copies of any critical files on the failing drive elsewhere, just to be sure you don't lose them. One unRAID server owner found that the NEW parity drive they installed failed part way through the process... after the old parity disk in the data slot had been partially re-written.

In effect, they then had a two disk failure... and lost some data.

Just make certain the cables on the new SATA drive are not loose, or stressed when you install it. If you are a religious person, say an extra prayer or two that the disks live through the process, and that the power stays on throughout the copy and rebuild of your data. (A UPS will help)

Joe L.

October 6, 2009

I'm the person Joe L. was referring to about the parity swap disaster. I would back up the failing data drive before the parity swap. Better to be safe.

I linked to my post below that talks about the failure.

http://lime-technology.com/forum/index.php?topic=4368.0

I hope everything turns out good for you.

October 6, 2009

Author

Does using the preclear script on the new parity drive help or not?

I know that I don't want to clear the existing parity drive as it is needed to rebuild the array.

dave

October 6, 2009

You need to weigh the time it takes to pre-clear vs. the additional time the array stays unprotected. The pre-clear will not make the process go faster, but it will give the drive an initial work-out to prove it will last through the first 12 or so hours of use. Working for 12 hours is not a guarantee it will work for the next 12 while you do the parity swap process.
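Roughly speaking, a pre-clear pass writes zeros across the whole drive and then reads them back. The read-back half can be illustrated in Python as a chunked scan for the first non-zero byte. This is only a sketch of the idea, not the preclear script itself. Reading a raw device such as `/dev/sdb` needs root, and the `start` offset is a hypothetical knob for skipping any header area you don't expect to be zero.

```python
def first_nonzero_offset(path, start=0, chunk_size=1 << 20):
    """Byte offset of the first non-zero byte at or after `start`, or None if all zero."""
    offset = start
    with open(path, "rb") as device:
        device.seek(start)
        while True:
            chunk = device.read(chunk_size)
            if not chunk:
                return None  # reached the end without finding a non-zero byte
            rest = chunk.lstrip(b"\x00")
            if rest:
                # Position of the first byte that survived the zero-strip.
                return offset + len(chunk) - len(rest)
            offset += len(chunk)
```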

So... it is up to you... If you can re-rip your media, or have other backup copies, I'd exercise the new disk first using pre-clear.

Joe L.

October 6, 2009

Author

OK, I don't have a lot of data on that drive, so I can just copy it to another drive prior to swapping everything out.
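If disk2's contents are copied elsewhere first, a checksum comparison confirms the copy before the old drive leaves the array. A minimal Python sketch; `/mnt/disk2` follows unRAID's `/mnt/diskN` mount convention, while the destination path in the usage line is only an example and must not sit inside the source. Note that the hash of a freshly written copy may be served from the OS cache, so this catches copy mistakes more than media problems on the destination.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy every regular file under src into dst; return paths whose copies differ."""
    src, dst = Path(src), Path(dst)
    mismatches = []
    for file in sorted(src.rglob("*")):
        if not file.is_file():
            continue
        relative = file.relative_to(src)
        target = dst / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(file, target)  # copies contents plus timestamps/permissions
        if sha256_of(file) != sha256_of(target):
            mismatches.append(str(relative))
    return mismatches
```

Usage: `copy_and_verify("/mnt/disk2", "/mnt/disk3/disk2-backup")` returns an empty list when every file matched.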

thanks

dave

Hopefully Newegg will ship fast

October 9, 2009

Author

OK trouble right here in River City!

I shut down the array, put the new HDD in and restarted.

It came back up and the array went online after running parity.

I then noticed that the new SATA drive was not seen, as I did not have SATA enabled in the BIOS, so I restarted and enabled SATA.

I could then see the drive in the BIOS.

I then restarted the array and did not see the new drive on the devices page.

So I stopped the array, and now it shows drives 3, 4, and 6 as unformatted.

I then looked in the syslog and saw that it did in fact find the new drive:

Oct 8 17:42:39 Tower kernel: scsi 1:0:0:0: Direct-Access ATA SAMSUNG HD103SJ 1AJ1 PQ: 0 ANSI: 5

Oct 8 17:42:39 Tower kernel: sd 1:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)

Oct 8 17:42:39 Tower kernel: sd 1:0:0:0: [sdb] Write Protect is off

Oct 8 17:42:39 Tower kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00

Oct 8 17:42:39 Tower kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Oct 8 17:42:39 Tower kernel: sd 1:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)

Oct 8 17:42:39 Tower kernel: sd 1:0:0:0: [sdb] Write Protect is off

Oct 8 17:42:39 Tower kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00

Oct 8 17:42:39 Tower kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Oct 8 17:42:39 Tower kernel: sdb: unknown partition table

Oct 8 17:42:39 Tower kernel: sd 1:0:0:0: [sdb] Attached SCSI disk

I also see that it found disks 3, 4, and 6 in the syslog:

Oct 8 17:42:39 Tower kernel: hde: max request size: 512KiB

Oct 8 17:42:39 Tower kernel: hde: 625142448 sectors (320072 MB) w/8192KiB Cache, CHS=38913/255/63

Oct 8 17:42:39 Tower kernel: hde: cache flushes supported

Oct 8 17:42:39 Tower kernel: hde: hde1

Oct 8 17:42:39 Tower kernel: hdg: max request size: 512KiB

Oct 8 17:42:39 Tower kernel: hdg: 625142448 sectors (320072 MB) w/8192KiB Cache, CHS=38913/255/63

Oct 8 17:42:39 Tower kernel: hdg: cache flushes supported

Oct 8 17:42:39 Tower kernel: hdg: hdg1

Oct 8 17:42:39 Tower kernel: hda: max request size: 512KiB

Oct 8 17:42:39 Tower kernel: hda: 625142448 sectors (320072 MB) w/8192KiB Cache, CHS=38913/255/63

But they now show in the Tower screen as "unformatted" and I cannot start the array.

I also do not see the new drive sdb on the devices page. Not sure what to do now.

Ideas?

thanks,

dave

October 9, 2009

Can you attach your full syslog? It should provide more information.

October 9, 2009

Author

OK, searching on the forum I found a post talking about unformatted drives.

I had rtorrent still running, so when I killed that I was able to fully stop the array.

After that I was able to see the new drive on the devices page. So I'm bringing the array back online and will run the preclear script on the new drive prior to installing it.

Thanks for the reply, I was kinda in a panic.

dave

October 9, 2009

Good Luck...

ALWAYS BACK UP DATA! The more copies in different locations, the better. unRAID adds redundancy, but if your house/apt burns down or is robbed... then what? If you run through this scenario in your head, it's not hard to justify spending $100 on a 1TB drive to at least back up your pictures/other important documents.

I have a 1:1 copy of everything off-site, and the important drive is backed up nightly and has an off-site copy. Off-site is sync'd each month.

October 9, 2009

Author

All my important data is backed up offsite every 10 days, so I can get it all back; I just don't want the hassle.

dave

October 10, 2009

Author

Ran preclear on the new drive, put it in and pulled out the bad drive.

Upon startup I went to the devices page and moved the parity drive to disk2, and put the new drive in as parity.

I then clicked the 'yes' check box and said go.

It is now copying data to the parity drive. I assume this is the contents of the "old" parity drive, and that it should then rebuild disk2. At least that is what I thought I read, though I can't find that webpage anymore.
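The two phases dave expects can be pictured with a toy Python model in which short byte strings stand in for whole disks: first the old parity is copied onto the new, larger parity drive, then the missing disk is rebuilt onto the old parity drive from the new parity plus the surviving data disks. This is not unRAID's code. Zero-filling the part of the new parity drive beyond the old parity size is an assumption of the model (no data disk reaches that far, so XOR parity there is zero), and the function names are hypothetical.

```python
def xor_padded(blocks, length):
    """XOR disks of different sizes; bytes past a disk's end count as zero."""
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block[:length]):
            out[i] ^= byte
    return bytes(out)


def parity_swap(data_disks, old_parity, failed_index, new_parity_size):
    """Toy model of the swap: returns (new parity drive, rebuilt failed disk)."""
    # Phase 1: copy old parity to the bigger drive and zero-fill the remainder.
    new_parity = old_parity + bytes(new_parity_size - len(old_parity))
    # Phase 2: rebuild the failed disk onto the old parity drive from the
    # new parity plus every surviving data disk (the failed disk is never read).
    survivors = [d for i, d in enumerate(data_disks) if i != failed_index]
    rebuilt = xor_padded(survivors + [new_parity], len(old_parity))
    return new_parity, rebuilt
```

In the model the rebuilt disk is the size of the old parity drive: the failed disk's data followed by zeros.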

Hopefully everything will come out OK. I still have the intact disk2 if needed.

As an aside. With all the read errors, can I return the drive to Seagate for a replacement? It is under warranty until the 18th of November.

Just not sure whether read errors count, or only write errors?

dave

October 10, 2009

Author

OK, just in case someone else is needing it, here is the link to the swap-disable method.

http://lime-technology.com/wiki/index.php?title=UnRAID_Manual#Replace_a_single_disk_with_a_bigger_one

Sounds like I did this correctly, but it sure would be nice to have a little better feedback from unRAID as to what it is doing. Like a message saying it sees the old parity drive in slot x and a new, larger drive as parity, so it will be doing x, y, z.

As is, you don't get any of that; you just get the old parity disk marked as unformatted, and it starts writing to the parity disk while reading all the other disks. Kinda leaves you worried.

dave

October 10, 2009

Author

OK looks like something did not go right.

I put the new drive in and configured it to be parity, moved the original parity drive to be disk2, and removed the original disk2.

I then started the rebuild and it spent 5 hours copying stuff to the parity drive, but upon finishing, disk2 is still being displayed as unformatted.

I thought that the array was supposed to rebuild disk2? Do I need to stop the array, format the drive, and then have it rebuild?

The unRAID Manual makes it sound like this all happens automatically.

dave

October 10, 2009

Post a syslog. You should not have to format anything... do not reboot, do not stop the array (at least not yet)

What version of unRAID are you running...

Joe L.

October 10, 2009

Author

Here is the syslog.

Version 4.4.2

thanks,

dave

October 10, 2009

Hmmm... doesn't look right to me either. When you clicked that "Yes" box, are you positive it was for the Start button and not the Restore button? Because it actually looks like a Restore was done first, which would setup a new array and throw out the old parity, and then the array was started and it built a new parity disk from the other 4 disks. That would leave Disk 2 looking like it was Unformatted. I do not see any messages about copying parity info from Disk 2, and there are no messages about rebuilding Disk 2. It really looks like a Restore was done, followed by a Start of a new array, and then a complete parity build was done, which finished successfully. It seems to think the array is fine then, except of course it thinks that Disk 2 needs formatting! It keeps trying to mount Disk 2 (the old Parity drive) as if it thinks it should have a Reiser file system on it. So far, I don't believe any data has been lost, just some lost time. Be very careful with the next steps ... give us some time to help advise correctly.

Just a tentative plan, but I think it may be best to reinstall the old Disk 2, unassign the new SATA drive, reassign the old parity drive and the old Disk 2, then do the Trust My Array procedure, which should recover the original array. The subsequent parity check will probably find some early parity errors. You may not want to finish that parity check because of the problems on the old Disk 2. Then we can consider trying again with the Swap-disable procedure. I would suggest pausing for a little though, perhaps someone else will see something I missed, or have better advice.

It does scare me a little that you are trying this procedure using BubbaRAID. I mean no disrespect to his work at all, but I suspect no one has *ever* tried this procedure with that unRAID variant.

You have one other problem: the IDE cable to hdd (Seagate 250GB) is faulty and produced errors. The system was able to lower the transfer speed and continue without further issue, but if you have a better IDE cable handy, I would replace it.

October 10, 2009

Author

Thanks for the input.

I do have a different IDE cable, I'll swap that out when I get back in there.

I believe you are correct; I think I hit Restore (that is the one that becomes available after clicking the "I know what I'm doing" button).

So it sounds like I may want to restart the entire process, or how about if I format the drive and just copy the contents of disk2 over after the format? It's about 100GB of data, and that might be faster than starting over?

thanks again

October 10, 2009

Author

OK, I'm back to where I started this AM, except my new drive will not be zeroed out.

I'm running a parity check right now and can see/access all my data on all my drives, including the one I'm trying to replace.

When the parity check completes I will try the swap-disable again, except I do see that it says to "start" the array, not restore it. That was my mistake.

Just a clarifying question.

When it says to remove the bad drive and replace the parity drive, I assume this means change them on the devices page, correct? I don't have to physically remove disk2 (the one I'm replacing) prior to starting the array, as long as it is no longer assigned on the devices page, correct?

Thanks for all the help,

dave

October 10, 2009

When it says to remove the bad drive and replace the parity drive, I assume this means change them on the devices page, correct? I don't have to physically remove disk2 (the one I'm replacing) prior to starting the array, as long as it is no longer assigned on the devices page, correct?

I'm *almost* positive that is correct; it is the assignments that are the most important thing. I believe (and this is before my unRAID time) that originally it was the physical slots that were important, i.e. which physical drives were installed in which slots, but that was prior to the addition of drive assignments, added in v3 I think. Now, I think it is safe to leave it installed, so long as it is not assigned anywhere.

So it sounds like I may want to restart the entire process, or how about if I format the drive and just copy the contents of disk2 over after the format? It's about 100GB of data, and that might be faster than starting over?

If you have another place to move that 100GB of data, then that makes everything safer, and opens up more and safer options. If you no longer had to worry about preserving the contents of Disk 2, then you can assign the parity drive to Disk 2, and the new SATA drive to the parity slot, press the Restore button (this time would be correct), format Disk 2, and build parity. You would not even have to worry about any step of the process failing, because the risk is gone.

October 11, 2009

Author

I've come to the conclusion that the swap-disable just does not work in my case. I wonder if you truly do have to physically remove and replace the drives to get this to work.

In my case I was removing an IDE drive and replacing it with a SATA drive, and no matter what I did, I could never get the "start" button to be enabled. In the end I had to 'restore' the array, format the old parity drive, and create new parity.

I find it frustrating when things don't work as advertised, but besides the loss of about 24 hours mucking with it, I now have my array back online and working.

dave

October 11, 2009

No, I suspect the issue is that you pressed "Restore" which renamed your old super.dat to super.old and forced your server to forget it ever had any other disks but the ones in it and assigned and working when you pressed the button. You asked it to immediately throw away parity and reset the drive configuration. I know that was not what you wanted to do, but it was what you did.

You might have been able to get back to your original super.dat superblock file by stopping the array, renaming config/super.old to config/super.dat, and then rebooting, but it's way too late now for that.
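The recovery described above is a file rename on the flash drive, done with the array stopped and followed by a reboot. A minimal Python sketch, assuming the flash drive's `config` directory is at `/boot/config` (adjust if yours differs). Setting the current `super.dat` aside as `super.dat.bak` is an extra precaution of this sketch, not part of Joe's instructions, and as he notes it was already too late for it to help in this thread.

```python
from pathlib import Path


def restore_superblock(config_dir="/boot/config"):
    """Rename super.old back to super.dat, keeping the current file as super.dat.bak."""
    config = Path(config_dir)
    saved, current = config / "super.old", config / "super.dat"
    if not saved.exists():
        raise FileNotFoundError(f"no {saved} to restore")
    if current.exists():
        current.replace(config / "super.dat.bak")  # hypothetical backup name
    saved.replace(current)
    return current
```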

We've been warning people as best we could to NEVER press the "restore" button unless permanently removing a drive from the array and not replacing it with another. There are a handful of situations where there is an exception to that rule, but they take a special sequence of steps including command lines entered to not immediately invalidate parity. They can only work when all drives are working and have good data, the same data as when parity was last calculated.... never when a disk is being replaced or has failed.

There never was, and never is, a need to physically remove a drive, unless you need to free the physical space for its replacement.

Joe L.
