Multiple drive failures

September 10, 2014

Author

Smart status from drive1 - other than its age, all looks good to me. Drive is good.

September 10, 2014

I would assume as much too.

I would still run smartctl -t long to be sure.

It could have fallen off the bus for other reasons. So if it was being rebuilt sector by sector, the sectors were replaced with ones that were already the same (unless drive 6 had a read corruption).

Chances are good that the data on drive 1 is still intact.

September 10, 2014

Author

long test executed. Where do the results go?

10.1.1.2 login: root

Linux 3.9.11p-unRAID.

root@10:~# smartctl -a /dev/sde | todos > /boot/logs/sde.txt

root@10:~# ls -l /boot/logs/sde.txt

-rwxrwxrwx 1 root root 4451 2014-09-10 10:50 /boot/logs/sde.txt*

root@10:~# smartctl -t long /dev/sde

Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".

Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.

Testing has begun.

Please wait 255 minutes for test to complete.

Test will complete after Wed Sep 10 15:06:18 2014

Use smartctl -X to abort test.

root@10:~#

September 11, 2014

The results are in the report.
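
For anyone finding this later, a minimal sketch of pulling up just the self-test results once the estimated completion time has passed. It assumes smartmontools is installed and the drive is still /dev/sde as in the session above; show_selftest is a hypothetical helper name, not something from unRAID.

```shell
# Hypothetical helper: print the SMART self-test log for one drive.
# Usage (device name taken from the session above): show_selftest /dev/sde
show_selftest() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: show_selftest /dev/sdX" >&2
        return 2
    fi
    # -l selftest prints only the self-test log; the most recent test is entry "# 1".
    # A test that finished cleanly is reported as "Completed without error".
    smartctl -l selftest "$dev"
}
```

The same table also appears near the end of the full smartctl -a report, so re-running the earlier command that wrote /boot/logs/sde.txt captures it as well.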

September 11, 2014

Author

Smart report disk1 after long check

smart_drive1_-_2.txt

September 11, 2014

Smart report disk1 after long check

Looks almost perfect, nothing to worry about.

September 12, 2014

Author

Thanks! Waiting for drives to come in at this point BUT still accepting opinions/suggestions.

September 12, 2014

Author

Replacement drive for drive6 is in. What is the process for ddrescue? Is it done on the main system, or should I move both drives to a separate system?

September 15, 2014

Author

Bump - looking for some direction on where to go next.

September 15, 2014

Replacement drive for drive6 is in. What is the process for ddrescue? Is it done on the main system, or should I move both drives to a separate system?

It's up to you. Boot the server with a live CD or install on a different system. Google ddrescue for more info.

September 17, 2014

Author

OK, ddrescue is installed on unRAID, but after Googling and reading I am still unsure of the next steps. Do I add the blank, unformatted 3TB disk to the array, or do I run ddrescue and copy drive to drive without it in the array? Does the array have to be started (maintenance mode) for all of this, or not so much? Also, can someone suggest the commands?

bad drive is sdl

new drive is sdd

September 17, 2014

Do not add the new drive to the array.

Preclear it to make sure it's in good condition. Yes, it will take a while, but it's one way of ensuring all sectors are good.

You can then ddrescue the old drive to the new drive and try to work on the data on the new drive.

Do not add the new drive to the array. Just copy it.

You'll have to do a lot of reading on ddrescue to find what works for you.

You can start here.

http://lime-technology.com/forum/index.php?topic=16734.msg153098#msg153098
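
As a starting point while you read, here is a rough sketch of a two-pass copy. It assumes GNU ddrescue and the device names given earlier in the thread (bad drive sdl, new drive sdd). rescue_copy and the logfile path are hypothetical names for illustration. Double-check which device is the source and which is the destination before running anything, because -f overwrites the destination.

```shell
# Hypothetical wrapper around GNU ddrescue: source, destination, logfile.
# Usage (names from this thread): rescue_copy /dev/sdl /dev/sdd /boot/sdl_rescue.log
rescue_copy() {
    src="$1"; dst="$2"; log="$3"
    if [ -z "$src" ] || [ -z "$dst" ] || [ -z "$log" ]; then
        echo "usage: rescue_copy SOURCE DEST LOGFILE" >&2
        return 2
    fi
    # Pass 1: grab everything that reads cleanly. -n skips the slowest
    # recovery phase on damaged areas; -f is needed to write to a block device.
    ddrescue -f -n "$src" "$dst" "$log" || return 1
    # Pass 2: same logfile, so only areas still marked bad are revisited,
    # with up to 3 retry passes.
    ddrescue -f -r3 "$src" "$dst" "$log"
}
```

Keeping the logfile on the flash drive (or anywhere other than the two disks involved) lets an interrupted run resume where it stopped.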

September 17, 2014

Author

Thanks!

September 21, 2014

Author

root@10:/boot# ddrescue -f -n /dev/sdl /dev/sdd logfile

Press Ctrl-C to interrupt

Initial status (read from logfile)

rescued: 0 B, errsize: 0 B, errors: 0

Current status

rescued: 3 TB, errsize: 0 B, current rate: 76414 kB/s

ipos: 3 TB, errors: 0, average rate: 112 MB/s

opos: 3 TB, time from last successful read: 0 s

Finished

September 25, 2014

Author

So I ran ddrescue on both of the suspect drives, both without any errors. Do I just tell it to forget parity, add these drives to the array, and then reapply parity?

root@10:/boot# ddrescue -f /dev/sde /dev/sdh

Press Ctrl-C to interrupt

rescued: 2 TB, errsize: 0 B, current rate: 53870 kB/s

ipos: 2 TB, errors: 0, average rate: 85150 kB/s

opos: 2 TB, time from last successful read: 0 s

Finished

September 25, 2014

That's beyond me. I've never had luck with forgetting parity and/or 'trusting my parity' procedures.

Perhaps someone else can recommend the course of action. I think Brian is good at this one.

At the very least you know these drives themselves are good.

It also strengthens my resolve to help create md5sums of whole drives for validation in these cases.
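
Until something like that exists, one way to sanity-check a clone is a byte-for-byte compare of the source against the copy, which is a different technique from storing md5sums but answers the same question right after a copy. A sketch, assuming GNU cmp and blockdev are available; verify_clone is a hypothetical name, and a failing source drive may throw read errors during the compare.

```shell
# Hypothetical helper: compare the first BYTES bytes of SOURCE and COPY.
# Limiting the length matters when the new drive is larger than the old one.
# Usage: verify_clone /dev/sde /dev/sdh "$(blockdev --getsize64 /dev/sde)"
verify_clone() {
    src="$1"; copy="$2"; bytes="$3"
    if [ -z "$src" ] || [ -z "$copy" ] || [ -z "$bytes" ]; then
        echo "usage: verify_clone SOURCE COPY BYTES" >&2
        return 2
    fi
    # cmp exits 0 when the compared ranges are identical and non-zero
    # at the first difference or on a read problem.
    if cmp -n "$bytes" "$src" "$copy"; then
        echo "identical over $bytes bytes"
    else
        echo "copy differs from source or could not be read" >&2
        return 1
    fi
}
```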

September 26, 2014

Author

Did a config reset, did not assign parity, reassigned the data disks to their appropriate slots, included the new disks that had the rescued data on them, and started the array. Did a few quick spot checks and it looks like all of the data is in place. Stopped the array, cleared the config again, and am now doing a preclear/data wipe on the 2 drives that had issues. 1 had to go back to WD and the other one will have duplicate data on it. Looks like I will be back up and running this weekend and all should be well.

September 26, 2014

ddrescue must have retried a lot, given all those pending sectors, for it to show that there were no errors.

Yet the speed is really low.

If you have hashes of your data, I would suggest you validate against them.

If not, now is a good time to build them.

rescued: 2 TB, errsize: 0 B, current rate: 53870 kB/s

ipos: 2 TB, errors: 0, average rate: 85150 kB/s

September 26, 2014

Author

No data hash. It's a media server, so most of the data is replaceable, but it is well worth my time to save it. Would you hash the drive or the individual files? And how would you go about that?

September 26, 2014

No data hash. It's a media server, so most of the data is replaceable, but it is well worth my time to save it. Would you hash the drive or the individual files? And how would you go about that?

I would hash the individual files with md5deep -r or md5sum and a find script.

Other people like blake and/or sha256.

The issue here is,

"am I confident there is no corruption from so many pending sectors?"

There are other tools to verify MP3s, FLACs and video. Look around the board in the DATA CORRUPTION thread.

In the meantime, run md5deep on each disk. It will take a long time. Once a run completes, copy the resulting hash file somewhere off that disk.

That's the quickest way to do it now.
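
If md5deep isn't installed, the md5sum and find route looks roughly like this. A sketch only: hash_tree and verify_tree are hypothetical helper names, and the /mnt/disk1 and /boot/hashes paths in the usage lines are just examples.

```shell
# Hypothetical helpers: hash every file under a directory, then re-check later.
# Usage: hash_tree /mnt/disk1 /boot/hashes/disk1.md5
#        verify_tree /boot/hashes/disk1.md5
hash_tree() {
    root="$1"; list="$2"
    if [ -z "$root" ] || [ -z "$list" ]; then
        echo "usage: hash_tree DIRECTORY HASHFILE" >&2
        return 2
    fi
    # One md5sum line per regular file. Write the list outside DIRECTORY
    # so the list itself is not picked up while it is still being written.
    find "$root" -type f -exec md5sum {} + > "$list"
}

verify_tree() {
    list="$1"
    if [ -z "$list" ]; then
        echo "usage: verify_tree HASHFILE" >&2
        return 2
    fi
    # Re-reads every listed file; exits non-zero if any file is missing or changed.
    md5sum -c "$list"
}
```

Paths are recorded exactly as find prints them, so pass an absolute directory to hash_tree if you want to run verify_tree from anywhere.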

I am working on a tool to catalog the array and store checksums, but I'm still going through the development and design process.

Here's a good tool also.

bitrot - a utility for generating sha256 keys for integrity checks

http://lime-technology.com/forum/index.php?topic=35226.msg327803#msg327803

This is my thread related to the subject matter

RFC: MD5 checksum/Hash Software

http://lime-technology.com/forum/index.php?topic=34988.msg325400#msg325400

However a simple md5deep -r for now will get you going.

I plan to make some tools to take the md5sum file and import it into the database or import it into the extended attributes.

So it starts with a seed file.
