Second disk fails during rebuild of first disk

orons · March 21, 2016

Hi,

I upgraded to V6 on Friday and ran a parity check after the upgrade. During the check one disk failed.

I bought a new disk to replace it, ran a preclear and started the array. All was going fine, but an hour ago I started getting non-stop read errors from a different disk. The rebuild is still running.

What should I do? Stop the rebuild and power cycle the second failed disk (maybe a cable issue...)? Or wait until the rebuild completes and then do the power cycle?

Attached are the syslog (only the end of it, as it is big) and the SMART log for the second failed disk.

Thanks,

Oron

syslog_-_Copy.txt

ST31500341AS_9VS0LHCC-20160321-2339.txt

JorgeB · March 21, 2016

The disk dropped offline. Better to abort the rebuild, power down, check cables, power up and post a new SMART report.

orons · March 21, 2016

Done that, SMART report attached.

I tried to run a short SMART test, but it failed.

Should I try to rebuild again?

ST31500341AS_9VS0LHCC-20160322-0033.txt

JorgeB · March 21, 2016

Not good, the disk has pending sectors. If you have the space in the array or on another PC, you can unassign the disk that was rebuilding and try to copy everything you can from both the emulated disk and the 2nd failing disk. With some luck you can recover most of the data.

orons · March 21, 2016

So I should start the array and copy what I can from disk2 and disk3 (the first and second failed disks)?

After I finish copying, what's next?

JorgeB · March 21, 2016

That's what I'd do: unassign the disk that was rebuilding, select "no device" and start the array. Copy everything you can from both disks, starting with the most important stuff. Success will depend on how bad the 2nd failed disk is. After that, if necessary, you can also try to recover some data from the 1st failed disk.
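For anyone scripting the copy step, here is a minimal Python sketch of a copy that keeps going when individual files hit read errors and reports them at the end. It is not an Unraid tool; the function name `salvage_copy` and the paths in the usage note are hypothetical, and a dedicated utility may handle a failing disk more gracefully.

```python
import os
import shutil


def salvage_copy(src_root, dst_root):
    """Copy every readable file under src_root into dst_root.

    Returns a list of (path, error message) for anything that could not be
    read, so one bad file or directory does not abort the whole run.
    A file that fails part-way may leave a truncated copy at the destination.
    """
    failed = []

    def on_walk_error(exc):
        # Called by os.walk when a directory itself cannot be listed.
        failed.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=on_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)  # copies data plus timestamps
            except OSError as exc:
                failed.append((src, str(exc)))
    return failed
```

Usage would look like `failed = salvage_copy("/mnt/disk3", "/mnt/disk5/rescue/disk3")`, where the destination is an assumed path on a healthy disk. Calling it once per folder, most important folders first, matches the advice above.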

orons · March 22, 2016

OK, finished backing up most of the data. So far I have lost just one movie, so I dodged a bullet there.

What's the best way forward?

Replace the failed disk -> rebuild -> replace the failing second disk -> rebuild

Replace the second failing disk (is that even possible in my state?) -> rebuild -> replace the failed disk -> rebuild

Thanks

JorgeB · March 22, 2016

If the 2nd failed disk drops offline again during the rebuild it will just rebuild garbage. If you copied everything you wanted from both disks, I would do a new config:

-take a screenshot of current disk assignments

-replace the 2nd failed disk with a new one (if you have a replacement, or remove it from the array if not)

-go to tools and click new config

-reassign all disks, double check that parity disk is in the parity slot

-start array to begin parity sync

-new disk(s) will have to be formatted

Consider upgrading to v6.2-beta, which adds dual parity support.
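To illustrate why a rebuild produces garbage when a second disk misbehaves, here is a toy Python sketch. It assumes single parity is a bytewise XOR across the data disks; the two-byte "disks" are made-up values for illustration only.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks" and the parity computed from them.
disk1, disk2, disk3 = b"\x0f\xa0", b"\x33\x55", b"\xc1\x07"
parity = xor_blocks([disk1, disk2, disk3])

# disk2 is missing: rebuild it from parity plus the remaining healthy disks.
rebuilt_ok = xor_blocks([parity, disk1, disk3])

# disk3 now also returns wrong data (zeros here) for the same stripe,
# so the "rebuilt" disk2 no longer matches the original.
rebuilt_bad = xor_blocks([parity, disk1, b"\x00\x00"])

print(rebuilt_ok == disk2)   # the healthy rebuild matches the original
print(rebuilt_bad == disk2)  # the rebuild with a bad second disk does not
```

With one parity disk there is only enough redundancy to solve for one unknown per stripe, which is why a second unreliable disk during a rebuild cannot be corrected.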

orons · March 22, 2016

Hi,

Just want to clarify the steps.

1. Go to new config

2. Remove both failed disks

3. Add the new disk (I only have one precleared; the second one will be the current parity disk, once I replace it with a new one I already bought)

4. Make sure the parity slot holds the same disk as before

5. Start the array and build new parity

6. Copy the data back to the array

Is that correct ?

JorgeB · March 22, 2016

Basically yes, but clicking new config will unassign all disks. You can physically remove the failed disks from the server before or after doing it.

Then you have to reassign all disks on the main page, taking special care with the parity slot because if you assign a data disk to that slot and start the array you'll lose data.

When all disks are assigned click start array and a parity sync will begin.

If there's still any doubt, ask before doing something you're not sure about.

orons · March 23, 2016

Hi,

The parity is building, but I noticed a weird issue.

Some of the folders in the share look empty, but if I look at the disk directly I can see the content of the folder.

When doing ls I'm getting permission denied for the /mnt/user/ path, but under /mnt/disk2/ I can see the files.

Any ideas? Should I panic again?

Thanks,

Oron

JorgeB · March 23, 2016

If you can browse by disk then the data is there.

What is the new disk you tried to rebuild showing, data or unmountable?

orons · March 24, 2016

The parity build finished, but some files are still missing from the shares' perspective.

The new disk is showing files (I did not preclear it again after the first time, maybe that caused the problem?)

Here is an example of what I'm seeing:

root@Tower_Oron:~# ls /mnt/user/Bluray/

/bin/ls: reading directory /mnt/user/Bluray/: Permission denied

root@Tower_Oron:~# ls /mnt/disk1/Bluray/

Gravity/ SkyFall/

Inside\ Out/ The\ Dark\ Knight\ 2008/

Iron\ Man\ 2008/ The\ Hunger\ Games\ -\ Mockingjay\ -\ Part\ 2/

Kingsman\ The\ Secret\ Service/ The\ Lord\ of\ the\ Rings\ The\ Fellowship\ of\ the\ Ring\ 2001/

Live\ Free\ Or\ Die\ Hard\ 2007/ The\ Lord\ of\ the\ Rings\ The\ Two\ Towers\ 2002/

Sicario/ The\ Man\ from\ U.N.C.L.E/

***Edit***

So I found that the problem comes from the new disk:

Mar 24 08:33:10 Tower_Oron kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 459024602. Fsck?

Mar 24 08:33:10 Tower_Oron kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [5 8022 0x0 SD]

Mar 24 08:33:10 Tower_Oron shfs/user: shfs_readdir: fstatat: Submarine 2011 (13) Permission denied

Mar 24 08:33:10 Tower_Oron shfs/user: shfs_readdir: readdir_r: /mnt/disk6/Nas/Movies (13) Permission denied

What's the best way forward? Try the New Permissions tool? Or do a new config, remove the disk and then re-add it (I have all of its data backed up)?
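When a syslog is too long to read by eye, a small script can pull out which device is reporting filesystem errors and which share paths are being denied. The Python sketch below is only an illustration: the regular expressions are written against the four log lines quoted above, and the name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Patterns written against the syslog lines quoted in this post.
FS_ERROR = re.compile(r"REISERFS error \(device (md\d+)\)")
SHFS_DENIED = re.compile(r"shfs_readdir: \w+: (.+) \(13\) Permission denied")


def summarize(lines):
    """Count filesystem errors per md device and collect denied paths."""
    devices = Counter()
    denied = []
    for line in lines:
        match = FS_ERROR.search(line)
        if match:
            devices[match.group(1)] += 1
            continue
        match = SHFS_DENIED.search(line)
        if match:
            denied.append(match.group(1))
    return devices, denied


sample = [
    "Mar 24 08:33:10 Tower_Oron kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 459024602. Fsck?",
    "Mar 24 08:33:10 Tower_Oron kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [5 8022 0x0 SD]",
    "Mar 24 08:33:10 Tower_Oron shfs/user: shfs_readdir: fstatat: Submarine 2011 (13) Permission denied",
    "Mar 24 08:33:10 Tower_Oron shfs/user: shfs_readdir: readdir_r: /mnt/disk6/Nas/Movies (13) Permission denied",
]
devices, denied = summarize(sample)
print(devices)
print(denied)
```

In these lines the erroring device is md6 and the denied path is under /mnt/disk6, which is consistent with the "Permission denied" on the user share being a symptom of a damaged filesystem on that one disk rather than an actual permissions problem.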

itimpi · March 24, 2016

I am not sure. Such issues normally get resolved without problems if you run reiserfsck against the drive.

If you stop the array and then restart it in maintenance mode you can click on the disk to get the option to run file system checks/repairs.

JorgeB · March 24, 2016

This is what I suspected: the half-rebuilt disk understandably has filesystem issues.

Since all data on that disk is backed up, there's no point in trying to fix it; you can just format it.

-stop array, click that disk, change filesystem to XFS

-start array, disk will appear as unmountable, double check it's the disk you want to format

-click format

XFS is the preferred filesystem for v6, so you can leave it like that. There's no problem having disks with different filesystems, but if you prefer ReiserFS, do the same thing again, this time changing the filesystem to ReiserFS, and format the disk again.

orons · March 24, 2016

Thanks!

I'm preclearing a new parity disk (8TB, it takes forever :-[ ). I will do that after the preclear finishes (I can't stop the array with the preclear running...) and update.

Oron

orons · March 25, 2016

It worked, thanks!

Now I'm replacing the parity drive with the bigger one. Is it safe to preclear the old one, or shall I wait for the parity sync to complete?

JorgeB · March 25, 2016

Should be ok.

With the new preclear beta plugin it's possible to start/stop array during a preclear.

JorgeB · March 25, 2016

Though you may want to wait, so that you can still use the old one in case the new parity sync fails.

orons · March 25, 2016

Too late, missed your second message :-[

F*** the parity build with the new 8TB will take forever...

orons · March 26, 2016

I cannot catch a break here.

Woke up this morning and a third drive is failing. The parity sync is at 12.5% and is running very slowly.

Should I stop the parity sync and try to move data from the failing disk to the new disk?

Attached are a screenshot, syslog and SMART report.

syslog (the file is too big to attach in full):

Mar 26 10:19:19 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --git-dir=/mnt/cache/Sickbeard/.git rev-parse HEAD

Mar 26 10:19:20 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; python --version 2>&1

Mar 26 10:19:20 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; sqlite3 --version 2>&1

Mar 26 10:19:20 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --version 2>&1

Mar 26 10:19:20 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; curl --version 2>&1

Mar 26 10:19:22 Tower_Oron emhttp: cmd: /usr/local/emhttp/plugins/Sickbeard/scripts/rc.Sickbeard stop

Mar 26 10:19:32 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --git-dir=/mnt/cache/Sickbeard/.git rev-parse HEAD

Mar 26 10:19:34 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; python --version 2>&1

Mar 26 10:19:34 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; sqlite3 --version 2>&1

Mar 26 10:19:34 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --version 2>&1

Mar 26 10:19:34 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; curl --version 2>&1

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: irq_stat 0x40000001

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: failed command: READ DMA EXT

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: cmd 25/00:40:ef:72:1a/00:05:76:00:00/e0 tag 18 dma 688128 in

Mar 26 10:23:44 Tower_Oron kernel: res 51/40:2f:f7:76:1a/00:01:76:00:00/e0 Emask 0x9 (media error)

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: status: { DRDY ERR }

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: error: { UNC }

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: configured for UDMA/133

Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 Sense Key : 0x3 [current] [descriptor]

Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 ASC=0x11 ASCQ=0x4

Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 CDB: opcode=0x28 28 00 76 1a 72 ef 00 05 40 00

Mar 26 10:23:44 Tower_Oron kernel: blk_update_request: I/O error, dev sdf, sector 1981445879

Mar 26 10:23:44 Tower_Oron kernel: ata5: EH complete

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445816

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445824

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445832

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445840

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445848

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445856

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445864

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445872

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445880

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445888

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445896

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445904

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445912

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445920

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445928

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445936

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445944

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445952

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445960

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445968

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445976

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445984

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445992

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446000

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446008

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446016

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446024

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446032

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446040

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446048

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446056

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446064

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446072

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446080

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446088

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446096

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446104

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446112

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446120

WDC_WD20EARS-00J2GB0_WD-WCAYY0154839-20160326-1027.txt
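A long run of "md: diskN read error" lines like the one above is easier to judge when condensed to a count and a sector range per disk. The Python sketch below does that; the pattern is written against the lines in this excerpt and the name `read_error_ranges` is hypothetical.

```python
import re

# Pattern written against the "md: disk1 read error, sector=..." lines above.
READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def read_error_ranges(lines):
    """Return {disk: (error count, lowest sector, highest sector)}."""
    sectors = {}
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            sectors.setdefault(match.group(1), []).append(int(match.group(2)))
    return {disk: (len(s), min(s), max(s)) for disk, s in sectors.items()}


sample = [
    "Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445816",
    "Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445824",
    "Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446120",
]
print(read_error_ranges(sample))
```

Fed a whole syslog file (for example `read_error_ranges(open(path))`, with `path` being wherever the syslog was saved), it shows at a glance whether the errors cluster in one region of one disk, as they do here for disk1 between sectors 1981445816 and 1981446120.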

JorgeB · March 26, 2016

That's some bad luck :'(

Disk1 is bad, lots of pending sectors.

-since you have enough space, I would cancel the parity sync, stop the array, unassign parity for now and start the array.

-copy everything you can from disk1 to another disk.

-when done do a new config without disk1 (or use a spare in its place if you have one) and start new parity sync.

orons · March 27, 2016

Finished salvaging what I can from the third HDD.

Now I need to do a new config again, with the replacement HDD instead of the third HDD, and with the new parity?

JorgeB · March 27, 2016

Use all disks except old disk1, after doing the new config and starting the array a new parity sync will begin.

orons · March 27, 2016

Will it build parity from scratch? I started a bad parity build with the new parity drive, and I'm afraid it will do some damage...
