Second disk fails during rebuild of first disk


orons

Hi,

 

I upgraded to v6 on Friday and ran a parity check after the upgrade; during the check one disk failed.

I bought a new disk to replace it, ran preclear on it, and started the array. All was going fine, but an hour ago I started getting non-stop read errors from a different disk. The rebuild is still running.

 

What should I do? Stop the rebuild and power cycle the second failed disk (maybe it's a cable issue)? Or wait until the rebuild completes and then do the power cycle?

 

Attached are the syslog (only the end of it, since it's big) and the SMART log for the second failed disk.

 

Thanks,

Oron

syslog_-_Copy.txt

ST31500341AS_9VS0LHCC-20160321-2339.txt

Link to comment

This is what I'd do: unassign the disk that was rebuilding, select "no device", and start the array. Then copy everything you can from both disks, starting with the most important stuff. Success will depend on how bad the 2nd failed disk is. After that, if necessary, you can also try to recover some data from the 1st failed disk.

 

Link to comment

OK, I finished backing up most of the data. So far I've lost just one movie, so I dodged a bullet there.

 

What's the best way forward?

 

Replace the failed disk -> rebuild -> replace the failing second disk -> rebuild

Replace the failing second disk (is that even a possible option in my state?) -> rebuild -> replace the failed disk -> rebuild

 

TNX

 

 

Link to comment

If the 2nd failed disk drops offline again during the rebuild, it will just rebuild garbage. If you copied everything you wanted from both disks, I would do a new config:

 

-take a screenshot of current disk assignments

-replace the 2nd failed disk with a new one (if you have a replacement, or remove it from the array if not)

-go to tools and click new config

-reassign all disks, double check that parity disk is in the parity slot

-start array to begin parity sync

-new disk(s) will have to be formatted

 

Consider upgrading to v6.2-beta, which adds dual parity support.
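To illustrate why a second disk misbehaving during a rebuild produces garbage, here is a minimal sketch of single XOR parity on toy data. The disk contents are made up, and a failing disk is simulated as returning zeros, which is a simplification and not a model of how unRAID actually handles unreadable sectors.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three toy "data disks" of four bytes each (made-up contents).
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0a\x0b\x0c\x0d"
disk3 = b"\xff\x00\xff\x00"

# Single parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# One disk missing: parity plus the surviving disks recovers it exactly.
rebuilt_disk1 = xor_blocks([parity, disk2, disk3])

# A second disk returns wrong data (simulated here as zeros): the rebuild
# still produces bytes, but they no longer match the original disk1.
bad_disk2 = bytes(len(disk2))
garbage_disk1 = xor_blocks([parity, bad_disk2, disk3])

print(rebuilt_disk1 == disk1)   # healthy rebuild
print(garbage_disk1 == disk1)   # rebuild while a second disk is failing
```

With one parity disk there is only one equation per byte position, so only one unknown disk can be solved for. That is the motivation for the dual parity suggestion.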

 

 

Link to comment

Hi,

 

I just want to clarify the steps.

1. Go to new config

2. Remove both failed disks

3. Add the new disk (I only have one precleared; the second one will be the current parity disk, after I replace it with a new one I already bought)

4. Make sure the parity slot still holds the same disk

5. Start the array and run a new parity sync

6. Copy the data back to the array

 

Is that correct ?

 

Link to comment

Basically yes, but clicking new config will unassign all disks. You can physically remove the failed disks from the server before or after doing it.

 

Then you have to reassign all disks on the main page, taking special care with the parity slot: if you assign a data disk to that slot and start the array, the parity sync will overwrite it and you'll lose its data.

 

When all disks are assigned click start array and a parity sync will begin.
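As a complement to the screenshot, the assignments could be written down as a slot-to-serial map and compared before starting the array. This is only a sketch: the function name, the slot names, and the serial strings below are hypothetical placeholders, and nothing here reads the real unRAID configuration.

```python
def check_assignments(before, after):
    """Compare slot -> serial maps and return a list of problems found."""
    problems = []
    expected = before.get("parity")
    actual = after.get("parity")
    if actual != expected:
        problems.append(f"parity slot holds {actual!r}, expected {expected!r}")
    # Serials that were data disks before the new config.
    old_data = {serial for slot, serial in before.items() if slot != "parity"}
    if actual in old_data:
        problems.append("a former data disk is assigned to the parity slot")
    return problems


# Hypothetical serials, written down before clicking new config.
before = {"parity": "SERIAL-P", "disk1": "SERIAL-A", "disk2": "SERIAL-B"}

# A mistaken reassignment: parity and disk1 swapped.
swapped = {"parity": "SERIAL-A", "disk1": "SERIAL-P", "disk2": "SERIAL-B"}

# A correct reassignment with disk2 replaced by a new drive.
correct = {"parity": "SERIAL-P", "disk1": "SERIAL-A", "disk2": "SERIAL-NEW"}

for problem in check_assignments(before, swapped):
    print("STOP:", problem)
print(check_assignments(before, correct))
```

Data slots are allowed to change here, since the point of the new config is to drop or replace failed disks; only the parity slot is treated as critical.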

 

If there's still any doubt, ask before doing something you're not sure about.

 

 

 

Link to comment

Hi,

 

The parity is building, but I noticed a weird issue.

Some of the folders in the share appear empty, but if I look at the disk directly I can see the content of the folder.

When I run ls I get "Permission denied" for the /mnt/user/ path, but under /mnt/disk2/ I can see the files.

 

Any ideas? Should I panic again?

 

Thanks,

Oron

Link to comment

The parity build finished, but some of the files are still missing when viewed through the shares.

The new disk is showing files (I did not preclear it again after the first time; maybe that caused the problem?)

 

Here is an example of what I'm seeing:

 

root@Tower_Oron:~# ls /mnt/user/Bluray/

/bin/ls: reading directory /mnt/user/Bluray/: Permission denied

 

root@Tower_Oron:~# ls /mnt/disk1/Bluray/

Gravity/                          SkyFall/

Inside\ Out/                      The\ Dark\ Knight\ 2008/

Iron\ Man\ 2008/                  The\ Hunger\ Games\ -\ Mockingjay\ -\ Part\ 2/

Kingsman\ The\ Secret\ Service/  The\ Lord\ of\ the\ Rings\ The\ Fellowship\ of\ the\ Ring\ 2001/

Live\ Free\ Or\ Die\ Hard\ 2007/  The\ Lord\ of\ the\ Rings\ The\ Two\ Towers\ 2002/

Sicario/                          The\ Man\ from\ U.N.C.L.E/

 

***Edit***

 

So I found that the problem comes from the new disk:

 

Mar 24 08:33:10 Tower_Oron kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 459024602. Fsck?

Mar 24 08:33:10 Tower_Oron kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [5 8022 0x0 SD]

Mar 24 08:33:10 Tower_Oron shfs/user: shfs_readdir: fstatat: Submarine 2011 (13) Permission denied

Mar 24 08:33:10 Tower_Oron shfs/user: shfs_readdir: readdir_r: /mnt/disk6/Nas/Movies (13) Permission denied

 

What's the best way forward? Try the New Permissions tool? Or do a new config, remove the disk, and then re-add it (I have all of its data backed up)?
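For anyone sifting through a large syslog for the same symptom, here is a small sketch that pulls the ReiserFS error lines out and groups them by md device. The regular expression is written against the lines quoted above and is an assumption about their general shape, not a full parser for kernel messages.

```python
import re

# Matches lines like: "REISERFS error (device md6): vs-5150 search_by_key: ..."
REISERFS_ERROR = re.compile(
    r"REISERFS error \(device (?P<dev>\w+)\): (?P<code>vs-\d+)"
)


def reiserfs_errors(lines):
    """Return {md device: sorted ReiserFS error codes} found in syslog lines."""
    found = {}
    for line in lines:
        match = REISERFS_ERROR.search(line)
        if match:
            found.setdefault(match["dev"], set()).add(match["code"])
    return {dev: sorted(codes) for dev, codes in found.items()}


# Lines copied from the syslog excerpt in this post.
sample = [
    "Mar 24 08:33:10 Tower_Oron kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 459024602. Fsck?",
    "Mar 24 08:33:10 Tower_Oron kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [5 8022 0x0 SD]",
    "Mar 24 08:33:10 Tower_Oron shfs/user: shfs_readdir: fstatat: Submarine 2011 (13) Permission denied",
]

print(reiserfs_errors(sample))
```

In this excerpt the errors are on md6 and the denied path is under /mnt/disk6, so the "Permission denied" from the user share looks like a filesystem problem on that one disk and not a permissions problem.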

Link to comment

I am not sure. Such issues normally get resolved without problems if you run reiserfsck against the drive.

 

If you stop the array and then restart it in maintenance mode you can click on the disk to get the option to run file system checks/repairs.

Link to comment

This is what I suspected: the half-rebuilt disk understandably has filesystem issues.

 

Since all data on that disk is backed up, there's no point in trying to fix it; you can just format it.

 

-stop array, click that disk, change filesystem to XFS

-start array, disk will appear as unmountable, double check it's the disk you want to format

-click format

 

XFS is the preferred filesystem for v6, so you can leave it like that; there's no problem having disks with different filesystems. But if you prefer ReiserFS, do the same thing again, this time changing the filesystem to ReiserFS, and format the disk again.

Link to comment

I cannot catch a break here.

 

I woke up this morning and a third drive is failing. The parity sync is at 12.5% and is running very slowly.

Should I stop the parity sync and try to move data from the failing disk to the new disk?

 

Attached are a screenshot, the syslog and the SMART report.

 

syslog (too big a file to attach whole)

 

Mar 26 10:19:19 Tower_Oron sudo:    root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --git-dir=/mnt/cache/Sickbeard/.git rev-parse HEAD

Mar 26 10:19:20 Tower_Oron sudo:    root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; python --version 2>&1

Mar 26 10:19:20 Tower_Oron sudo:    root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; sqlite3 --version 2>&1

Mar 26 10:19:20 Tower_Oron sudo:    root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --version 2>&1

Mar 26 10:19:20 Tower_Oron sudo:    root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; curl --version 2>&1

Mar 26 10:19:22 Tower_Oron emhttp: cmd: /usr/local/emhttp/plugins/Sickbeard/scripts/rc.Sickbeard stop

Mar 26 10:19:32 Tower_Oron sudo:    root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --git-dir=/mnt/cache/Sickbeard/.git rev-parse HEAD

Mar 26 10:19:34 Tower_Oron sudo:    root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; python --version 2>&1

Mar 26 10:19:34 Tower_Oron sudo:    root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; sqlite3 --version 2>&1

Mar 26 10:19:34 Tower_Oron sudo:    root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --version 2>&1

Mar 26 10:19:34 Tower_Oron sudo:    root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; curl --version 2>&1

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: irq_stat 0x40000001

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: failed command: READ DMA EXT

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: cmd 25/00:40:ef:72:1a/00:05:76:00:00/e0 tag 18 dma 688128 in

Mar 26 10:23:44 Tower_Oron kernel:        res 51/40:2f:f7:76:1a/00:01:76:00:00/e0 Emask 0x9 (media error)

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: status: { DRDY ERR }

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: error: { UNC }

Mar 26 10:23:44 Tower_Oron kernel: ata5.00: configured for UDMA/133

Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 Sense Key : 0x3 [current] [descriptor]

Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 ASC=0x11 ASCQ=0x4

Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 CDB: opcode=0x28 28 00 76 1a 72 ef 00 05 40 00

Mar 26 10:23:44 Tower_Oron kernel: blk_update_request: I/O error, dev sdf, sector 1981445879

Mar 26 10:23:44 Tower_Oron kernel: ata5: EH complete

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445816

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445824

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445832

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445840

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445848

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445856

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445864

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445872

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445880

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445888

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445896

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445904

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445912

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445920

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445928

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445936

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445944

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445952

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445960

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445968

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445976

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445984

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445992

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446000

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446008

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446016

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446024

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446032

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446040

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446048

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446056

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446064

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446072

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446080

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446088

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446096

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446104

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446112

Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446120
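The run of md read errors above is easier to judge when condensed. Here is a minimal sketch that counts the errors per disk and reports the first and last sector; the regular expression is an assumption based on the lines shown, and the sample holds only three of them.

```python
import re

# Matches lines like: "md: disk1 read error, sector=1981445816"
MD_READ_ERROR = re.compile(
    r"md: (?P<disk>disk\d+) read error, sector=(?P<sector>\d+)"
)


def summarize_read_errors(lines):
    """Return {disk: {"count", "first", "last"}} for md read errors in the lines."""
    summary = {}
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if not match:
            continue
        sector = int(match["sector"])
        entry = summary.setdefault(
            match["disk"], {"count": 0, "first": sector, "last": sector}
        )
        entry["count"] += 1
        entry["first"] = min(entry["first"], sector)
        entry["last"] = max(entry["last"], sector)
    return summary


# Three lines copied from the excerpt above (the first, second and last).
sample = [
    "Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445816",
    "Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445824",
    "Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446120",
]

print(summarize_read_errors(sample))
```

Fed the whole syslog instead of the sample, this would show whether the errors stay in one small region of disk1 or are spread across the drive.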

WDC_WD20EARS-00J2GB0_WD-WCAYY0154839-20160326-1027.txt

unraid3.png.4745095fc08bdbefb13cd1f33c2673aa.png

Link to comment

That's some bad luck  :'(

 

Disk1 is bad, lots of pending sectors.

 

-since you have enough space I would cancel the parity sync, stop array, unassign parity for now and start array.

-copy everything you can from disk1 to another disk.

-when done do a new config without disk1 (or use a spare in its place if you have one) and start new parity sync.

Link to comment

Archived

This topic is now archived and is closed to further replies.
