orons Posted March 21, 2016
Hi,
I upgraded to V6 on Friday and ran a parity check after the upgrade; during the check one disk failed. I bought a new disk to replace it, ran a pre-clear and started the array. All was going fine, but an hour ago I started getting non-stop read errors from a different disk. The rebuild is still running.
What should I do?
-stop the rebuild and power cycle the second failing disk (maybe a cable issue...)?
-wait until the rebuild completes and then do the power cycle?
Attached are the syslog (only the end of it, it's big) and the SMART log for the second failing disk.
Thanks, Oron
syslog_-_Copy.txt ST31500341AS_9VS0LHCC-20160321-2339.txt
JorgeB Posted March 21, 2016
The disk dropped offline. It's better to abort the rebuild, power down, check the cables, power up and post a new SMART report.
orons Posted March 21, 2016
Done that, SMART report attached. I tried to run a short SMART test, but it failed. Should I try to rebuild again?
ST31500341AS_9VS0LHCC-20160322-0033.txt
JorgeB Posted March 21, 2016
Not good, the disk has pending sectors. If you have the space in the array or on another PC, you can unassign the disk that was rebuilding and try to copy everything you can from both the emulated disk and the 2nd failing disk. With some luck you can recover most of the data.
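For readers who want to check for pending sectors themselves, here is a minimal Python sketch that pulls the usual warning attributes out of a saved SMART attribute table. It assumes the report uses the ten-column layout printed by `smartctl -A`; the function names and the sample values are illustrative only and are not taken from the report attached to this thread.

```python
# Attributes that usually signal a disk with unreadable or remapped sectors.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(report):
    """Return {attribute_name: raw_value} from `smartctl -A` style text."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() test.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def worrying_attributes(report):
    """Return only the watched attributes whose raw value is above zero."""
    attrs = parse_smart_attributes(report)
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


# Hypothetical sample, for illustration only (not the poster's disk).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       35000
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       24
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       24
"""

if __name__ == "__main__":
    print(worrying_attributes(SAMPLE))
```

A non-empty result is a hint to copy data off the disk first, as suggested above, rather than to stress it with another rebuild.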
orons Posted March 21, 2016
So I should start the array and copy what I can from disk2 and disk3 (the first and second failed disks). After I've finished copying, what's next?
JorgeB Posted March 21, 2016
That's what I'd do: unassign the disk that was rebuilding, select "no device" and start the array. Copy everything you can from both disks, starting with the most important stuff; success will depend on how bad the 2nd failed disk is. After that, and if necessary, you can also try to recover some data from the 1st failed disk.
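The "copy everything you can" step could be scripted roughly as follows. This is only a sketch in Python using the standard library: `salvage_tree` is a hypothetical helper, not an unRAID tool, and it simply skips any file or folder that raises an I/O error instead of aborting the whole copy.

```python
import os
import shutil


def salvage_tree(src_root, dst_root):
    """Copy every readable file from src_root to dst_root.

    Unreadable files and folders are skipped; their paths are returned
    so they can be retried or written off later.
    """
    failed = []

    def note_walk_error(err):
        # os.walk calls this when a directory itself cannot be listed.
        failed.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError:
                # Read error, permission problem, etc.: record it and move on.
                failed.append(src)
    return failed
```

To follow the advice of starting with the most important data, call it once per folder in priority order (the paths here are placeholders), for example `salvage_tree("/mnt/disk3/Photos", "/mnt/disk4/rescue/Photos")` before moving on to less critical shares.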
orons Posted March 22, 2016
OK, I finished backing up most of the data. So far I've lost just one movie, so I dodged a bullet there. What's the best way forward?
-replace the failed disk -> rebuild -> replace the failing second disk -> rebuild
-replace the second failing disk (is that even possible in my state?) -> rebuild -> replace the failed disk -> rebuild
Thanks
JorgeB Posted March 22, 2016
If the 2nd failed disk drops offline again during the rebuild, it will just rebuild garbage. If you copied everything you wanted from both disks, I would do a new config:
-take a screenshot of the current disk assignments
-replace the 2nd failed disk with a new one (if you have a replacement, or remove it from the array if not)
-go to Tools and click New Config
-reassign all disks, double check that the parity disk is in the parity slot
-start the array to begin the parity sync
-new disk(s) will have to be formatted
Consider upgrading to v6.2-beta, which adds dual parity support.
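The "take a screenshot and double check the parity slot" advice can also be expressed as a small sanity check. The Python sketch below is purely illustrative: it assumes you have typed the slot-to-serial assignments into dictionaries by hand, and `check_assignments` and the slot names are hypothetical, not something unRAID provides.

```python
def check_assignments(before, after, removed_serials):
    """Compare slot->serial maps and return a list of human-readable problems."""
    problems = []
    # The parity disk must stay in the parity slot across the new config.
    if after.get("parity") != before.get("parity"):
        problems.append(
            "parity slot changed: %s -> %s" % (before.get("parity"), after.get("parity"))
        )
    # A former data disk in the parity slot would be overwritten by the sync.
    old_data = {serial for slot, serial in before.items() if slot != "parity"}
    if after.get("parity") in old_data:
        problems.append("a former data disk is assigned to the parity slot")
    # Failed disks should no longer appear anywhere in the new assignment.
    for serial in removed_serials:
        if serial in after.values():
            problems.append("removed disk still assigned: %s" % serial)
    return problems
```

An empty list means the planned assignment keeps parity where it was and leaves out the failed disks; anything else is worth resolving before clicking start.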
orons Posted March 22, 2016
Hi, I just want to be clear on the steps:
1. Go to new config
2. Remove both failed disks
3. Add the new disk (I only have one precleared; the second one will be the current parity disk, after I replace it with a new one I already bought)
4. Make sure the parity is the same disk
5. Start the array and build new parity
6. Copy the data back to the array
Is that correct?
JorgeB Posted March 22, 2016
Basically yes, but clicking new config will unassign all disks. You can physically remove the failed disks from the server before or after doing it. Then you have to reassign all disks on the main page, taking special care with the parity slot, because if you assign a data disk to that slot and start the array you'll lose data. When all disks are assigned, click start array and a parity sync will begin. If there's still any doubt, ask before doing something you're not sure about.
orons Posted March 23, 2016
Hi,
The parity is building, but I noticed a weird issue. When I look into some of the folders in the share they are empty, but if I look at the disk directly I can see the content of the folder. When doing ls I get "Permission denied" for the /mnt/user/ path, but under /mnt/disk2/ I can see the files. Any ideas? Should I panic again?
Thanks, Oron
JorgeB Posted March 23, 2016
If you can browse by disk then the data is there. What is the new disk you tried to rebuild showing, data or unmountable?
orons Posted March 24, 2016
The parity build has finished, but some of the files are still missing when viewed through the shares. The new disk is showing files (I did not pre-clear it again after the first time, maybe that caused the problem?). Here is an example of what I'm seeing:
root@Tower_Oron:~# ls /mnt/user/Bluray/
/bin/ls: reading directory /mnt/user/Bluray/: Permission denied
root@Tower_Oron:~# ls /mnt/disk1/Bluray/
Gravity/                          SkyFall/
Inside\ Out/                      The\ Dark\ Knight\ 2008/
Iron\ Man\ 2008/                  The\ Hunger\ Games\ -\ Mockingjay\ -\ Part\ 2/
Kingsman\ The\ Secret\ Service/   The\ Lord\ of\ the\ Rings\ The\ Fellowship\ of\ the\ Ring\ 2001/
Live\ Free\ Or\ Die\ Hard\ 2007/  The\ Lord\ of\ the\ Rings\ The\ Two\ Towers\ 2002/
Sicario/                          The\ Man\ from\ U.N.C.L.E/
***Edit***
So I found that the problem comes from the new disk:
Mar 24 08:33:10 Tower_Oron kernel: REISERFS error (device md6): vs-5150 search_by_key: invalid format found in block 459024602. Fsck?
Mar 24 08:33:10 Tower_Oron kernel: REISERFS error (device md6): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [5 8022 0x0 SD]
Mar 24 08:33:10 Tower_Oron shfs/user: shfs_readdir: fstatat: Submarine 2011 (13) Permission denied
Mar 24 08:33:10 Tower_Oron shfs/user: shfs_readdir: readdir_r: /mnt/disk6/Nas/Movies (13) Permission denied
What's the best way forward? Try the new permissions tool? Or do a new config, remove the disk and then re-add it (I have all of its data backed up)?
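Finding the affected device by eye works for four lines, but in a long syslog a short filter helps. The following Python sketch counts ReiserFS error lines per device; the regular expression is written against the message format quoted in the post above, and the function name is an assumption made for this example.

```python
import re
from collections import Counter

# Matches kernel lines such as: "REISERFS error (device md6): vs-5150 ..."
REISERFS_RE = re.compile(r"REISERFS error \(device (\w+)\)")


def reiserfs_errors_by_device(lines):
    """Count ReiserFS error messages per block device name."""
    counts = Counter()
    for line in lines:
        match = REISERFS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Run against the excerpt in the post, this reports two errors for `md6`, which lines up with the `/mnt/disk6` path in the `shfs` messages.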
itimpi Posted March 24, 2016
I am not sure. Such issues normally get resolved without problems if you run reiserfsck against the drive. If you stop the array and then restart it in maintenance mode you can click on the disk to get the option to run file system checks/repairs.
JorgeB Posted March 24, 2016
This is what I suspected: the half-rebuilt disk understandably has filesystem issues. Since all data on that disk is backed up, there's no point in trying to fix it, you can just format it.
-stop the array, click that disk, change the filesystem to XFS
-start the array; the disk will appear as unmountable, double check it's the disk you want to format
-click format
XFS is the preferred filesystem for v6, so you can leave it like that; there's no problem having disks with different filesystems. But if you prefer ReiserFS, do the same thing again, this time changing the filesystem to ReiserFS, and format the disk again.
orons Posted March 24, 2016
Thanks! I'm pre-clearing a new parity disk (8TB, it takes forever). I'll do that after the pre-clear finishes (I can't stop the array with the preclear running...) and update.
Oron
orons Posted March 25, 2016
It worked, thanks! Now I'm replacing the parity drive with the bigger one. Is it safe to preclear the old one, or shall I wait for the parity sync to complete?
JorgeB Posted March 25, 2016
Should be OK. With the new preclear beta plugin it's possible to start/stop the array during a preclear.
JorgeB Posted March 25, 2016
Though you may want to wait, so that you can still use the old one in case there's a failure during the new parity sync.
orons Posted March 25, 2016
Too late, I missed your second message. F***, the parity build with the new 8TB will take forever...
orons Posted March 26, 2016
I cannot catch a break here. I woke up this morning and a third drive is failing; the parity sync is at 12.5% and is running very slowly. Should I stop the parity sync and try to move data from the failing disk to the new disk? Attached are a screenshot, the syslog and the SMART report.
syslog (the file is too big to attach, so here is an excerpt):
Mar 26 10:19:19 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --git-dir=/mnt/cache/Sickbeard/.git rev-parse HEAD
Mar 26 10:19:20 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; python --version 2>&1
Mar 26 10:19:20 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; sqlite3 --version 2>&1
Mar 26 10:19:20 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --version 2>&1
Mar 26 10:19:20 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; curl --version 2>&1
Mar 26 10:19:22 Tower_Oron emhttp: cmd: /usr/local/emhttp/plugins/Sickbeard/scripts/rc.Sickbeard stop
Mar 26 10:19:32 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --git-dir=/mnt/cache/Sickbeard/.git rev-parse HEAD
Mar 26 10:19:34 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; python --version 2>&1
Mar 26 10:19:34 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; sqlite3 --version 2>&1
Mar 26 10:19:34 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; git --version 2>&1
Mar 26 10:19:34 Tower_Oron sudo: root : TTY=unknown ; PWD=/usr/local/emhttp ; USER=nobody ; COMMAND=/bin/bash -c . /mnt/cache/.PhAzE-Common/Sickbeard/startcfg.sh; curl --version 2>&1
Mar 26 10:23:44 Tower_Oron kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Mar 26 10:23:44 Tower_Oron kernel: ata5.00: irq_stat 0x40000001
Mar 26 10:23:44 Tower_Oron kernel: ata5.00: failed command: READ DMA EXT
Mar 26 10:23:44 Tower_Oron kernel: ata5.00: cmd 25/00:40:ef:72:1a/00:05:76:00:00/e0 tag 18 dma 688128 in
Mar 26 10:23:44 Tower_Oron kernel: res 51/40:2f:f7:76:1a/00:01:76:00:00/e0 Emask 0x9 (media error)
Mar 26 10:23:44 Tower_Oron kernel: ata5.00: status: { DRDY ERR }
Mar 26 10:23:44 Tower_Oron kernel: ata5.00: error: { UNC }
Mar 26 10:23:44 Tower_Oron kernel: ata5.00: configured for UDMA/133
Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 Sense Key : 0x3 [current] [descriptor]
Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 ASC=0x11 ASCQ=0x4
Mar 26 10:23:44 Tower_Oron kernel: sd 6:0:0:0: [sdf] tag#18 CDB: opcode=0x28 28 00 76 1a 72 ef 00 05 40 00
Mar 26 10:23:44 Tower_Oron kernel: blk_update_request: I/O error, dev sdf, sector 1981445879
Mar 26 10:23:44 Tower_Oron kernel: ata5: EH complete
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445816
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445824
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445832
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445840
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445848
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445856
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445864
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445872
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445880
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445888
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445896
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445904
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445912
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445920
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445928
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445936
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445944
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445952
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445960
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445968
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445976
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445984
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981445992
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446000
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446008
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446016
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446024
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446032
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446040
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446048
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446056
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446064
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446072
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446080
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446088
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446096
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446104
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446112
Mar 26 10:23:44 Tower_Oron kernel: md: disk1 read error, sector=1981446120
WDC_WD20EARS-00J2GB0_WD-WCAYY0154839-20160326-1027.txt
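A run of `md: diskN read error` lines like the one above is easier to judge once it is condensed into a count and a sector range per disk. Below is a small Python sketch of that idea; the pattern follows the log lines quoted in this post, while `summarize_read_errors` is a name invented for the example.

```python
import re

# Matches lines such as: "kernel: md: disk1 read error, sector=1981445816"
MD_READ_ERROR_RE = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def summarize_read_errors(lines):
    """Return {disk: (error_count, lowest_sector, highest_sector)}."""
    summary = {}
    for line in lines:
        match = MD_READ_ERROR_RE.search(line)
        if not match:
            continue
        disk, sector = match.group(1), int(match.group(2))
        count, low, high = summary.get(disk, (0, sector, sector))
        summary[disk] = (count + 1, min(low, sector), max(high, sector))
    return summary
```

A tight sector range on a single disk, as in this excerpt, is consistent with one bad region on that disk; errors spread over several disks at the same moment would point more towards cabling, controller or power.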
JorgeB Posted March 26, 2016
That's some bad luck :'( Disk1 is bad, lots of pending sectors.
-since you have enough space, I would cancel the parity sync, stop the array, unassign parity for now and start the array
-copy everything you can from disk1 to another disk
-when done, do a new config without disk1 (or use a spare in its place if you have one) and start a new parity sync
orons Posted March 27, 2016
I've finished salvaging what I can from the third HDD. Now do I need to do a new config again, with the replacement HDD instead of the third HDD, and with the new parity disk?
JorgeB Posted March 27, 2016
Use all disks except old disk1. After doing the new config and starting the array, a new parity sync will begin.
orons Posted March 27, 2016
Will it build new parity from the start? I already started a parity build with the new parity drive that went bad; I'm afraid it will do me some damage...
This topic is now archived and is closed to further replies.