January 30, 2016
I upgraded to 6 from 5.0.4 last week with a clean install and ran one parity check, which completed with no errors. So far so good. Two days ago a 2TB Samsung got the red ball and showed as unmountable. I checked the cables and all seem good. My parity drive is also 2TB, but the server has been running for 4 years now with some old drives; for more than 2 of those years it ran as a backup server, meaning just a few hours in a whole week. I decided to start using bigger drives, and rather than first replacing the failing 2TB drive and then moving to a bigger parity drive, I went for the parity swap procedure using a 5TB drive. The parity copy was successful and it continued with the data rebuild, and this is where the same problem occurs: the drive is unmountable but the data rebuild continues. Is this possible? Shall I stop the procedure now? If not, what are the next steps?
tower-diagnostics-20160130-1152.zip tower-syslog-20160130-1153.zip
January 30, 2016 Author
https://www.dropbox.com/s/d7sgnpswj089iao/Screenshot_2016-01-30-11-31-44.png?dl=0
https://www.dropbox.com/s/cxn1bztpbmm8d7g/Screenshot_2016-01-30-11-37-33.png?dl=0
January 30, 2016 Community Expert
Disk 2 has file system corruption, and the rebuilt disk is going to have the same issue. Let the rebuild finish and then run reiserfsck.
https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_ReiserFS_using_unRAID_v5_or_later
January 30, 2016 Author
So the corruption is also on the parity? I ask because I have removed the disk that first got the red ball and was unmountable, using the parity swap procedure.
When the rebuild finishes, according to the wiki I have to stop the array, start it again in maintenance mode, and run reiserfsck --check /dev/md2 (since it's my disk 2?) as a start, and hope no further action will be required.
I already know that I won't be able to stop the array when the rebuild is finished, so my option will be the halt command for a shutdown, or reboot from console mode. Is this OK?
Also, is it possible that the 2TB Samsung disk I replaced is still OK? Should I also run reiserfsck on it, or run a preclear on it to verify it is still good?
And last, because I have two licences, when I upgraded from 5.0.4 to 6 I used my second USB drive, so the other one is intact, if that helps and can be used somehow in my current situation.
Thanks anyway for the quick reply and the help you have provided.
January 30, 2016 Community Expert
"So the corruption is also on the parity? Because I have removed the disk that first got the red ball and was unmountable, using the parity swap procedure."
The corruption probably happened when the old disk 2 redballed. Parity will reflect that corruption, but parity itself doesn't have a filesystem.
"When the rebuild finishes, according to the wiki I have to stop the array, start it again in maintenance mode, and run reiserfsck --check /dev/md2 (since it's my disk 2?) as a start, and hope no further action will be required."
Correct.
"I already know that I won't be able to stop the array when the rebuild is finished, so my option will be the halt command for a shutdown, or reboot from console mode. Is this OK?"
I don't understand this question. Wait for the rebuild to finish, stop the array, start it in maintenance mode, and run reiserfsck.
"Also, is it possible that the 2TB Samsung disk I replaced is still OK? Should I also run reiserfsck on it, or run a preclear on it to verify it is still good?"
It can be OK, and if reiserfsck is not successful on the rebuilt disk you can try it on the old disk. Do not preclear it for now.
"And last, because I have two licences, when I upgraded from 5.0.4 to 6 I used my second USB drive, so the other one is intact, if that helps and can be used somehow in my current situation. Thanks anyway for the quick reply and the help you have provided."
Keep the flash backup for now, but it shouldn't be needed for this situation.
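For anyone following along, the check step confirmed above can be sketched as a dry run. The snippet only prints the command and runs nothing. The helper name reiserfsck_check_cmd is made up for illustration, and it assumes (as in the thread) that array disk N is exposed as /dev/mdN once the array is started in maintenance mode.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) the read-only check for an array disk.
# Assumption taken from the thread: disk N appears as /dev/mdN in
# maintenance mode. reiserfsck_check_cmd is a hypothetical helper name.
reiserfsck_check_cmd() {
    disk_number="$1"
    case "$disk_number" in
        ''|*[!0-9]*)
            # Reject empty or non-numeric input.
            echo "usage: reiserfsck_check_cmd <disk number>" >&2
            return 1
            ;;
    esac
    echo "reiserfsck --check /dev/md${disk_number}"
}

reiserfsck_check_cmd 2
```

Printing the command first makes it easy to confirm the device name matches the intended disk before typing it on the server console.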
January 30, 2016 Community Expert
"So the corruption is also on the parity? Because I have removed the disk that first got the red ball and was unmountable, using the parity swap procedure."
Parity works at the sector level and has no knowledge of the file system that is on any disk. Therefore, if the 'emulated' disk has file system corruption, the rebuilt version will have exactly the same corruption.
"When the rebuild finishes, according to the wiki I have to stop the array, start it again in maintenance mode, and run reiserfsck --check /dev/md2 (since it's my disk 2?) as a start, and hope no further action will be required."
That is correct, although I expect that in practice the reiserfsck command will report some corruption and give you the command required to fix it.
"I already know that I won't be able to stop the array when the rebuild is finished, so my option will be the halt command for a shutdown, or reboot from console mode. Is this OK?"
Why are you sure that you will not be able to shut down when the rebuild finishes? Do you have the powerdown plugin installed? If not, I would suggest that you install it, because with it installed the 'powerdown' command from the command line will typically succeed even when the GUI is not accessible. You should avoid the 'halt' command, as that will do an unclean shutdown, so that on restart unRAID will think it needs to do a parity check.
"Also, is it possible that the 2TB Samsung disk I replaced is still OK?"
A disk is marked as 'failed' in unRAID the moment a write to it fails, whatever the reason. More often than not this is due to an external factor like a loose cable, and the disk is fine. Looking at the SMART attributes will give a good idea of whether the disk is likely to be OK or not.
"Should I also run reiserfsck on it..."
If reiserfsck on the rebuilt disk fails, then you may want to run it against the failed disk to recover data.
"...or run a preclear on it to verify it is still good?"
This is the best way to check out the disk. Do not attempt it until you have had a successful reiserfsck against the rebuilt disk, unless you are not worried about losing any data that might be on it.
"And last, because I have two licences, when I upgraded from 5.0.4 to 6 I used my second USB drive, so the other one is intact, if that helps and can be used somehow in my current situation. Thanks anyway for the quick reply and the help you have provided."
It should not make any difference which drive you use if they have the same configuration information on them.
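To make the "parity works at the sector level" point concrete, here is a toy sketch. It assumes single parity computed as the XOR of the data disks and shrinks each disk to a single byte; the byte values are invented for the demo.

```shell
#!/bin/sh
# Toy model: each "disk" is one byte; parity is the XOR of all data disks.
# The byte values are made up for illustration.
disk1=165   # 0xA5
disk2=60    # 0x3C, pretend this byte belongs to a corrupted filesystem
disk3=15    # 0x0F

parity=$(( disk1 ^ disk2 ^ disk3 ))

# Disk 2 fails: rebuild it from parity and the surviving disks.
rebuilt_disk2=$(( parity ^ disk1 ^ disk3 ))

echo "parity byte:    $parity"
echo "rebuilt disk 2: $rebuilt_disk2"
```

The rebuilt byte equals whatever was last written to disk 2, good or bad. Parity has no way to tell a valid filesystem structure from a damaged one, which is why the filesystem repair has to happen after the rebuild.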
January 30, 2016 Author
The question you did not understand is this: when the rebuild is finished and I push stop array, I will again get a warning that disk 2 can't be mounted or unmounted, something like that. So my option will be to reboot the server from console mode as root (reboot or halt), start the server again, and then start the array in maintenance mode. About 6 more hours to go to see what will happen.
https://www.dropbox.com/s/vlg10rh124mwkoq/Screenshot_2016-01-30-14-51-01.png?dl=0
I hope reiserfsck won't take that long.
January 30, 2016 Community Expert
You should be able to stop the array when the rebuild finishes. But if auto start is enabled, you can disable it in case you have to force a reboot, so that you can then start in maintenance mode:
Settings > Disk Settings > Enable auto start: set to No
You can change this now.
January 30, 2016 Community Expert
"The question you did not understand is this: when the rebuild is finished and I push stop array, I will again get a warning that disk 2 can't be mounted or unmounted, something like that. So my option will be to reboot the server from console mode as root (reboot or halt), start the server again, and then start the array in maintenance mode."
If disk 2 is the one being rebuilt, then the fact that it was unmountable should have no effect on whether the array can be stopped.
"About 6 more hours to go to see what will happen. https://www.dropbox.com/s/vlg10rh124mwkoq/Screenshot_2016-01-30-14-51-01.png?dl=0 I hope reiserfsck won't take that long."
My rule of thumb is that if reiserfsck is running a repair (as opposed to a check), it takes something like 1 hour per TB.
January 30, 2016 Author
"You should be able to stop the array when the rebuild finishes. But if auto start is enabled, you can disable it in case you have to force a reboot, so that you can then start in maintenance mode: Settings > Disk Settings > Enable auto start: set to No. You can change this now."
Good advice; it is already set like this by default. When I did the clean upgrade I didn't modify any default configuration except my shares, network info, etc., and the default filesystem in settings, which I set to reiserfs. I will come back when the rebuild is finished, hopefully with good news. I have already lost 2 drives in these 4 years without losing any data. I hope it will be the same for this drive too.
January 30, 2016 Author
So the rebuild is finished. I push stop and confirm that yes, I want to do this, but it doesn't seem to work; look at the picture:
https://www.dropbox.com/s/tjqs9mw18vvcb12/Screenshot_2016-01-30-19-29-29.png?dl=0
I am also attaching the diagnostics file taken after I pushed stop array. I have a screen and keyboard on the server. Shall I type reboot and go on with the reiserfs check, or do something else?
tower-diagnostics-20160130-1930.zip
January 30, 2016 Community Expert
Strange, an unmountable disk doesn't usually interfere with stopping the array. If you have the powerdown plugin, or can still install it, type powerdown -r; if not, type reboot.
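The choice described above can be sketched as a dry run that only reports which command would be used. Nothing is rebooted. The helper name pick_reboot_cmd is made up for illustration, and the sketch assumes the plugin puts a powerdown command on the PATH.

```shell
#!/bin/sh
# Dry run: report which reboot command would be used; nothing is executed.
# pick_reboot_cmd is a hypothetical helper name.
pick_reboot_cmd() {
    if command -v powerdown >/dev/null 2>&1; then
        # Plugin command is available: prefer the clean restart.
        echo "powerdown -r"
    else
        echo "reboot"
    fi
}

echo "would run: $(pick_reboot_cmd)"
```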
January 30, 2016 Author
I have rebooted and I am afraid it's not good news. What are my options now? Rebuild?
https://www.dropbox.com/s/rozuchske349gjm/P60130-211457.jpg?dl=0
https://www.dropbox.com/s/ff87f8ee2ccar09/P60130-200548.jpg?dl=0
January 30, 2016 Community Expert
"I have rebooted and I am afraid it's not good news. What are my options now? Rebuild? https://www.dropbox.com/s/rozuchske349gjm/P60130-211457.jpg?dl=0 https://www.dropbox.com/s/ff87f8ee2ccar09/P60130-200548.jpg?dl=0"
That confirms that there was corruption (which is not atypical after a failed write). As was suggested, run reiserfsck with the --rebuild-tree option to correct it.
January 30, 2016 Author
Done, waiting for results. After it finishes, do I stop the array, uncheck maintenance mode, start the array normally again, and see if it has saved anything? I know what is on disk 2 because I write directly to disks and read from shares. I did a parity check last week and it was OK, and since then there have been no writes to that disk. I have only deleted some files, so this much corruption seems strange...
January 30, 2016 Community Expert
Yes, reiserfsck is usually very good at recovering lost data. Hopefully most of it will be intact and in its place, but there's probably also going to be a lost+found folder with some data.
January 31, 2016 Author
So, good news after all. Disk 2 had only video files. After the rebuild all my files were there, and only 5 video files were inside lost+found with weird names. I renamed them 1.mkv, 2, etc., and they are playable, so it was easy to identify the content. There are also 2 empty folders inside lost+found that cannot be deleted.
One more note: when the rebuild finished I couldn't do a clean array stop and shutdown, so I powered the server off and on again, stopped the parity check, and checked that I can start and stop the array normally. Since that worked, I have started the parity check again.
What about those two folders that cannot be deleted? When I try to delete them, the array stops after some seconds, but I am able to start it again.
Is there a way to check the rest of the disks for corruption and data integrity, or might I have some silent corruption and end up after some time with 3-4 unmountable disks?
https://www.dropbox.com/s/hprwzuh1ug8f3wb/Screenshot_2016-01-31-17-45-27.png?dl=0
Reiserfs is rock solid, but when I add new disks I see that it is advised to use XFS. In a similar case, will an XFS filesystem recover after a disk rebuild? Btrfs has the bit rot (data integrity) thing, but unRAID will only identify the problem and won't be able to correct it, because every disk has its own filesystem.
tower-diagnostics-20160131-1746.zip
January 31, 2016 Community Expert
"What about those two folders that cannot be deleted? When I try to delete them, the array stops after some seconds, but I am able to start it again."
That is a bit strange: reiserfsck should not leave you with a file or folder that cannot be deleted. How are you trying to delete them? If you are doing it over the network, then you should run the 'newperms' command against the lost+found folder, as it will be owned by root.
"Is there a way to check the rest of the disks for corruption and data integrity, or might I have some silent corruption and end up after some time with 3-4 unmountable disks?"
You can put the array into maintenance mode and run reiserfsck --check against each disk in turn. You can also run the same checks from the GUI.
"Reiserfs is rock solid, but when I add new disks I see that it is advised to use XFS. In a similar case, will an XFS filesystem recover after a disk rebuild?"
Reiserfs is at end-of-life, while XFS is very mature and is deemed a better way forward. It also seems to have better performance, particularly as disks get full. However, the ability of reiserfs to recover from corruption is amazing; the equivalent tools for the other file systems do not seem as 'magical', so the decision about which file system to use is really up to you. In terms of rebuilding a disk, the process is the same whatever file system is used, as parity is not file-system aware and works at the disk sector level.
"Btrfs has the bit rot (data integrity) thing, but unRAID will only identify the problem and won't be able to correct it, because every disk has its own filesystem."
True. You can install the Dynamix File Integrity plugin, which can provide checks for bit-rot type corruption. For recovery you are expected to have backups.
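As a rough sketch of the checksum idea behind bit-rot checks: this is not how the Dynamix File Integrity plugin is implemented, only the general technique, using sha256sum from coreutils. The temporary directory and the file name are made up for the demo.

```shell
#!/bin/sh
# Sketch of a checksum-based integrity check. This is NOT the Dynamix File
# Integrity plugin, only the general idea. Paths and names are demo values.
demo_dir="$(mktemp -d)"
printf 'pretend this is a video file\n' > "$demo_dir/movie.mkv"

# 1. Record a checksum while the file is known to be good.
( cd "$demo_dir" && sha256sum movie.mkv > manifest.sha256 )

# 2. Later, verify the file against the recorded checksum.
if ( cd "$demo_dir" && sha256sum -c manifest.sha256 ) >/dev/null 2>&1; then
    first_check="ok"
else
    first_check="changed"
fi

# 3. Simulate silent corruption, then verify again.
printf 'X' >> "$demo_dir/movie.mkv"
if ( cd "$demo_dir" && sha256sum -c manifest.sha256 ) >/dev/null 2>&1; then
    second_check="ok"
else
    second_check="changed"
fi

echo "before corruption: $first_check"
echo "after corruption:  $second_check"
rm -rf "$demo_dir"
```

A mismatch only tells you that a file changed. As noted above, restoring the good copy still has to come from a backup.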
January 31, 2016 Community Expert
After you copy anything important from lost+found you can try deleting it from the console:
cd /mnt/disk2
rm -r lost+found
I changed all my disks to XFS and, because of performance, would never go back to reiser, but reiserfsck does indeed work very well. A while ago I did a quick test with the 3 filesystems: fill them with video files, format them, and then try to recover:
- Reiser recovered all files; only one failed the checksum, but it was playable
- XFS lost 8 files (out of 153) and another failed the checksum but was also still playable
- Btrfs didn't recover anything, although there's little documentation on btrfs --repair; I tried all the options without success.
February 1, 2016 Author
From console mode I get permission denied when erasing the lost+found folder. After the rebuild and recovery I started a parity check. It is still in progress but is taking forever, with lots of errors (up to 40% it was going fast). Before the unmountable disk issue I went through a parity check with no issues...
https://www.dropbox.com/s/6i2id5vv52nkmcb/Screenshot_2016-02-01-15-41-16.png?dl=0
February 1, 2016 Community Expert
Looks like something went wrong with the parity swap and the parity data beyond 2TB was not zeroed. You can let the parity check finish; it's going to take a long time because of the corrections, but it's safer. Or you can do a new config and a new parity sync; this will be faster, but you'll be unprotected until it finishes. Since all activity is now limited to the parity drive and won't affect array read performance, I would just let it finish.
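A toy illustration of why unzeroed parity beyond the data disks shows up as errors. It assumes XOR parity, where positions that lie past the end of every data disk have nothing to XOR and so should hold zero. Sizes are shrunk to a few "sectors" and the leftover values are invented.

```shell
#!/bin/sh
# Toy model: data disks are 2 "sectors" long, the new parity disk is 5.
# Sectors 3 to 5 lie beyond every data disk, so correct parity there is 0.
# The values below are made-up leftovers, not real data.
parity_beyond_data="0 7 9"

sync_errors=0
for value in $parity_beyond_data; do
    if [ "$value" -ne 0 ]; then
        # Any non-zero leftover disagrees with the expected parity of 0.
        sync_errors=$(( sync_errors + 1 ))
    fi
done

echo "parity errors beyond the data disks: $sync_errors"
```

On a real 5TB parity disk protecting 2TB data disks, that region is large, which would be consistent with a check that runs fast at first and then slows down while it writes corrections.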
February 19, 2016 Author
Thank you all for your support and help. After those days the server seems stable again. There is only one issue that I didn't have before, and I am not sure exactly when it occurred: a share has disappeared from the user shares, but I still have direct access to all its files. Any idea how to fix this?
tower-diagnostics-20160219-1614.zip
February 19, 2016 Community Expert
Looks like you still have filesystem corruption on disk2.
Feb 19 15:47:59 Tower shfs/user: shfs_readdir: fstatat: The Adventures of Tintin (2011) 720P.HDxT (13) Permission denied
Feb 19 15:47:59 Tower shfs/user: shfs_readdir: readdir_r: /mnt/disk2/Media/HD Movies (13) Permission denied
Feb 19 15:47:59 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 768 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:47:59 Tower shfs/user: shfs_readdir: fstatat: The Adventures of Tintin (2011) 720P.HDxT (13) Permission denied
Feb 19 15:47:59 Tower shfs/user: shfs_readdir: readdir_r: /mnt/disk2/Media/HD Movies (13) Permission denied
Feb 19 15:47:59 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 768 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 817 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 716 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 808 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 908 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 892 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 743 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 750 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 756 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 762 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 4 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [5 419 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 768 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 774 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 780 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 786 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 793 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 799 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 899 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 914 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 675 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 680 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [136829 139423 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [136829 139428 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [136829 139434 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [136829 139441 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [136829 139438 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 15 0x0 SD]. This is likely to be race with knfsd. Ignore
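A syslog excerpt like the one above can be summarized per device with standard text tools. This is a minimal sketch: count_dead_inodes is a made-up helper name, and the sample input is two warning lines copied from the excerpt plus one unrelated line.

```shell
#!/bin/sh
# Count REISERFS "dead inode" warnings per device in syslog-style text.
# count_dead_inodes is a hypothetical helper; it reads log lines on stdin.
count_dead_inodes() {
    grep 'REISERFS warning' \
        | grep 'dead inode read from disk' \
        | sed 's/.*(device \([^)]*\)).*/\1/' \
        | sort | uniq -c \
        | awk '{ print $2 ": " $1 }'
}

# Two warning lines copied from the excerpt, plus one unrelated line.
sample_log='Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 817 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:50:42 Tower kernel: REISERFS warning (device md2): vs-13075 reiserfs_read_locked_inode: dead inode read from disk [6 716 0x0 SD]. This is likely to be race with knfsd. Ignore
Feb 19 15:47:59 Tower shfs/user: shfs_readdir: readdir_r: /mnt/disk2/Media/HD Movies (13) Permission denied'

printf '%s\n' "$sample_log" | count_dead_inodes
```

On the server the same function could be fed the live log, for example count_dead_inodes < /var/log/syslog, assuming that is where the syslog lives on your system. Warnings that all point at one device support the diagnosis that the corruption is confined to that disk.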
February 19, 2016 Author
Disk 2 has been rebuilt and replaced with a new drive, and afterwards I did a parity check. There is a lost+found folder on it with 2 empty folders that can't be deleted, though. Shall I start the array in maintenance mode and do an fsck check on disk 2? I am running reiserfsck --check /dev/md2 again.
This topic is now archived and is closed to further replies.