BillyJ Posted February 27, 2015

During the move from ReiserFS to XFS my server suffered a CPU stall, and a hard reset was the only option available. It was in the process of moving 2.7 TB worth of a Movies folder from Disk5 (4 TB, ReiserFS) to Disk6 (3 TB, XFS). I know it didn't complete the move; it was maybe 25% through. Used disk space is 2.90 TB.

When I kick off the move via Midnight Commander using the F6 function, I am prompted that the target already exists. For "Overwrite all targets?" I choose None.

Now I've got 6 TB (or close enough) of duplicate data, and there is no way there is enough free space to continue. Does anyone have any ideas? Is the data in fact duplicate, so that I should be able to delete the Movies folder from Disk6 and restart a complete move?

Thanks,
Will
BillyJ Posted February 27, 2015

I might actually be in the clear! The F6 RenMov in MC just finished, and now I am spot-checking a few files before deleting Movies from Disk5.
WeeboTech Posted February 27, 2015

I would use rsync to copy/move files from one disk to another via the command line. rsync will compare modification time and size, then skip the files that already exist.

Example usage would be:

    cd /mnt/disk5
    rsync -avPX . /mnt/disk6

You can use the -n flag to do a dry run and see what would be copied (I almost always do that first):

    rsync -n -avPX . /mnt/disk#

See what it spits out, then do it without the -n.

After all is said and done, you can do another rsync to compare files by checksum rather than modification time and size. Do this with:

    rsync -n -rcvPX . /mnt/disk#

This will show you what might get copied because the files did not compare via checksum. If you want to do the actual copy again for files that do not compare, take off the dry run flag (-n).

My final command is usually to remove what compares, with --remove-source-files:

    rsync --remove-source-files -rcvPX . /mnt/disk#

This has the effect of comparing the tree with checksums and removing source files that match. After that I do a final review down the tree with find to make sure there are no files left over:

    find . -type f -ls

and a final cleanup, which removes empty directories:

    find . -depth -type d -empty -ls -delete

Tips: When doing this, always be careful of where you are. Always use the -n flag first to make sure you get the expected results.
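The sequence WeeboTech describes can be sketched as a small shell helper, one step at a time. This is only an illustration under some assumptions: `migrate_step` is a hypothetical name, the source and destination directories are passed in rather than hard-coded, and the rsync option for deleting transferred files is spelled `--remove-source-files`. The find options `-mindepth`, `-ls` and `-delete` are GNU/BSD extensions.

```shell
#!/bin/sh
# migrate_step STEP SRC DST  (hypothetical helper, not part of rsync or unRAID)
# Runs one step of the dry-run / copy / verify / remove sequence.
migrate_step() {
    step=$1 src=$2 dst=$3
    case $step in
        preview)   # dry run: list what a time+size comparison would copy
            rsync -n -avPX "$src"/ "$dst"/ ;;
        copy)      # real copy, skipping files that already match
            rsync -avPX "$src"/ "$dst"/ ;;
        verify)    # dry run by checksum: lists files whose contents differ
            rsync -n -rcvPX "$src"/ "$dst"/ ;;
        remove)    # compare by checksum and delete source files that match
            rsync --remove-source-files -rcvPX "$src"/ "$dst"/ ;;
        leftovers) # anything printed here was NOT removed from the source
            find "$src" -type f -ls ;;
        prune)     # remove empty directories below SRC (SRC itself is kept)
            (cd "$src" && find . -mindepth 1 -depth -type d -empty -ls -delete) ;;
        *)
            echo "unknown step: $step" >&2
            return 2 ;;
    esac
}

# Example with the paths from the post (run preview and verify first):
#   migrate_step preview /mnt/disk5 /mnt/disk6
#   migrate_step copy    /mnt/disk5 /mnt/disk6
#   migrate_step verify  /mnt/disk5 /mnt/disk6
```

Keeping each step separate mirrors the advice in the post: nothing is deleted until the checksum dry run has been read and looks right.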
xamindar Posted March 9, 2015

That is an awesome post WeeboTech, you should sticky that somewhere with best practices for moving files between disks.
tcharron Posted July 6, 2015

"rsync -n -rcvPX . /mnt/disk# This will show you what might get copied because the files did not compare via checksum. If you want to do the actual copy again for files that do not compare take off the dry run flag (-n)"

What would cause rsync errors? I copied about 3 TB over using rsync, and the above turned up about 500 errors. I looked at a few, and it seems that a single byte differs between source and destination in each of the files. The vast majority of the files (there are 542,000 of them) were copied fine.

The server has run flawlessly for years. I installed a new 4 TB drive (which is the destination here). It passed a preclear before I started using it. The source drive is reiserfs and the destination is xfs.

Copying the problem files again is straightforward enough (and seems to work), but I'm not sure I trust this array any more.
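A single differing byte per file, as reported here, can be located with the standard `cmp` utility. The following is a minimal sketch; `byte_diffs` and `count_diffs` are hypothetical helper names, and the file paths in the usage comment are placeholders.

```shell
#!/bin/sh
# byte_diffs FILE_A FILE_B [MAX]  (hypothetical helper)
# Prints up to MAX differing bytes. Each line from "cmp -l" holds the
# 1-based byte offset (decimal) and the two byte values (octal).
byte_diffs() {
    cmp -l -- "$1" "$2" | head -n "${3:-10}"
}

# count_diffs FILE_A FILE_B  (hypothetical helper)
# Prints how many byte positions differ between the two files.
count_diffs() {
    cmp -l -- "$1" "$2" | wc -l | tr -d ' '
}

# Example (placeholder paths):
#   count_diffs /mnt/disk5/some/file /mnt/disk6/some/file
#   byte_diffs  /mnt/disk5/some/file /mnt/disk6/some/file 5
```

If the two byte values on a line differ in only one bit, and the same bit shows up across many files, that points toward hardware (RAM, cabling, controller) rather than the file system.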
itimpi Posted July 6, 2015

Did you ever run v5 beta 7/Beta 8? Those releases had a reiserfs bug that could result in silent file corruption.
SSD Posted July 6, 2015

Suggest you post more details about your system. What version of unRAID are you running? Motherboard? Controllers? A syslog might be helpful.

There was a bug in RFS from v6 (not v5) beta 7/8 that caused corruption. If you are running such a version, I suggest you upgrade immediately. If not, the more details you can provide the better. What you are describing is not normal.

Update: thinking further - have you run a memory test? Faulty memory could cause such a symptom.
tcharron Posted July 6, 2015

"Did you ever run v5 beta 7/Beta 8? Those releases had a reiserfs bug that could result in silent file corruption."

Isn't this irrelevant? Whether my data (on the source disk) is corrupt or not shouldn't matter. Even if it is corrupt, I'd expect v6 to be able to duplicate it perfectly to the new drive. In any event, I don't think that I ever ran v5 beta 7 or 8.
tcharron Posted July 6, 2015

Thankfully, whatever is going on isn't as 'scary' as it could be, since this system is not my primary unraid server. This entire system is a backup for my primary unraid server, and even with a total loss of the system I wouldn't lose anything of value.
The system:

    Motherboard: Intel D975XBX
    CPU: Intel® Core™2 Extreme Processor X6800 @ 2.93GHz
    Ram: 8G
    Controller: m1015

Drives:

    Array drives (connected to m1015 controller)
      Parity 4TB WDC_WD40EZRX
      Disk1  3TB ST3000DM001  reiserfs
      Disk2  3TB WDC_WD30EFRX reiserfs
      Disk3  3TB WDC_WD30EFRX reiserfs
      Disk4  3TB ST3000DM001  reiserfs
      Disk5  4TB WDC_WD40EZRX reiserfs
      Disk6  4TB WDC_WD40EZRX xfs
    Non-array
      64G Crucial SSD (connected to m1015 controller)
      500G ST3500830A EIDE

It does look like the drives are running a bit hot -- parity is at 41 degrees Celsius.

I haven't run a memory test since the box was set up. Disk6 and the 500G EIDE drive were just installed as I upgraded to v6. Maybe I have a loose cable somewhere.

I just tried some copies again, and had some crazy results... I now think that I am having read errors on disk5!...

    root@BadAxe:~# rsync -crvaPX /mnt/disk5/tower_backup/tower/videos_home/* /mnt/disk6/t/tower_backup/tower/videos_home/
    sending incremental file list
    MP_Wedding.mpeg
      1,019,044,339 100% 17.23MB/s 0:00:56 (xfr#1, to-chk=85/87)
    peyton.avi
      2,095,902,208 100% 15.79MB/s 0:02:06 (xfr#2, to-chk=83/87)

    sent 3,115,711,126 bytes received 62 bytes 8,715,276.05 bytes/sec
    total size is 6,643,050,254 speedup is 2.13
    root@BadAxe:~# rsync -crvaPX /mnt/disk5/tower_backup/tower/videos_home/* /mnt/disk6/t/tower_backup/tower/videos_home/
    sending incremental file list
    EA_Wedding/Wedding_320x240.avi
      1,245,246,976 100% 21.09MB/s 0:00:56 (xfr#1, to-chk=75/87)

    sent 1,245,555,048 bytes received 43 bytes 3,886,287.34 bytes/sec
    total size is 6,643,050,254 speedup is 5.33
    root@BadAxe:~#

What the above shows is that the first copy pass confirmed that the Wedding_320x240.avi file was ok, but the second pass detected that it needed to be copied again. Definitely starting to sound like it could be memory or a disk read error.

I am not at home right now. I will kick off a parity check though. I think that a "pass" would mean that I have a RAM problem.
Parity problems wouldn't tell me much -- bad parity could be due to either drive or RAM problems.
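The symptom in this post, where the same file compares clean on one pass and dirty on the next, can be probed directly by hashing one file several times and checking that the result never changes. The sketch below assumes `sha256sum` from GNU coreutils is available; `read_is_stable` is a hypothetical helper name. Note that repeated reads of a small file may be served from the page cache rather than the disk, so a clean result does not prove the drive is healthy.

```shell
#!/bin/sh
# read_is_stable FILE [PASSES]  (hypothetical helper)
# Hashes FILE PASSES times (default 3).
# Returns 0 if every hash matches, 1 if any differs, 2 if FILE is unreadable.
read_is_stable() {
    file=$1 passes=${2:-3}
    first=$(sha256sum < "$file") || return 2
    i=1
    while [ "$i" -lt "$passes" ]; do
        again=$(sha256sum < "$file") || return 2
        if [ "$again" != "$first" ]; then
            echo "unstable read: $file" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}

# Example (path taken from the transcript above):
#   read_is_stable /mnt/disk5/tower_backup/tower/videos_home/peyton.avi 5
```

A file whose hash changes between passes while nothing is writing to it indicates a hardware-level problem somewhere in the read path, which includes RAM.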
SSD Posted July 6, 2015

Good news. You might want to consider moving off the EIDE drives, which are pretty long in the tooth at this point.
tcharron Posted July 6, 2015

"Good news. Might want you consider moving off the EIDE drives, which are pretty long in the tooth at this point."

Well... not great news though. I still have a system which isn't working as it should...

Re the EIDE drive: It has low hours -- just trying to squeeze a bit of value from some old hardware.
tcharron Posted July 6, 2015

"Update: thinking further - have you run a memory test? Faulty memory could cause such a symptom."

We have a winner! I downloaded and compiled memtester, which allows me to test RAM on a running system...

    Licensed under the GNU General Public License version 2 (only).
    pagesize is 4096
    pagesizemask is 0xfffffffffffff000
    want 6144MB (6442450944 bytes)
    got 6144MB (6442450944 bytes), trying mlock ...locked.
    Loop 1:
      Stuck Address : testing 1FAILURE: possible bad address line at offset 0x1614d95d0.
    Skipping to next test...
      Random Value : ok
      Compare XOR : ok
      Compare SUB : ok
    FAILURE: 0x1f50973e6e6cca44 != 0x1f50973e6e6eca44 at offset 0xa14d9dc8.
      Compare MUL : FAILURE: 0x00000000 != 0x00020000 at offset 0xa14d9dc8.
      Compare DIV :
      Compare OR : ok
      Compare AND : ok
    FAILURE: 0x7ef951d11c2495ac != 0x7ef951d11c2695ac at offset 0xa14d9dc8.
      Sequential Increment:
      Solid Bits : testing 0FAILURE: 0x00000000 != 0x00020000 at offset 0xa14d9dc8.
      Block Sequential : testing 0FAILURE: 0x00000000 != 0x00020000 at offset 0xa14d9dc8.
      Checkerboard : testing 1FAILURE: 0x5555555555555555 != 0x5555555555575555 at offset 0xa14d9dc8.
      Bit Spread : testing 6

Definitely memory related. The errors make me think I might have a poorly seated memory module. The errors all seem to be in the same data line.
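The observation that the errors all sit in the same data line can be checked by XOR-ing each pair of values from the FAILURE lines: the result has a 1 in every bit position where the two values disagree. A minimal sketch, where `flipped_bits` is a hypothetical helper name and the values are copied from the memtester output above:

```shell
#!/bin/sh
# flipped_bits VALUE_A VALUE_B  (hypothetical helper)
# Prints the XOR of two values in hex, i.e. the mask of bits that differ.
flipped_bits() {
    printf '0x%x\n' "$(( $1 ^ $2 ))"
}

# Value pairs taken from the FAILURE lines in the memtester output:
flipped_bits 0x1f50973e6e6cca44 0x1f50973e6e6eca44   # prints 0x20000
flipped_bits 0x00000000         0x00020000           # prints 0x20000
flipped_bits 0x7ef951d11c2495ac 0x7ef951d11c2695ac   # prints 0x20000
flipped_bits 0x5555555555555555 0x5555555555575555   # prints 0x20000
```

Every pair differs in exactly the same bit (0x20000, bit 17 counting from zero), which is consistent with a single faulty data line or cell rather than random corruption.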
trurl Posted July 6, 2015

memtest is included with unRAID. You select it from the boot menu. It doesn't work with a running system like that other tool you mentioned, but you might also try testing with memtest.
JonathanM Posted July 6, 2015

"memtest is included with unRAID. You select it from the boot menu. It doesn't work with a running system like that other tool you mentioned, but you might also try testing with memtest."

There are good reasons NOT to run it on a running system, especially one which is storing and serving valuable data. Not least of which is the need for the running system to occupy at least some of the RAM that needs to be tested. Dedicated memory test programs that run by themselves are a much better option.
SSD Posted July 6, 2015

Certainly not smart to run a system with known bad memory.
tcharron Posted July 7, 2015

Hey -- "not smart" stings! Recall...

"Thankfully, whatever is going on isn't as 'scary' as it could be, since this system is not my primary unraid server. This entire system is a backup for my primary unraid server, and even with a total loss of the system I wouldn't lose anything of value."

I know the risks of running memory tests on an active system. Before running this tool, I had already concluded that none of this data can be trusted, since I don't know how long the RAM has been failing. I have the luxury of not having to trust this data since it's a backup system. Once I get the RAM problem fixed, I'll either be deleting everything before the next backup, or using "rsync -c" to verify that it is a true image of the primary unraid server.

I sometimes wonder if I have gone a bit overboard... I have a secondary unraid server that I use to periodically back up my primary unraid server. The primary server is mostly backups of files stored on various computers, including snapshots and inbound CrashPlan backup files. I'm exposed due to not having one of the servers off site, but otherwise feel pretty safe.

I did run memtest overnight last night -- it turned up 1600 errors in 10 hours, all related to occasional failures of a single bit at addresses in a very narrow range of memory.