July 27, 2008

After completing back-to-back disk upgrades (to get rid of the 1TB drive size issues mentioned here http://lime-technology.com/forum/index.php?topic=2230.0), I ran a parity check which came back with 5 errors. Is this anything to worry about?

At this point, I still have the original data drives, so I could recreate the contents by copying instead of using the disk upgrades, but this would take a long time. Are there any utilities that could compare the two files (or filesystems) to see if/where any discrepancies exist? Can I use the incorrect parity locations referenced in the syslog to determine the affected files?

In case this helps, here is the step-by-step that I did during the upgrade:

1. Replaced parity drive
2. Upgraded from 4.2.4 to 4.3.3
3. Built parity
4. Upgraded Disk 5 (of 7+1 disk array)
5. Decided to run reiserfsck on all disks. I should have done this prior to changing disks but didn't think of it. No corruption errors on any of the disks.
6. Ran parity check. Came up with a handful of errors, but I thought it might be due to reiserfsck.
7. Upgraded Disk 6
8. Upgraded Disk 1
9. Ran parity check revealing 5 errors (@ 30488, 32464, 32488, 42048, 42072)
July 27, 2008

Did you run reiserfsck on the raw disks (/dev/sd[abcdefghi...]) or on /dev/md1, /dev/md2, /dev/md3...?

Joe L.
July 27, 2008 Author

I ran them on /dev/md1 through /dev/md7 per this post http://lime-technology.com/forum/index.php?topic=463.msg3192#msg3192

So I was a little surprised to see the parity errors. I wasn't surprised enough to think to capture the syslog at the time.
July 27, 2008

You did it correctly. Parity should have been kept in sync.

Since you have had errors, and they have already been corrected on the parity disk, you need to try a parity sync once more. Continued random errors are usually an indication of some kind of hardware issue.

Time to post a syslog after you do another parity check. Also time to run smartctl on each of your disks (both short and long tests).

Lastly, you can compare two files with a "DOS" command in Windows:

comp file1 file2

Joe L.
July 28, 2008 Author

I re-ran the parity check, and it came back with zero errors. Syslog is attached. I had forgotten to mention that I had also re-run the parity check after Step 6, and the second check came back with zero errors, as well.

I'll go ahead and run the smartctl tests on all the drives. That's probably a good idea since I'm basically rebuilding the machine anyway.

As for the data, if there is no easy way to pinpoint which individual file(s) might have corruption based on the locations of the 5 parity errors, I think maybe I should just recopy the data content onto the new drives using copy commands instead of disk upgrade. Though it's scary that disk upgrade wouldn't be 100% bit-perfect given that that's what it's for. Maybe I'm making assumptions too early and smartctl will shed some light as to a hardware issue that might have caused the errors.

EDIT: I can't upload the syslog because the upload folder is full. I'll PM Tom.
July 28, 2008

> I re-ran the parity check, and it came back with zero errors. Syslog is attached. I had forgotten to mention that I had also re-run the parity check after Step 6, and the second check came back with zero errors, as well.

A parity check always assumes the data disks are correct. It then updates the calculated parity bits on the parity drive if it thinks they are out of sync. So, if a bit in a file on a data drive is now reading differently than when it was originally written, parity has been updated to reflect that change. In other words, if you were to reconstruct the files on a data disk, the rebuilt disk would now contain the change.

The subsequent parity checks that all show zero parity errors show that your disks are reading consistently. That is good.

> I'll go ahead and run the smartctl tests on all the drives. That's probably a good idea since I'm basically rebuilding the machine anyway.

The results of the smartctl tests should give you an indication of which data drive (if any) has errors or sectors pending re-allocation. Sectors pending re-allocation are bad: the read of that sector failed, and the drive is waiting for a subsequent write to that sector so it can re-allocate it with the newly written contents.

I would run the smartctl "long" test on each of your disk drives. It takes quite a while, but it probably goes through the process of reading the whole disk looking for errors.

> As for the data, if there is no easy way to pinpoint which individual file(s) might have corruption based on the locations of the 5 parity errors, I think maybe I should just recopy the data content onto the new drives using copy commands instead of disk upgrade. Though it's scary that disk upgrade wouldn't be 100% bit-perfect given that that's what it's for. Maybe I'm making assumptions too early and smartctl will shed some light as to a hardware issue that might have caused the errors.

Disk upgrades do get a 100% bit-perfect image, but only if parity was correct. But think of it this way: if the parity drive had the "wrong" value and the data disks were correct, running a parity check will have fixed the parity value. Since you had just replaced the parity drive, it is as suspect as any other for having blocks that could be marked as bad by the SMART system. You really need to run smartctl to see which drive might have been the one with the errors.

> EDIT: I can't upload the syslog because the upload folder is full. I'll PM Tom.

You can compare the "checksums" of two files in Linux; if the checksums differ, the files differ. In Linux, type:

sum file1 file2
or
md5sum filename
or
md5sum file1 file2

(You can get md5 checksum utilities for Windows to check the original source files.)

Joe L.
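The point about parity checks trusting the data disks can be illustrated with a small sketch. It assumes simple XOR parity across the data disks and uses made-up three-byte "disks"; the function names (`compute_parity`, `parity_check`, `rebuild_disk`) are hypothetical and are not unRAID code.

```python
from functools import reduce

def compute_parity(data_blocks):
    """XOR the byte at each position across all data disks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_blocks))

def parity_check(data_blocks, parity):
    """Trust the data disks: report mismatching positions and the rewritten parity."""
    expected = compute_parity(data_blocks)
    errors = [i for i, (p, e) in enumerate(zip(parity, expected)) if p != e]
    return errors, expected

def rebuild_disk(other_blocks, parity):
    """Reconstruct one missing disk from the remaining disks plus parity."""
    return compute_parity(list(other_blocks) + [parity])

# Three tiny illustrative "data disks" (assumed values) and their parity.
disks = [bytes([0x0F, 0xF0, 0xAA]), bytes([0x01, 0x02, 0x03]), bytes([0xFF, 0x00, 0x55])]
parity = compute_parity(disks)

# Disk 1 now reads one byte differently than when it was written.
original_disk1 = disks[1]
disks[1] = bytes([0x01, 0x06, 0x03])

# The check finds the mismatch and "corrects" parity to match the data.
errors, corrected_parity = parity_check(disks, parity)

# A rebuild from the corrected parity reproduces the changed byte,
# while a rebuild from the old parity would have given the original.
rebuilt_after_check = rebuild_disk([disks[0], disks[2]], corrected_parity)
rebuilt_before_check = rebuild_disk([disks[0], disks[2]], parity)
```

In this sketch, a second `parity_check` against `corrected_parity` reports no errors, which matches the zero-error re-runs described above without saying which disk held the wrong value in the first place.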
July 28, 2008 Author

Here is a link to the syslog: http://pastebin.com/m5f61b223

I'll run smartctl on all drives tonight. I should be able to open multiple telnet sessions and run them concurrently, right?
July 28, 2008

Your syslog looks OK, except for the following sets of messages.

usb 1-1: reset high speed USB device using ehci_hcd and address 2
kernel: sd 0:0:0:0: [sda] Result: hostbyte=0x07 driverbyte=0x00
kernel: end_request: I/O error, dev sda, sector 52607
usb 1-1: reset high speed USB device using ehci_hcd and address 2
usb 1-1: reset high speed USB device using ehci_hcd and address 2

emhttp: get_config_idx: fopen /boot/config/shares/HD DVDs.cfg: No such file or directory - assigning defaults
emhttp: get_config_idx: fopen /boot/config/shares/Temporary.cfg: No such file or directory - assigning defaults
emhttp: get_config_idx: fopen /boot/config/shares/VMware.cfg: No such file or directory - assigning defaults

There were 4 resets of your USB drive (one not shown), the I/O error at sector 52607, and the file open errors. I would try attaching your flash drive to different ports, and if you still get similar messages in your syslog, then consider replacing it. The resets could result in unRAID being unable to find the /boot/config folder, and cause unnecessary parity checks on subsequent reboots.

If it helps, the 5 parity check errors you mentioned (30488, 32464, 32488, 42048, 42072) are at extremely low blocks, and therefore are probably part of the Reiser file system structures. There is a slight possibility that they involve very small files stored within the Reiser B-trees. It is also possible, but extremely unlikely, that they involve the tail ends of larger files; in that case you would most likely have had errors in much higher blocks as well, in other parts of those larger files. In other words, I don't think you have any file corruption. (terrible explanation here, sorry)

Edit: Forgot to mention that since I have never seen those USB messages before, I can't really speculate on their significance or the issue they indicate. They are NOT normal though.
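For anyone wanting to check a syslog for the same kinds of messages, here is a minimal sketch. The regular expressions are assumptions modelled only on the lines quoted above, and `summarize_syslog` is a hypothetical helper; other kernels or unRAID versions may word these messages differently.

```python
import re

# Patterns modelled on the quoted messages (assumed, not exhaustive).
PATTERNS = {
    "usb_reset": re.compile(r"usb [\d.-]+: reset high speed USB device"),
    "io_error": re.compile(r"I/O error, dev (\w+), sector (\d+)"),
    "missing_config": re.compile(r"fopen (.+): No such file or directory"),
}

def summarize_syslog(lines):
    """Count how many lines match each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts

# The messages quoted in the post above, used as sample input.
sample = [
    "usb 1-1: reset high speed USB device using ehci_hcd and address 2",
    "kernel: sd 0:0:0:0: [sda] Result: hostbyte=0x07 driverbyte=0x00",
    "kernel: end_request: I/O error, dev sda, sector 52607",
    "usb 1-1: reset high speed USB device using ehci_hcd and address 2",
    "usb 1-1: reset high speed USB device using ehci_hcd and address 2",
    "emhttp: get_config_idx: fopen /boot/config/shares/HD DVDs.cfg: No such file or directory - assigning defaults",
    "emhttp: get_config_idx: fopen /boot/config/shares/Temporary.cfg: No such file or directory - assigning defaults",
    "emhttp: get_config_idx: fopen /boot/config/shares/VMware.cfg: No such file or directory - assigning defaults",
]

summary = summarize_syslog(sample)
print(summary)
```

To scan a real log, the same function can be passed an open file object, since it only iterates over lines.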
July 29, 2008 Author

All short and extended offline tests completed without error (http://pastebin.com/m16f30efd).

RobJ, I'll try other USB ports and see if those errors go away.

Based on the info, I'm probably just overly paranoid, but I'm going to hook up one of the old drives and run a Windows program called CloneSpy to recursively check whether each file is exactly the same. I'll post results.

Thanks for the help, everyone.
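A recursive comparison like the one planned above can also be done with checksums, in the spirit of the md5sum suggestion earlier in the thread. The following is a minimal sketch, not a description of how CloneSpy works; `md5_of_file`, `hash_tree`, and `compare_trees` are hypothetical helper names, and the two root paths would be wherever the old and new disks are mounted.

```python
import hashlib
import os

def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_tree(root):
    """Map each file's path (relative to root) to its MD5 checksum."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            hashes[os.path.relpath(full, root)] = md5_of_file(full)
    return hashes

def compare_trees(old_root, new_root):
    """Report files missing from, extra in, or different on the new tree."""
    old, new = hash_tree(old_root), hash_tree(new_root)
    return {
        "missing": sorted(set(old) - set(new)),
        "extra": sorted(set(new) - set(old)),
        "different": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }
```

An empty "different" list means every file present on both trees has a matching checksum. Reading both disks in full takes about as long as a copy, so this is mainly useful as a verification step.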