June 18, 2011

After THIS, I'm now pretty sure there is something wrong. Each time I replace a single disk with a bigger one, there are parity sync errors, exclusively on the first check after reconstruction and exclusively in the housekeeping area. I've done it twice with different HDDs, with the same result but not the same number of errors (three one time, seven the other).

My system seems healthy: it is four months old now, with roughly two parity checks a month, all without any errors, of course.

I'm in an upsizing process now, swapping six 1TB HDDs for 2TB HDDs. I've already replaced two of them, both with issues (parity sync errors). After the first replacement I thought it was a good idea to go back from the 2TB to the 1TB. After some suggestions and more reading about likely problems with unRAID 4.7, I preferred to go further with the 2TB in place. I've run a lot of parity checks, about 10 incomplete (cancelled before 1%, in the housekeeping area) and 3 complete, all successful. I really think I can trust my system; I've done a lot of reads/writes on the new 2TB drive without a hitch.

Now I've just gone through the same process again for another HDD, upsizing from a Seagate 1TB to a Samsung 2TB. So I've already swapped one HDD from 1TB to 2TB, and this is the second one, in another slot of course. And again the same problem arises, but now with 7 sync errors.

Jun 17 01:13:40 Tower kernel: mdcmd (40): check CORRECT (unRAID engine)
Jun 17 01:13:40 Tower kernel: md: recovery thread woken up ... (unRAID engine)
Jun 17 01:13:40 Tower kernel: md: recovery thread checking parity... (unRAID engine)
Jun 17 01:13:40 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks. (unRAID engine)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 15312 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 15320 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 16560 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 16568 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 16576 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 17120 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 17128 (Errors)

Those sync errors look like they are in the same housekeeping area.

3 possibilities: I'm doing something wrong (what?), my system is wrong (where?), or there is a little bug in 4.7.

The complete process, which I've done twice:
- Modify the go file to REMark (comment out) the cache_dirs command line.
- Stop the array, power off.
- Swap a Seagate 1TB HDD for a Samsung 2TB (HD204UI japanese firmware 1AQ10003, normally error-free).
- Power on the server.
- Start the array.
- Wait until the reconstruction ends. Read/write some files in the array, but not on the disk under reconstruction.
- Re-enable the cache_dirs command line in the go file.
- Stop the array and reboot.
- Wait until cache_dirs finishes reading the directories.
- Start the parity check -> sync errors (7 now).

The HDDs were connected to a SuperMicro AOC-SASLP-MV8; I've never met sync errors before. The SATA cables are from SuperMicro also. The temperatures are always below 48°C, during both reconstruction and parity check.

Any clue? Little bug in 4.7?
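To compare runs like the ones in this post, the "parity incorrect" lines can be pulled out of a syslog and summarized. The following is only a minimal sketch: the function names are made up for illustration, and the 50000 limit is just the rough "<50000 sector area" figure quoted by another poster in this thread, not an official boundary of the housekeeping area.

```python
import re

# Matches kernel lines such as "... md: parity incorrect: 15312 (Errors)"
PARITY_RE = re.compile(r"md: parity incorrect: (\d+)")

def parity_error_sectors(log_text):
    """Return the sector numbers reported as 'parity incorrect', in log order."""
    return [int(m.group(1)) for m in PARITY_RE.finditer(log_text)]

def summarize(sectors, limit=50000):
    """Summarize the errors; 'limit' is an assumed cut-off, not an unRAID constant."""
    if not sectors:
        return {"count": 0, "lowest": None, "highest": None, "all_below_limit": True}
    return {
        "count": len(sectors),
        "lowest": min(sectors),
        "highest": max(sectors),
        "all_below_limit": max(sectors) < limit,
    }

# The seven error lines from the Jun 17 check quoted in the post.
SAMPLE = """\
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 15312 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 15320 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 16560 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 16568 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 16576 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 17120 (Errors)
Jun 17 01:13:41 Tower kernel: md: parity incorrect: 17128 (Errors)
"""

if __name__ == "__main__":
    print(summarize(parity_error_sectors(SAMPLE)))
```

Running the same summary on the syslog of each rebuild would make it easy to see whether the errors always fall in the same low range.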
June 18, 2011

I found this while doing a disk rebuild with 5.0 beta7. I agree it is a bug. I rebuild so infrequently that I had never seen it before. I reported it to Tom, but you might want to send him an email linking this post so he'll have more data. Make sure to post a full syslog. It's only through efforts like yours that these issues get found and ultimately fixed.

In terms of severity, I believe chances are good that these sync errors in the housekeeping area are not going to lead to data corruption or loss. Just make sure to run a correcting parity check after a rebuild to correct them.
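For readers unsure what a sync error and a correcting check amount to, here is a toy model. It assumes single XOR parity over tiny in-memory "disks" (lists of integers); the function names are hypothetical and this is in no way unRAID's actual driver code.

```python
from functools import reduce

def xor_parity(data_disks):
    """Parity at position i is the XOR of position i on every data disk."""
    return [reduce(lambda a, b: a ^ b, column) for column in zip(*data_disks)]

def check_parity(data_disks, parity, correct=False):
    """Return positions where stored parity disagrees with the data disks.

    With correct=True the stored parity is rewritten at those positions;
    the data disks themselves are never modified.
    """
    expected = xor_parity(data_disks)
    bad = [i for i, (p, e) in enumerate(zip(parity, expected)) if p != e]
    if correct:
        for i in bad:
            parity[i] = expected[i]
    return bad

if __name__ == "__main__":
    disks = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    parity = xor_parity(disks)
    parity[1] ^= 0xFF  # simulate a stale parity block
    print(check_parity(disks, parity, correct=True))  # first check finds it
    print(check_parity(disks, parity))                # second check is clean
```

In this model a correcting check only rewrites parity, which fits the pattern reported in the thread: errors on the first check after a rebuild, none on later checks.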
June 19, 2011

- Re-enable the cache_dirs command line in the go file.
- Stop and Reboot the array.
- Wait till the cache_dirs end reading directories.
- Start the parity check -> sync errors (7 now).

Could this be a cache_dirs problem? Try the check before starting cache_dirs.
June 19, 2011

No. Cache_dirs does not affect the bits written to the disk nor does it affect the bits read during a parity check.
June 19, 2011

I've had the same problem after enlarging my disks (HPA removal to allow the 4.7 upgrade). I had to do 3 disks, but I got parity errors after only one of them, in the area below sector 50000.
June 19, 2011, Author

"I've had the same problem after enlarging my disks (hpa removal to allow 4.7 upgrade). I had to do 3 disks, but only after 1 disk I got parity errors in the <50000 sector area."

Strange. And afterwards, no problems or errors at all?

Looking into this...

I will upsize another HDD soon, exactly the same Seagate 1TB -> Samsung 2TB pair I've already done twice, but connected to the second SuperMicro AOC-SASLP-MV8 this time.

BTW, all those 2TB HDDs were precleared with the MBR 4KB-aligned, but the 1TB drives were unaligned.
June 20, 2011

"Strange. And after, not any problem/errors?"

No. I ran a parity check, which corrected the parity errors, and of course I was not very happy with the errors in the housekeeping area. I ran reiser checks afterwards with no problems, then ran new parity checks with no errors... With the second and third resize I did not have any issues at all.

One fact I should mention: the 1st "HPA removal" was done by replacing the disk with another drive; the 2nd and 3rd were in-place HPA removals.

I can't seem to find the correct syslog with the parity errors, which is a shame...
June 21, 2011

This looks like the same problem as is mentioned in: http://lime-technology.com/forum/index.php?topic=13603.msg128860#msg128860 which happens with 4.5.6 and 4.7.

Regards, Stephen
June 21, 2011, Author

"I'll will upsize another HDD soon, exactly same Seagate 1TB -> Samsung 2TB couple I've already made twice, but connected on the second SuperMicro AOC-SASLP-MV8 this time."

Upsizing done, same process, same error (only one this time).

Jun 21 13:24:14 Tower kernel: mdcmd (40): check CORRECT (unRAID engine)
Jun 21 13:24:14 Tower kernel: md: recovery thread woken up ... (unRAID engine)
Jun 21 13:24:14 Tower kernel: md: recovery thread checking parity... (unRAID engine)
Jun 21 13:24:14 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks. (unRAID engine)
Jun 21 13:24:15 Tower kernel: md: parity incorrect: 30768 (Errors)

I'll let the parity check run to the end, and then I will post the complete syslog. I have some other syslogs ready:
- after the parity check, before powering down for the upsizing.
- after the successful data rebuild.

So far it's systematic (3 upsizes, 3 problems).

Questions:
- Is it worrying for the data in the future? If not, how can I make sure my data is OK? (reiserfsck...)
- I have to upsize more disks (3), but would it not be a better idea to 1) install the new 2TB disk (already precleared), 2) copy the data from the old 1TB disk to the new disk, 3) zero the old 1TB disk, 4) remove the old disk from the array, 5) confirm to unRAID that the data AND the parity are OK? It takes longer and is more dangerous and more difficult; is it the way to go?
June 23, 2011, Author

"I'll let the parity check till end, and then I will post the complete syslog."

Attached here, along with some later parity checks that ran without errors. No advice, anyway?

syslog-2011-06-23_parity_check_w-errors_then_without.txt
June 23, 2011

I'd run reiserfsck on the rebuilt disks. md5deep (unmenu package) could then be used to verify file integrity, but the hashes need to have been computed first.
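The "hashes need to be computed first" point can be illustrated with a small sketch. It uses Python's standard hashlib rather than md5deep itself, and the function names and the idea of keeping the manifest as a dictionary are assumptions made for illustration only.

```python
import hashlib
import os

def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its MD5; run this BEFORE the swap."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = file_md5(full)
    return manifest

def verify(root, manifest):
    """Return the relative paths that are missing or whose hash changed; run AFTER the rebuild."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_md5(full) != expected:
            problems.append(rel)
    return problems
```

As a usage example, one would call build_manifest() on a disk's mount point (for instance /mnt/disk1, assumed here) before replacing the disk, save the result somewhere off that disk, and call verify() with the same mount point after the rebuild. An empty list means every file recorded beforehand still hashes the same.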