DrMexSS Posted March 19, 2011

UPDATE: I have solved the problem. If you don't want to read everything, I have summarized the steps in my 4th post in this thread.

Hi everyone,

I have doubts about my next step. I replaced a disk in my system because I bought a bigger one. After doing that I ran

/root/mdcmd check NOCORRECT

It's running now and it's finding errors... aaaaaaaaaahhhhhhhh

My system:
Unraid 4.5.6 - Plus key
USB: Sandisk cruzer (16gb)
Mobo: Asus P5Q-VM (asus) : Intel® G45 / ICH10
NIC: Intel pro/1000
CPU: Celeron S 430 (1x 1800 MHz)
RAM: 2 GB

Disks:
Parity : Western Digital 2TB : WDC_WD20EARS-00S8B1_WD-WCAVY2494470
Disk 1: Seagate 1TB : ST31000528AS_6VP0EL3G
Disk 2: Seagate 500 GB : ST3500830AS_9QG67DQX (THIS is the one I wanted to change)
Disk 3: Hitachi 750 GB : Hitachi_HDT721075SLA380_STA401MG03XLPA
Disk 4: Seagate 1 TB : ST31000528AS_6VP05W02
Disk 5: Western Digital 2TB : WDC_WD20EARS-00S8B1_WD-WCAVY2502205

I replaced disk 2 with a new Western Digital 2TB (WD20EARS).

The process I followed to change the disk was:
1. Parity check.
2. Power down, change the disk, power on. NOTE: I did NOT use the preclear.sh script !!!!!
3. Devices tab > the new disk was assigned automatically to the missing slot.
4. Start button, "I'm sure..."

The system rebuilt the disk with zero errors. Then, as I said, I ran a non-correcting parity check:

/root/mdcmd check NOCORRECT

It hasn't finished yet. Here is a capture of the process:

Syslog:
Mar 19 00:16:39 Tower login[1401]: ROOT LOGIN on `tty1'
Mar 19 00:18:44 Tower kernel: mdcmd (39): check NOCORRECT
Mar 19 00:18:44 Tower kernel:
Mar 19 00:18:44 Tower kernel: md: recovery thread woken up ...
Mar 19 00:18:44 Tower kernel: md: recovery thread checking parity...
Mar 19 00:18:44 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks.
Mar 19 00:18:45 Tower kernel: md: parity incorrect: 26272

I have errors (at least one). What do I have to do next?

This is my guess:
1. Stop the array.
2. Run preclear.sh on the new disk 2.
3. Assign it to the array again.
4. Start the array. The rebuild will start again.

Is that correct? Thank you.

UPDATE: The check has finished and it found only that one error.

UPDATE: I have attached the smartctl info (command: smartctl -a -d ata /dev/sdc | todos)

Attachment: resultsSMARTCTL-2011-03-19.txt
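Editor's sketch: counting the mismatches reported in a syslog capture like the one in the post above can be done with a few lines of Python. This is only an illustration. It assumes the message format matches the single "md: parity incorrect: <number>" line quoted in the post; the function name parity_error_positions is hypothetical, and the thread does not say what unit the reported number is in.

```python
import re

# Pattern assumed from the one "parity incorrect" line quoted in the thread.
PARITY_ERROR = re.compile(r"kernel: md: parity incorrect: (\d+)")


def parity_error_positions(syslog_text):
    """Return the numbers reported after 'md: parity incorrect:' in log order."""
    return [int(m.group(1)) for m in PARITY_ERROR.finditer(syslog_text)]


# Lines copied from the syslog excerpt in the post.
sample = """\
Mar 19 00:18:44 Tower kernel: md: recovery thread checking parity...
Mar 19 00:18:44 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks.
Mar 19 00:18:45 Tower kernel: md: parity incorrect: 26272
"""

positions = parity_error_positions(sample)
print(len(positions), positions)
```

Comparing this list between two check runs is one way to tell whether a reported error is new or the same one as before.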
dgaschk Posted March 19, 2011

Post a SMART report for the drive.
DrMexSS (Author) Posted March 19, 2011

The test has finished, just one error. I have attached the SMART report. Thank you
DrMexSS (Author) Posted March 23, 2011

I have done the following:
- Stop the array
- Unassign the disk
- Start the array
- Stop the array
- Run the preclear on disk 2

Results: NO errors

================================================================== 1.9
= unRAID server Pre-Clear disk /dev/sdc
= cycle 1 of 1, partition start on sector 63
= Disk Pre-Clear-Read completed                                DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Verifying if the MBR is cleared.              DONE
= Disk Post-Clear-Read completed                                DONE
Disk Temperature: 32C, Elapsed Time: 27:10:05
========================================================================1.9
== WDC WD20EARS-00MVWB0 WD-WMAZA3272095
== Disk /dev/sdc has been successfully precleared
== with a starting sector of 63
============================================================================
** Changed attributes in files: /tmp/smart_start_sdc /tmp/smart_finish_sdc
ATTRIBUTE             NEW_VAL  OLD_VAL  FAILURE_THRESHOLD  STATUS  RAW_VALUE
Temperature_Celsius = 118      121      0                  ok      32
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
0 sectors are pending re-allocation at the end of the preclear,
the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
the number of sectors re-allocated did not change.

- Then I assigned the disk to the Disk 2 slot again
- Start the array

The rebuild started.

I did NOT run the "mdcmd check NOCORRECT" command, but after the rebuild the interface still shows the 1 error.

What should I do now? Run a normal parity check and forget about it?

I have attached the new smartctl report for disk 2.

Attachment: smartctl_post.txt
DrMexSS (Author) Posted March 23, 2011

Well, it's solved. After the rebuild I ran a NOCORRECT check and now I don't have any errors.

I suppose the error from the previous post is a display bug: the system remembers that there WAS a sync error.

So, to sum up, the complete process I followed is:
- I wanted to replace one disk with a bigger one
- I ran a parity check (the regular one, with corrections)
- It found 0 errors
- Stop the array
- Unassign the disk (disk 2 in my case)
- Power down
- Change the disk. Note: I forgot to run the preclear script on the new disk
- Power on
- Assign the new disk to the old slot
- Start the array :: Rebuilding
- When it finished, I ran the command /root/mdcmd check NOCORRECT (you can see the progress in the web browser)
- It found 1 error
- Stop the array
- Unassign the disk
- Start the array
- Stop the array
- Run the preclear.sh script on the disk (28 hours... bufff)
- Assign the disk to the slot
- Start the array. Rebuild.
- It finished. It showed 1 error. I assume it's the old one.
- Run the command /root/mdcmd check NOCORRECT
- It shows NO errors
dgaschk Posted March 23, 2011

Another option is that there is an error in the data on the new disk. The original error could have been on any of the array disks, including parity. The rebuild used the data on all remaining disks plus the parity. If the original error was on the replaced disk, then it has been corrected. If the error was not on the replaced disk, then the original error still exists and the corresponding bit on the new disk has been flipped incorrectly to match parity. If parity was wrong, then there is a single wrong bit on the new disk. If parity was correct, then there is a wrong bit on the new disk and on the other data disk that still has a wrong bit. Clear as mud, right?

unRAID is designed to survive the failure of an entire disk and does this well. It currently fails to handle occasional bit errors gracefully. The parity check indicates that a bit error has occurred but does nothing to indicate on which disk the error lies. It is difficult to determine the source of parity check errors. The error may be gone, and you will probably never notice a single error on one or at most two disks.
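Editor's sketch: the point that a parity check cannot say which disk holds the bad bit can be illustrated with a toy model. This assumes single parity computed as the XOR of the data disks, which is a simplification and not unRAID's actual code; the byte values and the helper name parity_of are hypothetical.

```python
from functools import reduce
from operator import xor


def parity_of(data_bytes):
    """XOR of the bytes stored at the same position on every data disk."""
    return reduce(xor, data_bytes, 0)


good = [0b1010, 0b0110, 0b1111]  # hypothetical bytes on three data disks
parity = parity_of(good)         # what the parity disk should hold

FLIP = 0b0100                    # one flipped bit
mismatches = []

# Flip the same bit on each data disk in turn and record the mismatch
# the check would see (computed parity XOR stored parity).
for i in range(len(good)):
    bad = list(good)
    bad[i] ^= FLIP
    mismatches.append(parity_of(bad) ^ parity)

# Finally flip that bit on the parity disk instead.
mismatches.append(parity_of(good) ^ (parity ^ FLIP))

print(mismatches)
```

Every entry in mismatches is the same value, so in this model the check only learns that something at this position is inconsistent, not where the fault is.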
DrMexSS (Author) Posted March 25, 2011

But I ran the check with the NOCORRECT option. The second time it did NOT find any errors. I think everything is OK now.

In my opinion the error happened when unRAID rebuilt the disk the first time. I mean:
- Before opening the case, I ran a parity check -> No errors.
- I replaced the old disk with the new one.
- unRAID rebuilt the data (this is where the 1 error appeared, because:)
- I ran the check with the NOCORRECT option -> It found the error.
- I unassigned the disk and precleared it.
- I put the disk back in the system. unRAID rebuilt the data again.
- I ran the check with the NOCORRECT option -> It did not find any errors.

So the second rebuild was done correctly and my data is safe.

P.S.: Maybe I have been paranoid, it's just one bit!! (or byte?)
lionelhutz Posted March 25, 2011

dgaschk wrote:
"Another option is that there is an error in the data on the new disk. The original error could have been on any of the array disks, including parity. The rebuild used the data on all remaining disks plus the parity. If the original error was on the replaced disk, then it has been corrected. If the error was not on the replaced disk, then the original error still exists and the corresponding bit on the new disk has been flipped incorrectly to match parity. If parity was wrong, then there is a single wrong bit on the new disk. If parity was correct, then there is a wrong bit on the new disk and on the other data disk that still has a wrong bit. Clear as mud, right?"

Not really. unRAID has absolutely no knowledge of the expected state of the bits on any single disk during the rebuild process. The disk rebuilding function assumes the other data disks and parity are correct because it must use all that data to rebuild the new disk. If there was a bit that was wrong on another disk, that would introduce a corresponding wrong bit on the disk being rebuilt, and these 2 data errors would effectively cancel each other out as far as the parity check was concerned.

So, in a perfect world, the parity check would never have an error right after a disk rebuild.

Peter
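Editor's sketch: the cancelling effect described in the post above can be reproduced with a toy rebuild. It assumes single XOR parity across the data disks, a simplified model and not unRAID's real implementation; the byte values and the helpers xor_all and rebuild are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_all(values):
    """XOR a list of byte values together."""
    return reduce(xor, values, 0)


def rebuild(surviving, parity):
    """Reconstruct the missing disk's byte from parity and the surviving disks."""
    return xor_all(surviving) ^ parity


# Hypothetical bytes at one position on three data disks.
disk1, disk2, disk3 = 0b1010, 0b0110, 0b1111
parity = xor_all([disk1, disk2, disk3])

# Case A: everything else is intact, so disk 2 comes back exactly.
assert rebuild([disk1, disk3], parity) == disk2

# Case B: disk 1 has a silently flipped bit before the rebuild of disk 2.
disk1_bad = disk1 ^ 0b0001
disk2_rebuilt = rebuild([disk1_bad, disk3], parity)

# The rebuilt byte is wrong, yet the array is consistent with parity again.
parity_check_ok = xor_all([disk1_bad, disk2_rebuilt, disk3]) == parity
print(disk2_rebuilt == disk2, parity_check_ok)
```

In case B the rebuilt disk inherits the wrong bit and a parity check right after the rebuild still passes, which is why, under this model, a mismatch found immediately after a rebuild points at something going wrong during or after the rebuild itself.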
dgaschk Posted March 25, 2011

lionelhutz wrote:
"dgaschk wrote: 'Another option is that there is an error in the data on the new disk. The original error could have been on any of the array disks, including parity. The rebuild used the data on all remaining disks plus the parity. If the original error was on the replaced disk, then it has been corrected. If the error was not on the replaced disk, then the original error still exists and the corresponding bit on the new disk has been flipped incorrectly to match parity. If parity was wrong, then there is a single wrong bit on the new disk. If parity was correct, then there is a wrong bit on the new disk and on the other data disk that still has a wrong bit. Clear as mud, right?'

Not really. unRAID has absolutely no knowledge of the expected state of the bits on any single disk during the rebuild process. The disk rebuilding function assumes the other data disks and parity are correct because it must use all that data to rebuild the new disk. If there was a bit that was wrong on another disk, that would introduce a corresponding wrong bit on the disk being rebuilt, and these 2 data errors would effectively cancel each other out as far as the parity check was concerned.

So, in a perfect world, the parity check would never have an error right after a disk rebuild.

Peter"

Isn't this what I said?