February 21, 2011
Hi all,

I just built my unRAID server, ran the first parity sync, and encountered a bunch of errors. Here is some background info:

1) Ran preclear on all disks and the results were good (surprisingly good).
2) Added Disk1 without a parity disk
3) Added Disk2 without a parity disk
4) Copied data to the unRAID share
5) Added Disk3 without a parity disk
6) Added the parity disk, was prompted to format, then ran the parity sync
7) During the parity sync, a red ball appeared next to Parity on the unRAID main screen, and the unMENU main screen showed "invalid parity". At the end there were 127 errors and a big syslog :'(.

My hardware:
Asus AT3N7A-I motherboard with Atom 330
2 GB Kingston DDR2 value RAM
1x generic-brand PCI RAID card
450 W generic PSU
1 GB SanDisk Cruzer USB flash drive
4x WD20EARS disks, 1x Hitachi 500 GB 7200 rpm cache disk
All WD20EARS disks are housed in a Supermicro CSE-M35T 4-in-3 cage and connected to the motherboard's SATA ports
The cache disk is connected to the RAID card

Here's a small sample of the error messages in the syslog:

Feb 21 08:25:57 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Feb 21 08:25:57 Tower kernel: res 51/40:df:08:78:f0/00:02:aa:00:00/f0 Emask 0x9 (media error) (Errors)
Feb 21 08:25:57 Tower kernel: ata2.00: error: { UNC } (Errors)
Feb 21 08:26:00 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Feb 21 08:26:00 Tower kernel: res 51/40:df:08:78:f0/00:02:aa:00:00/f0 Emask 0x9 (media error) (Errors)
Feb 21 08:26:00 Tower kernel: ata2.00: error: { UNC } (Errors)
Feb 21 08:26:02 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Feb 21 08:26:02 Tower kernel: res 51/40:df:08:78:f0/00:02:aa:00:00/f0 Emask 0x9 (media error) (Errors)
Feb 21 08:26:02 Tower kernel: ata2.00: error: { UNC } (Errors)
Feb 21 08:26:05 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Feb 21 08:26:05 Tower kernel: res 51/40:df:08:78:f0/00:02:aa:00:00/f0 Emask 0x9 (media error) (Errors)
Feb 21 08:26:05 Tower kernel: ata2.00: error: { UNC } (Errors)
Feb 21 08:26:08 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Feb 21 08:26:08 Tower kernel: res 51/40:df:08:78:f0/00:02:aa:00:00/f0 Emask 0x9 (media error) (Errors)
Feb 21 08:26:08 Tower kernel: ata2.00: error: { UNC } (Errors)
Feb 21 08:26:10 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Feb 21 08:26:10 Tower kernel: res 51/40:df:08:78:f0/00:02:aa:00:00/f0 Emask 0x9 (media error) (Errors)
Feb 21 08:26:10 Tower kernel: ata2.00: error: { UNC } (Errors)
Feb 21 08:26:10 Tower kernel: end_request: I/O error, dev sda, sector 2867886088 (Errors)
Feb 21 08:26:10 Tower kernel: md: disk2 read error (Errors)
Feb 21 08:26:10 Tower kernel: handle_stripe read error: 2867886024/2, count: 1 (Errors)
Feb 21 08:26:10 Tower kernel: md: parity incorrect: 2867886024 (Errors)
Feb 21 08:26:10 Tower kernel: md: disk2 read error (Errors)
Feb 21 08:26:10 Tower kernel: handle_stripe read error: 2867886032/2, count: 1 (Errors)
Feb 21 08:26:10 Tower kernel: md: parity incorrect: 2867886032 (Errors)
Feb 21 08:26:10 Tower kernel: md: disk2 read error (Errors)
Feb 21 08:26:10 Tower kernel: handle_stripe read error: 2867886040/2, count: 1 (Errors)
Feb 21 08:26:10 Tower kernel: md: parity incorrect: 2867886040 (Errors)
Feb 21 08:26:10 Tower kernel: md: disk2 read error (Errors)
Feb 21 08:26:10 Tower kernel: handle_stripe read error: 2867886048/2, count: 1 (Errors)
Feb 21 08:26:10 Tower kernel: md: parity incorrect: 2867886048 (Errors)
Feb 21 08:26:10 Tower kernel: md: disk2 read error (Errors)
Feb 21 08:26:10 Tower kernel: handle_stripe read error: 2867886056/2, count: 1 (Errors)
Feb 21 08:26:10 Tower kernel: md: parity incorrect: 2867886056 (Errors)
Feb 21 08:26:10 Tower kernel: md: disk2 read error (Errors)

Attached is the full syslog. What should I do next?

Thanks in advance,
Roy

syslog-2011-02-21.zip
February 21, 2011
The SATA cable or the power from the power supply could be the cause, or the parity drive itself could be bad. What you are shooting for is zero errors. I'd swap out the cable and see if that fixes it.

Since it's a new build, did you run the memory test? If not, run the memory test for 12 hours. Again, you're shooting for zero errors.
February 21, 2011 (Author)
Yes, I ran the memory test after I put the system together. Not for 12 hours, just overnight, and there was no sign of memory trouble. I will swap out the cables after the parity check without correction has completed in unMENU.

Thanks for looking into this.
February 21, 2011
Also be sure no power cables are bundled with your SATA cables. I had similar errors once: I removed the straps from the bundles and swapped the cable, and the problem was gone. Meanwhile I'm using the old cable again and nothing is wrong with the system, so I had to deduce that it was interference from the power cable that was causing the issue.
February 21, 2011
The errors you show are uncorrectable sector errors on disk2. You might want to get a SMART report on it and see if any sectors are pending re-allocation.

Joe L.
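For anyone reading later: the lines in the excerpt that point at the disk surface rather than the cabling are the `error: { UNC }` and `(media error)` messages, followed by `end_request: I/O error, dev sda` and `md: disk2 read error`. UNC is how the kernel labels an uncorrectable read reported by the drive, and the `SErr 0x0` on the exception line means no SATA link errors were flagged alongside it. Here is a rough Python sketch that tallies those messages so a long syslog is easier to scan. The regular expressions are based only on the message formats quoted in the first post, and `summarize` and `SAMPLE` are names made up for this illustration.

```python
import re
from collections import Counter

# Message formats copied from the syslog excerpt in the first post.
UNC_RE = re.compile(r"(ata\d+\.\d+): error: \{ UNC \}")
IO_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
MD_RE = re.compile(r"md: (disk\d+) read error")


def summarize(lines):
    """Tally uncorrectable-read messages per ATA port and per array disk."""
    unc_by_port = Counter()
    reads_by_disk = Counter()
    failed_sectors = {}  # block device -> set of failing sector numbers
    for line in lines:
        m = UNC_RE.search(line)
        if m:
            unc_by_port[m.group(1)] += 1
        m = IO_RE.search(line)
        if m:
            failed_sectors.setdefault(m.group(1), set()).add(int(m.group(2)))
        m = MD_RE.search(line)
        if m:
            reads_by_disk[m.group(1)] += 1
    return {
        "unc_by_port": dict(unc_by_port),
        "reads_by_disk": dict(reads_by_disk),
        "failed_sectors": {d: sorted(s) for d, s in failed_sectors.items()},
    }


# A few lines taken from the excerpt above (viewer annotations dropped).
SAMPLE = [
    "Feb 21 08:26:10 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0",
    "Feb 21 08:26:10 Tower kernel: res 51/40:df:08:78:f0/00:02:aa:00:00/f0 Emask 0x9 (media error)",
    "Feb 21 08:26:10 Tower kernel: ata2.00: error: { UNC }",
    "Feb 21 08:26:10 Tower kernel: end_request: I/O error, dev sda, sector 2867886088",
    "Feb 21 08:26:10 Tower kernel: md: disk2 read error",
    "Feb 21 08:26:10 Tower kernel: md: parity incorrect: 2867886024",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it against a real log, pass the open file instead of `SAMPLE`, for example `summarize(open(path))`, where `path` is wherever your syslog lives. If the tallies cluster on one port and one disk with UNC errors, as they do here, the disk is the more likely suspect; a cabling problem would more typically show up as CRC or interface errors instead.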
February 21, 2011 (Author)
Yes, Joe, that makes sense! My disk2 and disk3 may be the weakest links in the chain. They have consistently given me sector errors. On the last preclear, disk2 had 1 sector pending re-allocation. I decided to pass it and add it to the array. I didn't expect it to give me problems so soon.

For now, disk2 and disk3 contain no data, but I will definitely keep a close eye on these 2 disks. Parity and disk1 are my two best disks according to the preclear results.

Joe, do you think I should RMA disk2, or continue to use the disk and simply monitor the situation closely? I prefer the latter, provided the sector situation doesn't get out of hand.

Thank you!
February 21, 2011
It might be that the one sector that was pending re-allocation could not be read during the parity check. The only way to know is to get a new SMART report on disk2.

Modern drives have several thousand spare sectors. Unless you see a constant trickle of new unreadable sectors, you'll probably be just fine with that drive.

Hopefully that one sector marked as unreadable will be re-allocated once it gets used by a file or directory. It might take a long time before it gets used though, so do not expect it to go away unless you fill the drive with data.

Joe L.
February 21, 2011 (Author)
Joe, the new SMART report looks good to my untrained eye. Here goes:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   155   144   021    Pre-fail  Always       -       9250
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       451
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4733
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       218
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       182
193 Load_Cycle_Count        0x0032   143   143   000    Old_age   Always       -       172917
194 Temperature_Celsius     0x0022   115   097   000    Old_age   Always       -       37
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

SMART Error Log Version: 1
No Errors Logged

Do you concur?

BTW, which part of the syslog led you to conclude the errors were due to an unreadable sector? I should know what to look out for should disk2 and disk3 start acting up again in the future.

Thank you once again.
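If you want to check the sector-related attributes without eyeballing the whole table, something like the Python sketch below could help. It assumes the report is pasted in as plain text in the ten-column layout shown in this post, and the choice of which IDs to watch (5, 196, 197, 198) is my own pick of the re-allocation and pending-sector counters, not an official list. `parse_attributes`, `flag_sector_attributes`, and `SAMPLE_REPORT` are names invented for this example.

```python
# Attribute IDs treated as sector-health counters in this sketch (an assumption).
WATCHED_IDS = (5, 196, 197, 198)

# A few rows copied from the report in this post.
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 2
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
"""


def parse_attributes(report_text):
    """Return {attribute_id: (name, raw_value)} for each attribute row."""
    attrs = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Rows have 10 columns and start with a numeric ID; skip everything else.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            raw_value = int(fields[9])
        except ValueError:
            continue  # raw value is not a plain integer; ignore in this sketch
        attrs[int(fields[0])] = (fields[1], raw_value)
    return attrs


def flag_sector_attributes(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {
        attr_id: attrs[attr_id]
        for attr_id in WATCHED_IDS
        if attr_id in attrs and attrs[attr_id][1] > 0
    }


if __name__ == "__main__":
    for attr_id, (name, raw) in flag_sector_attributes(
        parse_attributes(SAMPLE_REPORT)
    ).items():
        print(f"{attr_id} {name}: raw value {raw}")
```

On the rows above, the only watched attribute with a non-zero raw value is 198 Offline_Uncorrectable (2), while 5 and 197 are both 0. Saving the output after each parity check and comparing it with the previous run is one way to spot the "constant trickle" of new unreadable sectors that Joe describes.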
February 22, 2011 (Author)
I've just completed another parity check and there weren't any errors. I'm not sure what to make of it, since the problematic disk is still in the array and was part of the parity check. Perhaps the drive has recognised the bad sector and re-allocated it.

I know my files are good and can only assume the parity is good for a rebuild should I have a disk failure. I'm not going to get stressed or lose sleep over this.

Btw, I didn't do much to rectify the problem, i.e. I didn't swap or unbundle any cables. All I did was check that all connections were properly and securely seated. I'm confident the system was well put together, including the cabling.

Thanks to all who came to my aid.

Cheers,