April 21, 2016 I recently added a parity drive, and after the initial parity check one of my 4 TB disks shows 9162 errors. What are those errors? Can I repair them? Is the system still protected by the parity drive? Thank you, Gus
April 21, 2016 Community Expert A correcting parity check can sometimes incorrectly update parity when there are disk errors. Please post your diagnostics (Tools > Diagnostics).
April 21, 2016 Author Quote: "A correcting parity check can sometimes incorrectly update parity when there are disk errors. Please post your diagnostics (Tools > Diagnostics)." Here it is. Thank you, Gus. Attachment: tower-diagnostics-20160421-1448.zip
April 21, 2016 Community Expert
Serial Number: WD-WCC4E1170324
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 10
Disk2 has 10 pending sectors. Normally I would say rebuild it, but since you don't have reliable parity yet I would start by copying all its files to a drive not in the array, either to another computer on your network or to a drive mounted outside the array with Unassigned Devices.
April 21, 2016 Community Expert
Disk 2 has pending sectors and should be replaced:
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E1170324
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 10
Parity was, however, incorrectly updated. This is why I recommend always running non-correcting parity checks:
Apr 21 02:08:31 Tower kernel: md: disk2 read error, sector=7468304696
Apr 21 02:08:31 Tower kernel: md: disk2 read error, sector=7468304704
Apr 21 02:08:31 Tower kernel: md: disk2 read error, sector=7468304712
Apr 21 02:08:31 Tower kernel: md: disk2 read error, sector=7468304720
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304728
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304736
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304744
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304752
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304760
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304768
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304776
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304784
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304792
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304800
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304808
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304816
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304824
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304832
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304840
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304848
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304856
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304864
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304872
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304880
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304888
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304896
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304904
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304912
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304920
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304928
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304936
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304944
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304952
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304960
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304968
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304976
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304984
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304992
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305000
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305008
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305016
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305024
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305032
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305040
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305048
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305056
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305064
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305072
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305080
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305088
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305096
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305104
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305112
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305120
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305128
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305136
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305144
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305152
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305160
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305168
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305176
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305184
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305192
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305200
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305208
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305216
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305224
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305232
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305240
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305248
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305256
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305264
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305272
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305280
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305288
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305296
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305304
Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468305312
This is what I would do: replace disk2 and let it rebuild. Then, if you have checksums for all files, check which ones are corrupt and replace them. If you don't have checksums, use a file compare utility to compare the rebuilt disk with the old one.
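For readers who want to tally a syslog excerpt like the one in the post above rather than count lines by eye, here is a minimal Python sketch. It assumes only the two message shapes quoted in this thread ("diskN read error" and "correcting parity"); the function name `summarize_md_events` is made up for illustration and is not part of Unraid.

```python
import re
from collections import Counter

# Matches the two kernel message shapes quoted in this thread, e.g.
#   "md: disk2 read error, sector=7468304696"
#   "md: correcting parity, sector=7468304728"
MD_EVENT = re.compile(
    r"md: (?:(?P<disk>disk\d+) read error|correcting parity), sector=(?P<sector>\d+)"
)

def summarize_md_events(log_text):
    """Return (read errors per disk, list of sectors where parity was rewritten)."""
    read_errors = Counter()
    corrected_sectors = []
    for match in MD_EVENT.finditer(log_text):
        if match.group("disk"):
            read_errors[match.group("disk")] += 1
        else:
            corrected_sectors.append(int(match.group("sector")))
    return read_errors, corrected_sectors

# Three lines copied from the excerpt above.
sample = (
    "Apr 21 02:08:31 Tower kernel: md: disk2 read error, sector=7468304696\n"
    "Apr 21 02:08:31 Tower kernel: md: disk2 read error, sector=7468304704\n"
    "Apr 21 02:08:31 Tower kernel: md: correcting parity, sector=7468304728\n"
)
errors, fixes = summarize_md_events(sample)
print(dict(errors), len(fixes))
```

Because the pattern is applied with `finditer` over the whole text, it should also cope with a log that was pasted as one long line, as often happens in forum posts.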
April 21, 2016 Author Sorry, but I'm not sure I understood. Do those disk errors represent a hardware failure of disk sectors? Can I make a warranty claim? Once I get all the content off this disk, is there any way to reuse it? If I remember correctly, on my old Synology the first time you installed a disk the system scanned it for errors and marked them. Is that the same as formatting a new drive in Unraid? Quote: "Replace disk2 and let it rebuild" For the moment I'll use the parity disk as the disk to copy the files to, and buy a new disk for parity. The disk with errors basically contains some movies (no important content). When I copy those files to the new disk, will the files with those errors give me an error during the copy? Once I have discarded the problematic files, will I be safe if I recreate parity? Thank you, Gus
April 21, 2016 Community Expert Pending sectors are usually bad sectors; if the disk is under warranty, WD will replace it. You have more than one option to recover data, including:
1- Use a spare disk to rebuild disk2. Some files will be corrupt; if they are only videos they should still play and the corruption can be almost unnoticeable. Use the old disk or checksums to find and replace corrupt files (this would be my preferred option).
2- Remove the disk from the array, do a new config with or without a new disk, and copy all the data you can from the old disk. You probably won't be able to copy files from the affected sectors.
3- Since you have space, move all the files you can from disk2 to other disk(s), then replace disk2 and do a parity sync instead of a rebuild.
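Option 1 relies on comparing the rebuilt disk against the old one. If no checksums were saved beforehand, the comparison can be sketched in Python roughly as follows. This is only an illustration: `compare_trees` and `file_digest` are hypothetical helper names, and the two mount points in the usage comment are assumptions about where the disks happen to be mounted.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large videos do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_trees(rebuilt_root, old_root):
    """Compare every file under rebuilt_root with its counterpart under old_root.

    Returns three lists of relative paths: files whose contents differ,
    files missing on the old disk, and files that could not be read
    (for example because they sit on a pending sector).
    """
    rebuilt_root, old_root = Path(rebuilt_root), Path(old_root)
    mismatched, missing, unreadable = [], [], []
    for rebuilt_file in sorted(rebuilt_root.rglob("*")):
        if not rebuilt_file.is_file():
            continue
        relative = rebuilt_file.relative_to(rebuilt_root)
        old_file = old_root / relative
        if not old_file.is_file():
            missing.append(relative)
            continue
        try:
            if file_digest(rebuilt_file) != file_digest(old_file):
                mismatched.append(relative)
        except OSError:
            unreadable.append(relative)
    return mismatched, missing, unreadable

# Example call (paths are assumptions, adjust to your own mounts):
# mismatched, missing, unreadable = compare_trees("/mnt/disk2", "/mnt/disks/old_disk2")
```

Files reported as mismatched or unreadable are the candidates to restore from another source; everything else matched byte for byte.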
April 21, 2016 Author One last thing. I have disabled the parity disk to use it as a temporary backup (until I get a replacement unit) of the drive with those bad sectors, but the bad sectors have disappeared... What's the explanation for this? Thank you, Gus
April 21, 2016 Quote: "One last thing. I have disabled the parity disk to use it as a temporary backup (until I get a replacement unit) of the drive with those bad sectors, but the bad sectors have disappeared..." What makes you think the bad sectors are gone? The Unraid web GUI drive error counter is reset every time you stop the array, and it is simply incremented every time Unraid is unable to read from the disk. You need to look at the SMART report to view the bad sector counts.
April 21, 2016 Author Quote: "What makes you think the bad sectors are gone? The Unraid web GUI drive error counter is reset every time you stop the array, and it is simply incremented every time Unraid is unable to read from the disk. You need to look at the SMART report to view the bad sector counts." OK, I got it. How do I look at the SMART report? Thank you, Gus
April 21, 2016 Quote: "How do I look at the SMART report?" Click on the device text and scroll down to the attributes section.
April 21, 2016 Author Here they are. Is no. 1 the field to look at?
#    Attribute Name            Flag    Value  Worst  Threshold  Type      Updated  Failed  Raw Value
1    Raw read error rate       0x002f  200    200    051        Pre-fail  Always   Never   229
3    Spin up time              0x0027  193    175    021        Pre-fail  Always   Never   7341
4    Start stop count          0x0032  092    092    000        Old age   Always   Never   8712
5    Reallocated sector count  0x0033  200    200    140        Pre-fail  Always   Never   0
7    Seek error rate           0x002e  100    253    000        Old age   Always   Never   0
9    Power on hours            0x0032  080    080    000        Old age   Always   Never   14838 (1y, 8m, 9d, 6h)
10   Spin retry count          0x0032  100    100    000        Old age   Always   Never   0
11   Calibration retry count   0x0032  100    100    000        Old age   Always   Never   0
12   Power cycle count         0x0032  100    100    000        Old age   Always   Never   154
192  Power-off retract count   0x0032  200    200    000        Old age   Always   Never   91
193  Load cycle count          0x0032  198    198    000        Old age   Always   Never   8655
194  Temperature celsius       0x0022  120    105    000        Old age   Always   Never   32
196  Reallocated event count   0x0032  200    200    000        Old age   Always   Never   0
197  Current pending sector    0x0032  200    200    000        Old age   Always   Never   10
198  Offline uncorrectable     0x0030  100    253    000        Old age   Offline  Never   0
199  UDMA CRC error count      0x0032  200    200    000        Old age   Always   Never   0
200  Multi zone error rate     0x0008  100    253    000        Old age   Offline  Never   0
Thank you, Gus
April 21, 2016 Community Expert Attribute 197 (Current pending sector) is one of the most important, and it is this disk's problem. Its raw value should always be 0.
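The check described here (raw value of attribute 197 must be 0) can be automated. The sketch below assumes attribute rows in the smartctl-style layout quoted earlier in this thread (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value); it does not parse the web GUI table, whose attribute names contain spaces. The function names are hypothetical, and the attribute 5 row in the sample reuses the values posted above with the name written the way smartctl prints it for WD drives.

```python
def parse_smart_attributes(report_text):
    """Map attribute ID -> (name, first token of the raw value)."""
    attributes = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Expected row: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attributes[int(fields[0])] = (fields[1], fields[9])
    return attributes

def nonzero_watched(attributes, watched=(197,)):
    """Return the watched attributes whose raw value is a non-zero integer."""
    flagged = {}
    for attr_id in watched:
        if attr_id not in attributes:
            continue
        name, raw = attributes[attr_id]
        if raw.isdigit() and int(raw) != 0:
            flagged[attr_id] = (name, int(raw))
    return flagged

sample_report = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       10
"""
attrs = parse_smart_attributes(sample_report)
print(nonzero_watched(attrs))
```

Only 197 is watched by default because that is the attribute singled out in this thread; passing a longer tuple in `watched` is a matter of personal preference.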
April 21, 2016 Author Thank you @johnnie.black. I'm now using mc to copy the content of the disk with those bad sectors to a new drive. Should the error counter in the Unraid web GUI increase while I'm copying data? Thank you, Gus
April 21, 2016 Community Expert When you try to copy a file that sits on the affected sectors, you should see the error counter increase, and you will probably get an error from mc saying it can't copy that file. With some luck it only affects 1 or 2 files.
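If you would rather script the copy than watch mc, the same behaviour (carry on past unreadable files and note which ones failed) can be sketched in Python. This is an illustration, not what mc does internally; `copy_tree_logging_failures` is a hypothetical helper, the paths in the usage comment are assumptions, and `Path.unlink(missing_ok=True)` needs Python 3.8 or newer.

```python
import shutil
from pathlib import Path

def copy_tree_logging_failures(source_root, target_root):
    """Copy every file under source_root to target_root, skipping unreadable ones.

    Returns (copied, failed): copied is a list of relative paths, failed is a
    list of (relative path, error message) for files that raised an I/O error.
    """
    source_root, target_root = Path(source_root), Path(target_root)
    copied, failed = [], []
    for source in sorted(source_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_root)
        destination = target_root / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        try:
            shutil.copy2(source, destination)  # copies data and timestamps
            copied.append(relative)
        except OSError as error:
            failed.append((relative, str(error)))
            # Remove the partial copy so it is not mistaken for a good file.
            destination.unlink(missing_ok=True)
    return copied, failed

# Example call (paths are assumptions, adjust to your own mounts):
# copied, failed = copy_tree_logging_failures("/mnt/disk2", "/mnt/disks/backup")
# for relative, message in failed:
#     print(relative, message)
```

The `failed` list is then the set of files to restore from elsewhere once the disk has been replaced.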
April 22, 2016 Author Quote: "When you try to copy a file on the affected sectors you should see the error counter increase and will probably get an error from mc that it can't copy that file. With some luck it only affects 1 or 2 files." mc stopped the copy process (I was connected to Unraid over SSH). Perhaps it gets disconnected when the computer enters sleep mode? I'm now copying the content through my Windows VM... As you said, the errors have appeared: so far 92 errors and 2 files with read errors... 216 min to go. Gus
April 22, 2016 Quote: "mc stopped the copy process (I was connected to Unraid over SSH). Perhaps it gets disconnected when the computer enters sleep mode?" Precisely. If you are accessing the console remotely (as opposed to typing on a keyboard attached to the Unraid server), you need to either make sure the session will not drop, or invoke the screen command (available in the nerdtools plugin) before starting any lengthy activity. The cool thing about screen is that you can detach from a session initiated from one location or method and reattach to the same session from elsewhere. For example, you could start a screen session from the local keyboard, detach and leave it running, then reattach from an SSH session later. Just be cognizant of what you leave open: if you have a session open with an active prompt on an array drive, shutdown will fail until you close that session.
April 24, 2016 Author I'm two thirds of the way through a preclear of the "faulty" drive with those pending sectors, but when I looked at the SMART report again, the "Current pending sector" value of 10 had disappeared and it now shows 0. Must I wait for the 3rd preclear cycle to finish to get a "reliable" result? Has the drive corrected those sectors? I have an RMA; what should I do now? (WD will not see any pending sectors.) Thank you, Gus
April 24, 2016 Personal opinion here, others may have different viewpoints. If the final SMART stats after all the preclear cycles look good, I'd be tempted to keep the drive vs. getting an unknown refurb from the warranty process. Devil you know, etc. Post a SMART report after all three cycles are done and you should get some better opinions on keep vs. trade. BTW, WD will warranty replace a perfectly good drive if you tell them you don't trust it. It's too much hassle for them to vet each RMA request, and they are just going to turn around and test your incoming drive. If it looks decent to them it will get a refurb label, have the SMART data reset, and be sent out to the next customer (victim) with an RMA. You never know if the drive you get as a replacement has some funky issue that evaded their testing.
April 24, 2016 Or better yet, show the various SMART reports to the vendor you bought the drive from and get a replacement from them. (One of the reasons I never buy hard drives online: it is so much easier to return them at a brick and mortar store if they are semi-questionable than online, and well worth the extra $10 I spend.) EDIT: Although at 14000+ power-on hours, you pretty much have no choice but to RMA it if you want a replacement.
April 24, 2016 Community Expert Quote: "Personal opinion here, others may have different viewpoints. If the final SMART stats after all the preclear cycles look good, I'd be tempted to keep the drive vs. getting an unknown refurb from the warranty process. Devil you know, etc. Post a SMART report after all three cycles are done and you should get some better opinions on keep vs. trade. BTW, WD will warranty replace a perfectly good drive if you tell them you don't trust it. It's too much hassle for them to vet each RMA request, and they are just going to turn around and test your incoming drive. If it looks decent to them it will get a refurb label, have the SMART data reset, and be sent out to the next customer (victim) with an RMA. You never know if the drive you get as a replacement has some funky issue that evaded their testing." +1. In my experience refurbished disks have at best a 50/50 chance of lasting more than a couple of months. Post a SMART report when the preclear finishes.
April 24, 2016 Author In this case, the vendor (a big online vendor) will give me the money back to buy a new one. Gus