DMA errors with WD20EARS


goakes

Hello,

 

I'm new to unRAID and Linux.  I got unRAID running fine with one WD20EARS drive (jumper on pins 7-8 per unRAID suggestion).  That worked fine and I don't get any errors if I load unRAID with just the 1st drive. 

 

Then I added a 2nd identical drive (also WD20EARS) as my parity drive.  It appears to work, but unRAID now takes 5 to 10 minutes to load and logs tons of errors as it loads.  I've tried swapping data cables, power cables, jumpers, SATA ports, etc., but I keep getting the same errors on the 2nd drive.  From what I can tell, it tries to initialize the drive at UDMA/133, then keeps dropping down to UDMA/100, then UDMA/33, where it eventually appears to load up. 

 

Is this a configuration issue or a bad drive?  Once it starts up it appears I can use it to read/write but this definitely doesn't look normal.  My syslog is attached.  I did not pre-clear the drives but one of them seems to work fine without doing this.  Do I need to pre-clear the drive?  Any other thoughts...? 

 

Thanks in advance!

syslog.txt

Link to comment

I went ahead and ran the pre-clear script on the drive (I didn't do that when I first added it to the array because it was a new drive).  It took 60 hours to run (that sounds like a really long time, even for a 2TB drive).  Now when I start back up it appears the errors are gone.  Not sure if I should still be concerned...maybe the first sector on the drive was bad or something like that and now it got marked as bad as part of the pre-clear script...?  I haven't tested to see if performance is acceptable yet but it sounds encouraging.

Link to comment

I've attached my SMART report...thanks for taking a look at it.  Let me know if anything looks horrendous.  From reading the 'Troubleshooting' document in the unRAID forum, it sounds like my "reallocated sector count" value of 200 is probably not a good thing...?

 

---------------

smart.txt

Link to comment

You have zero re-allocated sectors.

You are confusing the initial "Normalized" value of that parameter with the number of re-allocated sectors shown in the far-right column.

 

For most parameters, the "raw" column on the far right is only meaningful to the manufacturer.  For re-allocated and pending re-allocated sectors it is the actual count.  The 200 value you saw is the initial starting normalized value.  It will decrease down to the failure threshold of "0" if you eventually get enough re-allocated sectors.  Most large disks have several thousand spare sectors, but most of us will replace a drive if the number starts to increase.  A few detected initially is not bad as long as they do not continue to increase.  I suspect that all drives have un-readable sectors.  The ones detected at the factory have already been re-allocated before shipping and are not included in the count we see, which is why it starts at zero.
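To make the normalized-versus-raw distinction concrete, here is a minimal Python sketch that pulls both numbers out of the attribute table printed by `smartctl -A`. The two sample rows are invented for illustration (they are not taken from the attached smart.txt), and the helper names `parse_report` and `describe` are hypothetical.

```python
import re
from typing import List, NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int    # normalized value (starts high, falls as health degrades)
    worst: int    # lowest normalized value seen so far
    thresh: int   # failure threshold for the normalized value
    raw: str      # raw value; an actual sector count for attributes 5 and 197


# Column order of `smartctl -A`:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(.+?)\s*$"
)

# Invented sample rows, for illustration only.
SAMPLE_REPORT = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
"""


def parse_report(text: str) -> List[SmartAttr]:
    """Return one SmartAttr per line that looks like an attribute row."""
    attrs = []
    for line in text.splitlines():
        m = _ROW.match(line)
        if m:
            attr_id, name, value, worst, thresh, raw = m.groups()
            attrs.append(
                SmartAttr(int(attr_id), name, int(value), int(worst), int(thresh), raw)
            )
    return attrs


def describe(attr: SmartAttr) -> str:
    """Spell out which number is the normalized value and which is the raw one."""
    return (
        f"{attr.name}: normalized value {attr.value} "
        f"(threshold {attr.thresh}), raw value {attr.raw}"
    )


if __name__ == "__main__":
    for attr in parse_report(SAMPLE_REPORT):
        print(describe(attr))
```

For the reallocated and pending sector attributes, the number to watch over time is the raw value at the end of the row, not the 200 in the VALUE column.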

 

The drive looks perfectly healthy.

 

Joe L.

Link to comment

Can't advise on exactly the issue you are facing here, but I would suggest running the WD DLG tests on the drive.  I'd just burn an Ultimate Boot CD; it contains the tool, and I find that the easiest way to run it.

 

I have just had to swap a DOA WD20EARS.  The 1st go of preclear got most of the way through but didn't complete, citing an error updating the MBR.  Unfortunately it failed before running the 2nd SMART test, so I had to run a long one manually.  No issues were reported on this test, so I thought it could be the motherboard I had set up to do the preclear, as it wasn't the one I usually use.  I swapped back to the regular board and tried preclearing again; speeds were around 40MB/s, sometimes dropping down to 4MB/s, so something definitely wasn't right.  After I ran the extended WD DLG test it reported there were too many errors with the drive, so I RMAd it.

 

Long story short, SMART is useful but won't always report issues with a bad drive.

 

EDIT: forgot to mention, new drive is now preclearing, initial speed for first stage up to about 7% has been around 100MB/s for reference.

Link to comment

FWIW I've just had to return a WD20EARS too.

 

SMART was reporting the drive was fine but write speeds fluctuated badly, all the way down to 4MB/s at some points (read was still around 100MB/s). I ran the preclear script three times on the drive and SMART eventually showed that lots of sectors were being remapped. After the preclear, a SMART 'long' test failed so it went back to WD.

 

If you can, remove the drive from the array and preclear it a few times. It takes a while, but it confirmed a drive problem for me after I had been wondering whether it was the port/cable/motherboard.

Link to comment

Sounds like a good idea....I'll re-run the preclear.  I'm guessing it will take less than 60 hours this time, because during the first run the drive had dropped down to UDMA/33 at boot due to the errors.  It's booting up at UDMA/133 now, so hopefully it will finish quicker.  Any idea how long your preclear took to run?
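As a rough sanity check on the 60-hour figure, the Python sketch below converts a preclear run time into an average transfer rate. It rests on two assumptions that are not stated in this thread: that one preclear cycle makes three full passes over the disk (pre-read, zeroing write, post-read), and that the 2TB drive holds 2 × 10^12 bytes. The function name is hypothetical.

```python
def average_throughput_mb_s(capacity_bytes: float, hours: float, passes: int = 3) -> float:
    """Average transfer rate in MB/s (1 MB = 10**6 bytes) over a whole run.

    `passes` is the assumed number of full-disk passes per preclear cycle;
    adjust it if the script version in use behaves differently.
    """
    total_bytes = capacity_bytes * passes
    seconds = hours * 3600
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    rate = average_throughput_mb_s(2e12, 60)
    print(f"60-hour run on a 2TB drive: about {rate:.1f} MB/s on average")
```

Under those assumptions the 60-hour run averages a little under 28 MB/s, which sits below the roughly 33 MB/s ceiling of the UDMA/33 transfer mode, so a drive stuck in that mode would plausibly account for the long run time.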

Link to comment

Because of the poor write speed I was experiencing, each run of the pre-clear took about 22hrs. I've installed the replacement now and it's going like the clappers - I expect it to take about 12 hours. Keep an eye on your syslog, as the DMA mode could be downgraded even after boot.

 

After the hassle I had with the previous WD20EARS, I'll be running preclear quite a few times before I trust the drive.
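For keeping an eye on the syslog as suggested above, a small Python sketch like the following can list the transfer mode each ATA device was configured for over time. It relies on the kernel's "configured for UDMA/..." messages; the sample lines, the hostname in them, and the function names are made up for illustration and are not taken from the syslog attached to this thread.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches kernel lines such as "ata2.00: configured for UDMA/133".
_CONFIGURED = re.compile(r"(ata\d+\.\d+): configured for (\S+)")

# Invented sample lines, for illustration only.
SAMPLE_LINES = [
    "Jan  1 00:00:10 Tower kernel: ata1.00: configured for UDMA/133",
    "Jan  1 00:00:12 Tower kernel: ata2.00: configured for UDMA/133",
    "Jan  1 00:01:40 Tower kernel: ata2.00: configured for UDMA/100",
    "Jan  1 00:03:05 Tower kernel: ata2.00: configured for UDMA/33",
]


def transfer_modes(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each device to the sequence of modes it was configured for.

    Consecutive repeats of the same mode are collapsed into one entry.
    """
    modes: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        m = _CONFIGURED.search(line)
        if not m:
            continue
        dev, mode = m.groups()
        if not modes[dev] or modes[dev][-1] != mode:
            modes[dev].append(mode)
    return dict(modes)


def changed_devices(modes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only devices whose configured mode changed at least once."""
    return {dev: seq for dev, seq in modes.items() if len(seq) > 1}


if __name__ == "__main__":
    for dev, seq in changed_devices(transfer_modes(SAMPLE_LINES)).items():
        print(f"{dev}: {' -> '.join(seq)}")
```

To run it against a real log, pass the lines of the syslog file to `transfer_modes` in place of `SAMPLE_LINES` (for example, `transfer_modes(open(path))`, where `path` is wherever your system keeps its syslog). Any device that shows more than one mode deserves a closer look.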

Link to comment
