Questionable/Dying Drive?


Running unRAID ver 4.6 with WD 2TB drives, jumpered and successfully precleared with preclear ver 1.3. I'm loading the drives with media, so I'm running without a parity drive. Transfers are done over the network; I usually transfer 200-400 GB at a time, which takes around 1 hr. The last transfer stopped midway with syslog error messages I did not understand. I aborted the transfer and restarted copying the uncopied files, then ran a short SMART status report, but I don't understand it. Is the drive failing? How questionable would the data integrity be? I can still access files on the drive over the network. I have the parity drive precleared and waiting for assignment. Should I assign and format that drive as a data disk, then cut/paste the files from the questionable drive? I would really like to save all the work I have done so far! I have another WD 2TB drive here to preclear for parity when I'm done loading my media. Once my data is safe, are there any other tests I should run? I know the disks are running cold; it's cold here.

syslog-2011-02-27.zip

ShortSmartStatus0060_022711.txt

Link to comment

There are 486 sectors pending reallocation.  This is not an overly huge deal, but it is something that needs to be watched.

 

I HOPE you have not deleted any of the data from the original source you are transferring.  If you have not deleted any of the data I would run this drive through some more preclear cycles (PLEASE update your version of preclear first).  I would do 2 more at least, probably 3.  If the current pending sectors or reallocated sector counts keep going up then you need to RMA the drive.
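
To make "keep going up" concrete, here is a rough sketch of comparing two SMART reports taken before and after a preclear cycle. It assumes the reports are the attribute table printed by `smartctl -A`, where the raw value is the tenth column. The sample rows below are made up for illustration (only the 486 pending count comes from this thread), and the function names are my own, not part of preclear or unRAID.

```python
# Attribute names as they appear in the `smartctl -A` attribute table.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

# Hypothetical report excerpts, for illustration only.
BEFORE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       486
"""
AFTER_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       3
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       500
"""


def parse_smart_attributes(report_text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    values = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            values[fields[1]] = int(fields[9])
    return values


def growing_attributes(before_text, after_text):
    """Return {name: (before, after)} for watched counts that increased."""
    before = parse_smart_attributes(before_text)
    after = parse_smart_attributes(after_text)
    return {
        name: (before.get(name, 0), count)
        for name, count in after.items()
        if count > before.get(name, 0)
    }


if __name__ == "__main__":
    for name, (old, new) in growing_attributes(BEFORE_REPORT, AFTER_REPORT).items():
        print(f"{name}: {old} -> {new}")
```

If either count shows up in the result after a full cycle, that is the "keeps going up" case described above.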

Link to comment

486 sectors pending reallocation is completely unacceptable when initially loading your disks with data.  It only takes one unreadable sector to make your file system corrupt.  You might have 1 file affected, or 486, or all of them.

 

I would stop loading the drive.  If you had assigned a parity drive initially you might be able to recover what was not readable, but since you do not, I will repeat, I hope you did not delete the source files from your original disks.

 

Unassign the disk with the sectors pending reallocation.  If it is a new disk, RMA it if possible; otherwise, put it through several more preclear cycles.  If you can get it through two cycles with no additional sectors being marked for reallocation, then it might last a few months in your array.  I would keep a close eye on it.

 

Loading data with no parity disk assigned is faster, but is far more risky.  Are you using checksums to verify what you are transferring to the array? 

 

Joe L.

Link to comment

Thank you both for the quick reply.  I have the original DVDs that I rip to a HD and then transfer to the tower. Unfortunately, I only have 381 GB out of the 1.3 TB (on this drive) still on the ripping drive.  Any opinions on assigning my precleared (soon to be) parity drive as a data drive and doing a transfer before RMAing the questionable drive? I was only 400 GB away from completing 6 TB of transfers and was then going to run parity.  No, I do not do checksums on transfers.  I know, running naked is dangerous and not that much faster.

Link to comment

I agree with everything said above.

 

As long as you don't delete the original copies of your data, the risk of running without parity protection temporarily is pretty minimal.  You would still have to suffer two simultaneous drive failures (your source drive and your unRAID data drive) in order to lose data.  I think it is a good idea to make some copies of your data while you are waiting for the RMA on the bad drive.

 

Is your ripping computer Windows based?  If so, then I highly recommend you install Teracopy (the free version is fine) and turn on the 'verify' function.  This will run a full CRC checksum on all your files after the transfer completes, and it will tell you about any errors that it finds.  This is the easiest way to make sure that your data reached the server safely and without corruption.

Link to comment

Teracopy's verify helps, but perhaps not if the disk is failing.

 

I suspect that unless a LOT of data is being moved, so that it no longer fits in the disk buffer cache, anything read back to verify a CRC would only be read from the disk buffer cache and NOT from the physical disk.  You could have unreadable sectors and not know it until you perform your first parity check.  It is for that reason I strongly suggest ALWAYS putting a parity disk in place first, unless you are going to perform a separate checksum pass, independent of the actual copy.
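
For anyone wondering what a separate checksum pass could look like, here is a minimal sketch, not a finished tool. The idea is to record MD5 checksums of the source files once, then check the copies against that list in a later, independent run. The function names are my own. Whether the later read really comes from the physical disk still depends on the files having been pushed out of the buffer cache by then (after a reboot, for example).

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Read the file in chunks and return its MD5 digest as hex."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(source_root):
    """Map each file path (relative to source_root) to its MD5 digest."""
    manifest = {}
    for folder, _subdirs, files in os.walk(source_root):
        for name in files:
            full_path = os.path.join(folder, name)
            manifest[os.path.relpath(full_path, source_root)] = md5_of_file(full_path)
    return manifest


def verify_against_manifest(manifest, dest_root):
    """Return (relative_path, problem) pairs for files that fail the check."""
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = os.path.join(dest_root, rel_path)
        if not os.path.isfile(candidate):
            problems.append((rel_path, "missing"))
            continue
        try:
            actual = md5_of_file(candidate)
        except OSError:
            # An unreadable sector typically surfaces as an I/O error on read.
            problems.append((rel_path, "read error"))
            continue
        if actual != expected:
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

You would call `build_manifest` on the ripping drive before the copy and `verify_against_manifest` on the destination share some time afterwards; an empty result means every listed file was found and matched.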

Link to comment

Joe, good point, but I'm pretty sure that Teracopy's CRC check is separate from the copy.  The entire transfer finishes first, then Teracopy goes back through each file and runs a CRC check against the source file.  So perhaps the final 64 MB or a few smaller files could still be in the drive's cache, but at least most of the files should actually be written to the disk by the time the CRC check is run.  Of course it matters how many files you are transferring and how large they are - a folder full of small pictures could be at more risk than a folder full of HD movies.

Link to comment

It is NOT the drive's cache I'm referring to, although it too is involved, but the 2 Gig or 4 Gig or 8 Gig of memory in the unRAID server which is used as disk buffer cache.  Unless the file is bigger than the cache, the read-back for the checksum will never go to the disk.

 

You are correct though, if you are transferring a 4 Gig ISO image and only have 2 Gig of RAM, the checksum must be going to the disk.  For your family photos, I doubt it.

 

Joe L.

 

Link to comment

Yes, I am using a Windows machine and will try Teracopy in the future. Joe L., I am not familiar with doing a checksum outside of a canned program. A pointer to this process would be appreciated and might make a good Wiki entry.  At this point I am under the impression that this drive is bad or on its way there (as are all drives in the world; this one is just closer).  I am considering taking the drive I have in the tower, which is waiting to become the parity drive, and assigning it as a data drive so I can transfer all of the data off the questionable drive, in an attempt to save the data that has already been transferred from the Windows machine.  I have a drive here, not precleared yet, that I was going to keep as a hot drive; I can make this the new parity drive while waiting for the RMA on the questionable drive. I'm running ver 4.6, so I will jumper this drive and run preclear 1.7, yes? Thanks again from yet another noob.

Link to comment

So if I'm transferring 5 BluRay ISOs with TeraCopy, it should be okay right?

 

I should go back and check my HDD.  I had something similar happen to me, no parity drive (yet), transferred 5 or so BluRay ISOs with TeraCopy.  The first ISO copied over, but TeraCopy wouldn't start copying the second ISO.  I'm not sure what happened, but I had to close TeraCopy and copy the rest of the ISOs over.

 

I should note that when I bought the drive, along with all the other components in my build, the UPS guy decided it would be better to drop my package behind my locked side gate (a 6 ft drop) instead of just leaving everything on the ground in front of my door.  Yes, that's right, my Lian Li Q08 case, mobo, BD drive, and HDD were dropped 6 ft.  Seriously, what in the world are the UPS guys thinking?

Link to comment

I was under the impression that unRAID's RAM cache was flushed as soon as possible (once the drives were ready).  Is that not the case?

 

marlin: So the questionable disk has the only copy of certain files?  If that's the case then you have no choice but to attempt a transfer onto another disk.  Best of luck.

 

Also, why not use unRAID 4.7 and forget about the jumpers?  Just use preclear 1.6 or newer with it and it will read unRAID's disk alignment setting (still set to sector 63 by default, you have to change it to 4k aligned).

 

yelnatsch517: You have to turn on the 'verify' function in TeraCopy's settings.  It isn't on by default.

Link to comment

The questionable disk doesn't have the only copy of the files; I could re-rip and then transfer them again, but that is a lot of files. I'm just trying to conserve my time.  Why 'best of luck'? Is this risky, considering things are questionable with the drive anyway? I'm sticking with 4.6 only to keep all the drives consistent, at least until I build parity, to minimize troubleshooting, and then I'll make one step to 5.0+ whenever that is stable.

Link to comment

I did turn on Verify.  TeraCopy froze after the first file was copied.  I'm not sure why.  It just wouldn't continue copying the other files.  It was weird.  Hopefully I'll get my drives from Amazon soon so I can set up a parity drive.

Link to comment

The RAM cache is flushed to the disk as soon as possible, BUT the data is still in the buffer cache.  If a request comes in to read the same disk blocks, it is smart enough to read them from memory rather than go to the disk, as long as they are still in the buffer cache.

 

Try this.

 

Copy a small movie to one of your disks.  Something that will easily fit in your RAM cache.  (Let's say something about 500 to 700 Meg.)

 

Now, spin down all the disks.

 

Now, play the movie.  (or get an MD5 checksum of it)  I'll bet the disk does not even spin up.
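
If you would rather see numbers than listen for a spin-up, here is a rough sketch of the MD5 variant of this test. It reads a file, computes the MD5, and reports the read throughput. It assumes you run it on the server itself; over the network the link speed limits the result and tells you much less. The function name is my own.

```python
import hashlib
import time


def timed_md5(path, chunk_size=1024 * 1024):
    """Return (md5_hex, bytes_read, megabytes_per_second) for one full read."""
    digest = hashlib.md5()
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    mb_per_second = (total_bytes / 1e6) / elapsed if elapsed > 0 else float("inf")
    return digest.hexdigest(), total_bytes, mb_per_second
```

A throughput well above what a single spinning drive can sustain suggests the read was served from the buffer cache rather than from the platters. It is only a hint, since the MD5 calculation itself takes time too.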

 

Joe L.

 

Link to comment
