Probable dead drive...how to proceed?

Stupot · August 19, 2023

Hello,

I built my first Unraid server nearly 2 months ago. I bought three 8TB IronWolf 7200RPM drives to go in it. As I was learning as I went, I didn't know then that I probably should have pre-cleared the drives before using them.

I set my array up with a single parity drive. I have been using the system without issue until last night, when I started getting reallocated sector count warnings for the drive where I download my Linux ISOs. It has racked up 200+ reallocated sectors so far.

I have asked about this on Reddit already and got a couple suggestions but I think the advice of you fine folk would be most welcome as well!

I tried unplugging the SATA data and power cables from the drives. When I rebooted back into Unraid I started moving a small amount of data to the affected drive to see whether that cleared the problem. It did not: reallocated sector count warnings started immediately. I shut down and instead tried a different connector on the breakout cable, only this time when booting up into Unraid, the drive was "disabled". I tried different permutations of connectors from the breakout cable and have even got that one drive connected through the SATA ports on the motherboard rather than the HBA, and it still shows as disabled. I thought this was saying the drive was proper dead, but now I learn that this might just be Unraid throwing a hissy because it has tried to write data, failed, and that has triggered the drive being disabled, and that a rebuild of the drive would be required?

I am unsure how to proceed. I have a new drive ordered that will arrive Monday. I could wait for that and simply swap it out, but advice on Reddit from 2 people is that it could have been a cable issue and not the drive failing. However, I can't enable the drive to test a different cable now without, I am assuming, doing the entire rebuild of the drive. I believe that means taking the array offline, removing that drive from the array, starting the array and stopping it again, then re-adding the device, at which point it will rebuild?

https://pastebin.com/4Rw7pYSM

That is a SMART report for the drive. I'm not sure how to decode it. As you can see there are 200+ reallocated sectors. I don't know whether a quick build-up of bad sectors in a short space of time is indicative of a bad drive or something else?
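For anyone reading a similar report, the line that matters most here is attribute 5, Reallocated_Sector_Ct. As a rough sketch, assuming the report is the usual `smartctl -A` attribute table (where the raw value is the tenth column), a small shell helper can pull out that number so it can be compared between checks. The function name `realloc_count` and the device path `/dev/sdX` are placeholders, not anything from the report itself.

```shell
# Print the raw value of SMART attribute 5 (Reallocated_Sector_Ct)
# from a `smartctl -A` attribute table supplied on stdin.
# Assumes the standard ATA table layout: ID in column 1, raw value in column 10.
realloc_count() {
    awk '$1 == "5" && $2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Example (sdX is a placeholder; check the device letter first):
#   smartctl -A /dev/sdX | realloc_count
```

Running it before and after a test write shows whether the count is still climbing.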

Advice most welcomed! Thank you

itimpi · August 19, 2023

In my experience, if the reallocated sector count starts increasing regularly, it is normally a sign that total failure of the drive is imminent and it should be replaced.

Stupot · August 19, 2023

2 hours ago, itimpi said:

In my experience, if the reallocated sector count starts increasing regularly, it is normally a sign that total failure of the drive is imminent and it should be replaced.

I've only been using the drives for around 6-8 weeks. Everything was fine until yesterday, when the warnings started and the count quickly rose to over 200. I do not know enough about hard drives to understand the mechanics of how they fail. Is it a common occurrence for 200+ reallocated sectors to appear in a short space of time? Or is that more indicative of an interface problem, i.e. a cable?

Regardless, even if I got the drive working again, I cannot say that I would trust it. I think sorting out an RMA is the only action. It's a shame that the consensus seems to be that I will be sent a refurbished/recertified drive in return, even though I bought the drive less than 2 months ago.

Maybe I can contact the place I bought it from and ask for a refund instead. I'm not sure what the laws are these days since the UK left the EU.

itimpi · August 20, 2023

Reallocated sectors are not something that is normally caused by something like cabling. On a new drive, I would suspect that either there was a manufacturing defect or the drive got damaged in transit.

Stupot · August 20, 2023

15 hours ago, itimpi said:

Reallocated sectors are not something that is normally caused by something like cabling. On a new drive, I would suspect that either there was a manufacturing defect or the drive got damaged in transit.

Thank you.

New drive arrived today so I've got it doing a single pre-clear round now and then I'll do a rebuild.

I tried rebuilding last night using the defective drive, but on the motherboard SATA port and with a new SATA cable, just to be sure. Errors started 4 hours into the task, so yes, it sounds like the drive is dead. I have opened a ticket with the retailer to see what can be done to get it replaced.

Based on what the internet seems to suggest, the replacement will be a recert/refurb drive, which is fine, it'll probably be up to standard, but it just feels like a bit of a con to give someone a second-hand product to replace a less-than-2-month-old product. 1st world problems, I know.

Stupot · August 22, 2023

Hello,

I have a follow up question if anyone has any advice.

I am sending the defective drive back to the retailer for them to replace for me (with a new drive by the sounds of it, thankfully). In the few weeks I've had the drives, I've used them for storing my files and there is some sensitive data on the drive that I would like to remove before sending.

Can anyone offer the best solution? Is there a way to wipe only the sectors of the drive that have actual data on them, rather than the entire drive? I ask this as I believe when I start trying to write zeros or garbled-secure-erase-type data to the empty space, the reallocated sector count is likely to start increasing again. I am assuming of course that the data I had written to the drive was placed in sectors that were functional up until the moment I hit the sectors that are defective, but I could be assuming incorrectly, and writing anything to any part of the drive might cause more warnings.

Regardless, I do need to at least attempt to remove the data.

Thank you.

JonathanM · August 22, 2023

Using dd to send zeros to the drive device id is probably the best option, since it will start at the beginning and work up. You can always cancel the operation after it's been running long enough to write the first parts of the drive. Be very careful, sending zeros to the wrong device will erase it with very little chance of good recovery. Something like this:

dd bs=1M if=/dev/zero of=/dev/sdX status=progress
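As an alternative to cancelling by hand, dd's `count=` option can stop the write after a fixed number of blocks. The sketch below only prints the command for review rather than running it. The helper name `zero_first_gib_cmd` and the 500 GiB figure are made up for illustration, and whether the first N GiB actually covers all the data depends on how the filesystem laid it out, which is an assumption here.

```shell
# Build (but do not run) a dd command that zeroes only the first N GiB of a device.
# With bs=1M each block is 1 MiB, so N GiB needs N * 1024 blocks.
# $1: target device (placeholder /dev/sdX in the example)
# $2: number of GiB to overwrite, counted from the start of the device
zero_first_gib_cmd() {
    device=$1
    gib=$2
    echo "dd bs=1M count=$((gib * 1024)) if=/dev/zero of=$device status=progress"
}

# Example: print the command for the first 500 GiB, then review it by hand
# before running anything.
#   zero_first_gib_cmd /dev/sdX 500
```

Printing instead of executing leaves one more chance to check the `of=` target before anything is overwritten.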

Stupot · August 25, 2023

On 8/22/2023 at 6:03 PM, JonathanM said:

Using dd to send zeros to the drive device id is probably the best option, since it will start at the beginning and work up. You can always cancel the operation after it's been running long enough to write the first parts of the drive. Be very careful, sending zeros to the wrong device will erase it with very little chance of good recovery. Something like this:

dd bs=1M if=/dev/zero of=/dev/sdX status=progress

Thanks

Just so I am crystal clear -- the command above to zero the drive below would be:

dd bs=1M if=/dev/zero of=/dev/sde status=progress

Correct? Thank you!

[Attached image: image.png]

JonathanM · August 25, 2023

1 minute ago, Stupot said:

Correct?

Yes, assuming the target disk is still sde when you issue the command.

CHECK IMMEDIATELY BEFORE YOU ISSUE THE COMMAND

sd? designations can and will change. Unraid assigns the letters from scratch at each boot, so hardware changes can change which drive gets which letter. Do not assume that just because the drive was sde the last time the tower was started, it will be the same now. The consequences of getting it wrong are catastrophic. Double and triple check that you are sending the command to the correct sd? device.

I can't stress this enough.
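One way to do that check is to go by the drive's serial number rather than its letter. On Linux, `/dev/disk/by-id` holds links whose names include the model and serial, each pointing at the current sd? node. The sketch below is only an illustration: the function name `disk_for_serial` and the serial `ZXXXXXXX` are placeholders, and it assumes the serial printed on the drive's label appears in the link name.

```shell
# Print the device node (e.g. /dev/sdX) that a serial number currently maps to.
# $1: serial number (or a unique fragment of it), read from the drive's label
# $2: directory of persistent names; defaults to /dev/disk/by-id
# Partition links (names ending in -part1 and so on) are skipped so that only
# the whole disk is reported. Several links can point at the same disk, so the
# results are de-duplicated.
disk_for_serial() {
    serial=$1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*"$serial"*; do
        [ -e "$link" ] || continue              # the glob matched nothing
        case ${link##*/} in
            *-part[0-9]*) continue ;;           # skip partition entries
        esac
        readlink -f "$link"
    done | sort -u
}

# Example (ZXXXXXXX is a placeholder serial):
#   disk_for_serial ZXXXXXXX
```

If this prints more than one line, the fragment is ambiguous and should not be trusted. If it prints exactly one, that is the letter to use right now, and only until the next reboot or hardware change.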

Stupot · August 26, 2023

On 8/25/2023 at 1:52 AM, JonathanM said:

Yes, assuming the target disk is still sde when you issue the command.

CHECK IMMEDIATELY BEFORE YOU ISSUE THE COMMAND

sd? designations can and will change. Unraid assigns the letters from scratch at each boot, so hardware changes can change which drive gets which letter. Do not assume that just because the drive was sde the last time the tower was started, it will be the same now. The consequences of getting it wrong are catastrophic. Double and triple check that you are sending the command to the correct sd? device.

I can't stress this enough.

Thank you. I truly appreciate the time you took to explain that.

In the end though, I have ended up just using the UD Preclear plugin to zero the drive. The command did work and it began zeroing, and I could see the status of what it was doing, but it seems it requires the terminal window to be open for the entire duration. I guess if I SSH'd in from another computer using PuTTY or something similar, it would be more stable than doing it through the browser terminal. I am sure there is a better way of doing this that I am just not experienced and smart enough to figure out.

It does appear that the Preclear plugin works from the outside in, like I think you said dd does. It starts fast and gets progressively slower as it works its way inwards on the platter. My hope is that Unraid writes data this way when the disks are used in the array. If that is the case, then once I start hitting reallocated sector count warnings, the data that was on there should already have been zeroed and I can stop the process.
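For anyone hitting the same terminal-window problem, one common approach is to start the command under `nohup` in the background with its output sent to a log file, so it is no longer tied to the window. This is a minimal sketch and not something tested in this thread. The helper name `run_detached` and the log path are placeholders.

```shell
# Start a long-running command so that it keeps going if the terminal closes.
# $1: log file that collects the command's output (dd prints progress on stderr)
# remaining arguments: the command to run
# Sets DETACHED_PID to the background process ID.
run_detached() {
    log=$1
    shift
    nohup "$@" >"$log" 2>&1 </dev/null &
    DETACHED_PID=$!
}

# Example (sdX is a placeholder; verify the device immediately beforehand):
#   run_detached /tmp/zero.log dd bs=1M if=/dev/zero of=/dev/sdX status=progress
#   tail -f /tmp/zero.log      # watch progress
#   kill "$DETACHED_PID"       # stop the wipe early from the same shell
```

The log file can be checked from a new terminal session later to see how far the wipe has got.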

Thank you once again.
