nukeman Posted February 1, 2022

Recently I received a warning about some reallocated sectors on one of the SSDs in my cache pool. The other day I was doing some heavy copying to the cache and received a similar warning on the other drive in the pool. I've searched the forums, and this seems to be either a "watch it to see if it gets worse" or a "critical, fix it now" problem. I went back through some old diagnostics: two months ago both drives had Reallocated_Sector_Ct=0 in their SMART reports. Now one drive has Reallocated_Sector_Ct=3 and the other has Reallocated_Sector_Ct=7. Both drives are 1TB Samsung 870 EVOs purchased in February 2021.

I'm not excited about the prospect of swapping out the drives, as they contain several critical VMs for my home business as well as Unraid's cache. That said, I'm also not excited about both drives in my cache pool failing at the same time. Assuming these warnings are something I should act on, I started the RMA process with Samsung. I did get an RMA issued, but they won't send out a replacement drive until I send the old one in for evaluation. Before I start down that road, though, I wanted to get some opinions on what my next steps should be. Is this a warning that warrants replacing both drives? If so, how should I go about doing it? I thought Samsung SSDs were generally well regarded; maybe I just got unlucky?

BTW, I'm happy to post the SMART reports, but is there any sensitive information I should remove first? For example, should I remove the serial numbers from the reports prior to posting?
Vr2Io Posted February 1, 2022

16 hours ago, nukeman said: The other day I was doing some heavy copying

How full (% used) are they? Do you move data to the array daily to free up space? If an SSD is almost full and holds mostly static files, you may be using it in a way that greatly decreases its endurance.

16 hours ago, nukeman said: maybe I just got unlucky?

It's not likely for two SSDs to develop problems at the same time unless you've reached the 600TB write endurance. https://www.storagereview.com/review/samsung-870-evo-ssd-review

If you're really concerned about sensitive information, you can mask the serial numbers and then post both SSD SMART reports.

Edited February 1, 2022 by Vr2Io
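If it helps, the serial line in a saved smartctl text report can be masked with a quick sed pass before posting. smartctl prints it on a line starting with "Serial Number:"; the filename below is just a placeholder for your own saved report.

```shell
# Replace everything after "Serial Number:" with [REDACTED], keeping a .bak copy.
# smart_report.txt is a placeholder filename.
sed -i.bak 's/^\(Serial Number:[[:space:]]*\).*/\1[REDACTED]/' smart_report.txt
```

The `-i.bak` keeps an untouched backup alongside the edited file, in case the original is ever needed again.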
nukeman Posted February 1, 2022 Author

54 minutes ago, Vr2Io said: How full (% used)?

The drives have 671GB free, so they're about 33% full. Here are the SMART reports.

2947T SMART Report.txt
2900M SMART Report.txt

Edited February 1, 2022 by nukeman
Vr2Io Posted February 2, 2022

Both SSDs' Wear_Leveling_Count reads 44 and 45 (the initial value is 0). I'm not sure what 44 actually means as a percentage, but going by Total_LBAs_Written they're still under the endurance spec. I found some info suggesting the raw value counts TiB written, so 45 would mean 45TiB written.

Some people have also reported two 870s failing at the same time.

Edited February 2, 2022 by Vr2Io
nukeman Posted February 2, 2022 Author

13 hours ago, Vr2Io said: so 45 means writing 45TiB

Well, I'm going to need some help translating the Wear_Leveling_Count. You're saying I've written 45TB to the drive?!? I don't understand how that could happen. Also, I'm not excited to read those reports of other 870 EVOs failing...
JorgeB Posted February 2, 2022

17 minutes ago, nukeman said: You're saying I've written 45TB to the drive?!?

Around that, yes. You can easily calculate it from this attribute:

241 Total_LBAs_Written -O--CK 099 099 000 - 101015376296

101015376296 x 512 (sector size) = 51,719,872,663,552 bytes, or about 47.04 TiB
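That conversion can be reproduced with plain shell arithmetic; 512 is the logical sector size these drives report, and 1 TiB is 2^40 bytes:

```shell
lbas=101015376296                             # raw value of SMART attribute 241 (Total_LBAs_Written)
bytes=$((lbas * 512))                         # 512-byte logical sectors
tib=$((bytes / (1024 * 1024 * 1024 * 1024)))  # whole TiB (2^40 bytes)
echo "$bytes bytes, ~$tib TiB"                # prints "51719872663552 bytes, ~47 TiB"
```

Integer division drops the fraction, so this shows ~47 TiB rather than the exact 47.04.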
nukeman Posted February 2, 2022 Author

Are there any reports of excessive writing to cache drives or cache pools? My cache hosts 4-5 VMs and the usual Docker containers (Radarr, Sonarr, etc.). Both of these drives were introduced into my system when I initially created the cache pool. Prior to that I was using a single (smaller) SSD for cache and Docker/VM hosting.
JorgeB Posted February 2, 2022

That looks normal to me for one year. Previous Unraid releases had a problem with excessive writes; IIRC one of my cache SSDs was at one point writing about 3TB/day, which is most of the reason it now has close to 1PB of total writes.
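For scale, 3TB/day really does add up to roughly a petabyte over a year:

```shell
tb_per_day=3
tb_per_year=$((tb_per_day * 365))
echo "${tb_per_year} TB per year"   # prints "1095 TB per year", i.e. roughly 1.1 PB
```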
nukeman Posted February 2, 2022 Author

OK, 45TB sounded like a lot, but I guess I am moving large media files around frequently. Currently my downloads folder uses the cache; perhaps it would make sense not to, since I don't really care about fast writes for downloads.

Regardless, I'm going to RMA the drives one at a time, to keep the server up as much as possible. The plan is to:

1. Remove one of the drives from the pool and return it
2. When I get the replacement drive, put it into the pool and let btrfs rebuild it
3. Take the other old drive out of the pool and return it
4. Finally, put the second replacement drive into the pool

Or should I follow this procedure instead? This post says there's trouble rebuilding the pool in 6.9.2, but there's a workaround?
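For reference, the raw btrfs operations behind that plan look roughly like the sketch below. This is an assumption-laden sketch, not Unraid's actual procedure: /mnt/cache is the default pool mount point, /dev/sdX1 and /dev/sdY1 are placeholder device names, and on Unraid you would normally do all of this from the GUI (stop the array, change the pool assignment, start the array) and let it drive btrfs for you.

```shell
# A two-device raid1 pool can't drop below two devices, so convert to the
# single profile first (-f is required when reducing metadata redundancy).
btrfs balance start -f -dconvert=single -mconvert=single /mnt/cache
btrfs device remove /dev/sdX1 /mnt/cache   # migrates data off the device, then removes it

# ...after the replacement drive arrives:
btrfs device add /dev/sdY1 /mnt/cache
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache   # mirror across both again
```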
JorgeB Posted February 2, 2022

Removing a drive from a pool and then adding one later is fine. You just can't do a direct replacement; that's the part that's broken.
nukeman Posted February 16, 2022 Author

I sent one of the cache pool drives back to Samsung. They sent a new(?) refurbished replacement. I added the replacement drive to the pool, and Unraid ran a parity check and found no errors. Is that all I need to do before RMA'ing the other, original, cache drive? I've never done this procedure before and want to make sure I'm good to remove the other faulty drive. Is "no balance found" relevant?
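On the "no balance found" question: that message just means no balance operation is currently running, which is expected once the pool has settled. A few read-only commands (assuming the default /mnt/cache mount point) will show the pool's state without changing anything:

```shell
btrfs filesystem show /mnt/cache   # devices in the pool and space allocated on each
btrfs balance status /mnt/cache    # "No balance found on '/mnt/cache'" = nothing running
btrfs device stats /mnt/cache      # per-device read/write/corruption error counters
```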
JorgeB Posted February 17, 2022

8 hours ago, nukeman said: did a parity check and found no errors.

A parity check is for the array; for the pool you can run a scrub.
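A scrub reads every block in the pool and verifies its checksum against the btrfs metadata, which is the pool-level equivalent of a parity check. Assuming the default /mnt/cache mount point (this can also be started from the pool device's page in the Unraid web UI):

```shell
btrfs scrub start -B /mnt/cache   # -B stays in the foreground and prints a summary
btrfs scrub status /mnt/cache     # progress/result if started without -B
```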