
Reallocated Sectors on Cache Pool SSDs


nukeman


Recently I received a warning about some reallocated sectors on one of the SSDs in my cache pool.  The other day I was doing some heavy copying with the cache and received a similar warning on the other drive in the pool.  I've searched the forums and this seems to be either a "watch it to see if it gets worse" or a "critical, fix it now" problem.  I went back and found some old diagnostics, and two months ago both drives had Reallocated_Sector_Ct=0 in their SMART reports.  Now one drive has Reallocated_Sector_Ct=3 and the other has Reallocated_Sector_Ct=7.

 

Both of these drives are 1TB Samsung 870 EVOs that were purchased in February of 2021. I'm not excited about the prospect of swapping out the drives, as they contain several critical VMs for my home business as well as Unraid's cache.  That being said, I'm also not excited about both drives in my cache pool failing at the same time.

 

Assuming these warnings are something I should take care of, I started the RMA process with Samsung.  I did get an RMA issued, but they won't send out a replacement drive until I send the old one in for evaluation.  Before I start down that road, though, I wanted to get some opinions on what my next steps should be.  Is this a warning that warrants replacing both drives?  If so, how should I go about doing it? I thought Samsung SSDs were generally well regarded; maybe I just got unlucky?

 

BTW - I'm happy to post the SMART reports, but I'm wondering if there's any sensitive information I should remove.  For example, should I remove the serial numbers from the reports prior to posting?
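
In case it's useful, here's a rough Python sketch of how I'd mask the serial number in a saved smartctl report before posting (the filenames are just placeholders):

import re
import sys

# Placeholder filenames; pass the saved smartctl report on the command line instead.
src = sys.argv[1] if len(sys.argv) > 1 else "cache_ssd_smart.txt"
dst = src.replace(".txt", "_redacted.txt")

with open(src) as f:
    report = f.read()

# smartctl prints the serial on a line like "Serial Number:    S6..."; keep the label,
# mask everything after it, and leave the rest of the report untouched.
redacted = re.sub(r"(?m)^(Serial Number:\s*).*$", r"\1[REDACTED]", report)

with open(dst, "w") as f:
    f.write(redacted)

print("wrote", dst)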

16 hours ago, nukeman said:

The other day I was doing some heavy copying

 

How full are they (% used)? Do you move data to the array daily to free up space? If an SSD is almost full and most of its data is static, you may be using it the wrong way, and its endurance will decrease greatly.

 

16 hours ago, nukeman said:

maybe I just got unlucky?

 

It's not likely for two SSDs to develop problems at the same time if you haven't reached the 600TB write endurance rating.

 

https://www.storagereview.com/review/samsung-870-evo-ssd-review

 

[Image: 870 EVO endurance specifications from the review above]

 

If you're really concerned about sensitive information, you can mask the serial numbers and then post both SSDs' SMART reports.

 

Edited by Vr2Io
54 minutes ago, Vr2Io said:

 

How full are they (% used)? Do you move data to the array daily to free up space? If an SSD is almost full and most of its data is static, you may be using it the wrong way, and its endurance will decrease greatly.

 

If you're really concerned about sensitive information, you can mask the serial numbers and then post both SSDs' SMART reports.

 

 

The drives have 671GB free, so they're about 33% full.  Here are the SMART reports.

 

2947T SMART Report.txt 2900M SMART Report.txt

Edited by nukeman

Both SSDs' Wear_Leveling_Count values are 44 and 45; the initial value was 0. BTW, I'm not sure what 44 actually means as a percentage, but going by Total_LBAs_Written, they're still under the endurance spec.

 

I found some info, shown below, so 45 means about 45TiB has been written.

[Image: reference for interpreting the Wear_Leveling_Count value]
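
For reference, you can sanity-check that figure from Total_LBAs_Written directly. A rough sketch of the arithmetic in Python, assuming the 512-byte logical sectors the 870 EVO reports and using an example LBA count rather than the actual raw value from your report:

# Example raw value only; substitute the Total_LBAs_Written raw value from the SMART report.
total_lbas_written = 96_000_000_000

SECTOR_BYTES = 512                      # the 870 EVO reports 512-byte logical sectors
bytes_written = total_lbas_written * SECTOR_BYTES

print(f"{bytes_written / 1e12:.1f} TB written (decimal)")
print(f"{bytes_written / 2**40:.1f} TiB written (binary)")
print(f"{bytes_written / 600e12 * 100:.1f}% of the 600TB endurance rating used")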

 

Some people have also reported two 870s failing at the same time.

 

 

Edited by Vr2Io
13 hours ago, Vr2Io said:

Both SSDs' Wear_Leveling_Count values are 44 and 45; the initial value was 0. BTW, I'm not sure what 44 actually means as a percentage, but going by Total_LBAs_Written, they're still under the endurance spec.

 

I found some info, shown below, so 45 means about 45TiB has been written.

[Image: reference for interpreting the Wear_Leveling_Count value]

 

Some people have also reported two 870s failing at the same time.

 

 

Well, I'm going to need some help translating the Wear_Leveling_Count.  You're saying I've written 45TB to the drive?!?!?  I don't understand how that could happen.

 

Also, I'm not excited to read that report of other 870 EVOs failing...


Are there any reports of excessive writing to cache drives or cache pools?  My cache hosts 4-5 VMs and "normal" dockers (Radarr, Sonarr, etc.).  Both of these drives were introduced into my system when I initially created the cache pool.  Prior to that I was just using one (smaller) SSD for cache and Docker/VM hosting.
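
If it would help to narrow down where the writes are coming from, here's a rough Python sketch I could run on the server to watch the cache SSDs for a minute and see how much gets written (the device names are assumptions; I'd confirm mine with lsblk first):

import time

DEVICES = ("sdb", "sdc")   # assumed names for the two cache SSDs; confirm with lsblk
INTERVAL = 60              # seconds between the two samples

def sectors_written():
    written = {}
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] in DEVICES:
                written[parts[2]] = int(parts[9])  # field 10: sectors written (512-byte units)
    return written

before = sectors_written()
time.sleep(INTERVAL)
after = sectors_written()

for dev in DEVICES:
    mib = (after[dev] - before[dev]) * 512 / 2**20
    print(f"{dev}: {mib:.1f} MiB written in the last {INTERVAL} seconds")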


OK, 45TB sounded like a lot, but I guess I am moving large media files around frequently.  Currently I have my downloads folder using the cache; perhaps it would make sense not to.  I don't really care about fast writes for downloads.

Regardless, I'm going to RMA the drives, one at a time, to keep the server up as much as possible. 

 

Plan is to: 

  1. Remove one of the drives from the pool and return it
  2. When I get the replacement drive I'll put it into the pool and let BTRFS rebuild it
  3. Then I'll take the other old drive out of the pool and return it
  4. Finally, I'll put the second replacement drive into the pool

Or should I do this procedure instead?  That post says there's trouble rebuilding the pool in 6.9.2, but also that there's a workaround?
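
Before pulling the second drive, I'm also planning to double-check that the rebuild really finished. A minimal sketch of the checks I have in mind, just wrapping the standard btrfs commands in Python and assuming the pool is mounted at /mnt/cache:

import subprocess

POOL = "/mnt/cache"   # assumed mount point of the cache pool

# Which devices are in the filesystem and how much data sits on each one.
subprocess.run(["btrfs", "filesystem", "show", POOL])

# A balance still running would mean data is still being copied onto the new device.
subprocess.run(["btrfs", "balance", "status", POOL])

# Non-zero error counters here would be a reason to hold off on pulling the second drive.
subprocess.run(["btrfs", "device", "stats", POOL])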

  • 2 weeks later...

I sent one of the cache pool drives back to Samsung.  They sent a new(?) refurbished replacement.  I added the replacement drive to the pool, and Unraid did a parity check and found no errors.  Is that all I need to do prior to RMA'ing the other, original cache drive?  I've never done this procedure before and want to make sure I'm good to remove the other faulty drive.  Is "no balance found" relevant?

[Screenshot: cache pool status showing "no balance found"]
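
In case it matters, here's what I was thinking of running as a final sanity check before removing the remaining original drive; a minimal Python sketch, again assuming the pool is mounted at /mnt/cache:

import subprocess

POOL = "/mnt/cache"   # assumed mount point of the cache pool

# Read and checksum everything on both devices; -B waits for the scrub to finish,
# -d reports the results per device so any errors point at a specific SSD.
subprocess.run(["btrfs", "scrub", "start", "-Bd", POOL])

# Cumulative read/write/corruption counters for each device in the pool.
subprocess.run(["btrfs", "device", "stats", POOL])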

