Unraid


Reallocated Sectors on Cache Pool SSDs


Recently I received a warning about some reallocated sectors on one of the SSDs in my cache pool. The other day I was doing some heavy copying with the cache and received a similar warning for the other drive in the pool. I've searched the forums, and this seems to be either a "watch it to see if it gets worse" or a "critical, fix it now" problem. I went back and found some old diagnostics: two months ago both drives had Reallocated_Sector_Ct=0 in their SMART reports. Now one drive has Reallocated_Sector_Ct=3 and the other has Reallocated_Sector_Ct=7.
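One way to follow the "watch it" approach is to record the counts from each set of diagnostics and compare them over time. Below is a minimal Python sketch under that assumption. The drive labels are hypothetical placeholders, and the values are the 0 to 3 and 0 to 7 changes described above (the post doesn't say which drive had which count, so the mapping is arbitrary).

```python
# Minimal sketch: compare Reallocated_Sector_Ct between two SMART snapshots.
# Drive labels ("ssd_a", "ssd_b") are hypothetical placeholders.


def reallocated_deltas(old: dict, new: dict) -> dict:
    """Increase in Reallocated_Sector_Ct per drive (drives missing from `old` count from 0)."""
    return {drive: count - old.get(drive, 0) for drive, count in new.items()}


def drives_to_watch(deltas: dict) -> list:
    """Drives whose count grew between snapshots, largest increase first."""
    growing = [drive for drive, inc in deltas.items() if inc > 0]
    return sorted(growing, key=lambda drive: deltas[drive], reverse=True)


if __name__ == "__main__":
    two_months_ago = {"ssd_a": 0, "ssd_b": 0}
    today = {"ssd_a": 3, "ssd_b": 7}
    deltas = reallocated_deltas(two_months_ago, today)
    print(deltas)                   # {'ssd_a': 3, 'ssd_b': 7}
    print(drives_to_watch(deltas))  # ['ssd_b', 'ssd_a']
```

The sketch only reports growth; it deliberately doesn't encode a threshold for when a drive should be replaced.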

 

Both of these drives are 1TB Samsung 870 EVOs that were purchased in February 2021. I'm not excited about the prospect of swapping out the drives, as they host several critical VMs for my home business as well as Unraid's cache. That being said, I'm also not excited about both drives in my cache pool failing at the same time.

 

Assuming these warnings are something I should take care of, I started the RMA process with Samsung. I did get an RMA issued, but they won't send out a replacement drive until I send the old one in for evaluation. Before I start down that road, though, I wanted to get some opinions on what my next steps should be. Is this a warning that warrants replacing both drives? If so, how should I go about doing it? I thought Samsung SSDs were generally well regarded; maybe I just got unlucky?

 

BTW, I'm happy to post the SMART reports, but is there any sensitive information I should remove? For example, should I remove the serial numbers from the reports before posting?

16 hours ago, nukeman said:

The other day I was doing some heavy copying

 

How full are they (% used)? Do you move data to the array daily to free up space? If the SSDs are almost full and most of the data is static files, you may be using the SSDs the wrong way, and their endurance will be greatly reduced.

 

16 hours ago, nukeman said:

maybe I just got unlucky?

 

It's unlikely for two SSDs to develop problems at the same time if you haven't reached the 600TB write endurance rating.

 

https://www.storagereview.com/review/samsung-870-evo-ssd-review

 

[Screenshot attached]

 

If you're really concerned about sensitive information, you can mask the serial numbers and then post the SMART reports for both SSDs first.
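If you'd rather not edit the reports by hand, the masking can be scripted. A minimal Python sketch, assuming the report is plain smartctl-style text where identifying values sit on lines labelled "Serial Number:" and "LU WWN Device Id:" (those labels are an assumption about the report format; extend the tuple if your report identifies the drive elsewhere). The script name in the usage comment is hypothetical.

```python
import re
import sys

# Assumed field labels from smartctl's device-information section; extend the
# tuple if your report identifies the drive in other lines too.
SENSITIVE_FIELDS = ("Serial Number", "LU WWN Device Id")

# Match "Label:<spaces/tabs>value" at the start of a line; the value stops at the line end.
_FIELD_RE = re.compile(
    r"^(?P<label>(?:" + "|".join(re.escape(f) for f in SENSITIVE_FIELDS) + r"):[ \t]*)"
    r"(?P<value>\S[^\r\n]*)",
    re.MULTILINE,
)


def mask_report(text: str, placeholder: str = "[REDACTED]") -> str:
    """Return the report with each sensitive field's value replaced by `placeholder`."""
    return _FIELD_RE.sub(lambda m: m.group("label") + placeholder, text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python mask_smart.py "SMART Report.txt" > masked.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        sys.stdout.write(mask_report(fh.read()))
```

Only the values are replaced; the labels and the rest of the report (model, attributes, logs) are left untouched so the report is still useful to whoever reads it.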

 

Edited by Vr2Io

  • Author
54 minutes ago, Vr2Io said:

 

How full are they (% used)? Do you move data to the array daily to free up space? If the SSDs are almost full and most of the data is static files, you may be using the SSDs the wrong way, and their endurance will be greatly reduced.

 

If you're really concerned about sensitive information, you can mask the serial numbers and then post the SMART reports for both SSDs first.

The drives have 671GB free, so they're about 33% full. Here are the SMART reports:

 

Attachments: 2947T SMART Report.txt, 2900M SMART Report.txt

Edited by nukeman

The two SSDs' Wear_Leveling_Count values are 44 and 45, and the initial value was 0. BTW, I'm not sure whether 44 actually means a percentage, but according to Total_LBAs_Written, the drives are still under the endurance spec.
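To pull values like Wear_Leveling_Count and Total_LBAs_Written out of a report without eyeballing it, the attribute table can be parsed. A minimal Python sketch, assuming the row layout of the Total_LBAs_Written line quoted later in this thread (`ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE`, as printed by `smartctl -x`); it also handles the `smartctl -a` layout, where the flag column is a hex value such as `0x0033` followed by TYPE, UPDATED and WHEN_FAILED columns. `SmartAttr` and `parse_attributes` are hypothetical names, and the sketch only extracts numbers; it doesn't interpret what a given raw value means.

```python
from typing import Dict, NamedTuple, Optional


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value recorded
    thresh: int  # failure threshold for the normalized value
    raw: str     # raw value column, kept as text (it can carry extra notes)

    def raw_int(self) -> Optional[int]:
        """First token of the raw column as an int, or None if it isn't numeric."""
        tokens = self.raw.split()
        return int(tokens[0]) if tokens and tokens[0].isdigit() else None


def parse_attributes(report: str) -> Dict[str, SmartAttr]:
    """Collect attribute rows from a SMART report, keyed by attribute name."""
    attrs: Dict[str, SmartAttr] = {}
    for line in report.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID; headers and other sections don't.
        if len(parts) < 8 or not parts[0].isdigit():
            continue
        if not all(p.isdigit() for p in parts[3:6]):
            continue
        if parts[2].startswith("0x"):
            # smartctl -a layout: ... THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
            if len(parts) < 10:
                continue
            raw = parts[9:]
        else:
            # smartctl -x layout (as quoted in this thread): ... THRESH FAIL RAW_VALUE
            raw = parts[7:]
        attrs[parts[1]] = SmartAttr(
            int(parts[0]), parts[1], int(parts[3]), int(parts[4]), int(parts[5]), " ".join(raw)
        )
    return attrs


if __name__ == "__main__":
    row = "241 Total_LBAs_Written      -O--CK   099   099   000    -    101015376296"
    attr = parse_attributes(row)["Total_LBAs_Written"]
    print(attr.value, attr.raw_int())  # 99 101015376296
```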

 

I found some info, shown below, so 45 means about 45TiB written.

[Screenshot attached]

 

Some people have also reported two 870s failing at the same time.

Edited by Vr2Io

  • Author
13 hours ago, Vr2Io said:

So 45 means about 45TiB written.

Well, I'm going to need some help translating the Wear_Leveling_Count.  You're saying I've written 45TB to the drive?!?!?  I don't understand how that could happen.

 

Also, I'm not excited to read that report of other 870 EVOs failing...

  • Community Expert
17 minutes ago, nukeman said:

You're saying I've written 45TB to the drive?!?!?

 

Around that, yes. You can easily calculate it from this attribute:

 

241 Total_LBAs_Written      -O--CK   099   099   000    -    101015376296

 

101015376296 x 512 (sector size) = 51 719 872 663 552 bytes, or 47.04 TiB
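The same arithmetic as a small Python sketch, extended to compare against the 600TB endurance rating mentioned earlier in the thread. It assumes 512-byte LBAs (as in the calculation above) and treats the rating as decimal terabytes; both are plain parameters, and the function names are just for illustration.

```python
SECTOR_BYTES = 512    # LBA size used in the calculation above
TBW_RATING_TB = 600   # endurance rating cited earlier in the thread for the 1TB 870 EVO


def lbas_to_bytes(lbas: int, sector_bytes: int = SECTOR_BYTES) -> int:
    return lbas * sector_bytes


def to_tib(n_bytes: int) -> float:
    """Binary tebibytes (2**40 bytes), the unit used above."""
    return n_bytes / 2**40


def to_tb(n_bytes: int) -> float:
    """Decimal terabytes (10**12 bytes), assumed here to be the unit of the TBW rating."""
    return n_bytes / 10**12


def endurance_used_pct(n_bytes: int, tbw_tb: float = TBW_RATING_TB) -> float:
    return 100 * to_tb(n_bytes) / tbw_tb


written = lbas_to_bytes(101015376296)
print(written)                                # 51719872663552
print(round(to_tib(written), 2))              # 47.04
print(round(to_tb(written), 2))               # 51.72
print(round(endurance_used_pct(written), 1))  # 8.6
```

By this estimate the drive has used well under a tenth of its rated write endurance, which matches the earlier observation that it is still within spec.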

  • Author

Are there any reports of excessive writes to cache drives or cache pools? My cache hosts 4-5 VMs and "normal" Docker containers (Radarr, Sonarr, etc.). Both of these drives were added to my system when I initially created the cache pool. Before that, I was just using one (smaller) SSD for cache and Docker/VM hosting.

  • Community Expert

That looks normal to me for one year. Previous Unraid releases had a problem with excessive writes; IIRC one of my cache SSDs was at one point writing about 3TB/day, which is most of the reason it now has close to 1PB of total writes.
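To put "normal for one year" in numbers, the total can be turned into an average daily rate and a rough projection. A minimal Python sketch, assuming about 365 days in service (the exact in-service time isn't stated in the thread) and the 600TB rating mentioned earlier; the projection simply assumes the average rate stays the same, and the function names are illustrative.

```python
TB = 10**12  # decimal terabyte


def avg_daily_tb(total_bytes: int, days_in_service: float) -> float:
    """Average host writes per day, in decimal TB."""
    return total_bytes / TB / days_in_service


def years_until_rating(total_bytes: int, days_in_service: float, tbw_tb: float = 600) -> float:
    """Years left before the TBW rating is reached if the average rate holds.

    Returns a negative number if the rating has already been exceeded.
    """
    remaining_tb = tbw_tb - total_bytes / TB
    return remaining_tb / avg_daily_tb(total_bytes, days_in_service) / 365


written = 101015376296 * 512  # Total_LBAs_Written x 512 bytes
# Assumption: roughly one year (365 days) in service.
print(round(avg_daily_tb(written, 365) * 1000, 1))  # 141.7 (GB/day)
print(round(years_until_rating(written, 365), 1))   # 10.6
# For comparison, the 3TB/day rate mentioned above reaches 1PB (1000TB) in about:
print(round(1000 / 3))                               # 333 (days)
```

At roughly 140GB/day the rating is a decade away, whereas the 3TB/day behaviour from older releases would pass 1PB in under a year.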

  • Author

OK, 45TB sounded like a lot, but I guess I do move large media files around frequently. Currently my downloads folder uses the cache; perhaps it would make sense not to do that, since I don't really care about fast writes for downloads.

Regardless, I'm going to RMA the drives one at a time to keep the server up as much as possible.

 

Plan is to: 

  1. Remove one of the drives from the pool and return it
  2. When I get the replacement drive I'll put it into the pool and let BTRFS rebuild it
  3. Then I'll take the other old drive out of the pool and return it
  4. Finally, I'll put the second replacement drive into the pool

Or should I follow this procedure instead? This post says there's trouble rebuilding the pool in 6.9.2, but then there's a workaround?

  • Community Expert

Removing a drive from a pool and then adding one later is fine; you just can't do a direct replacement. That's the broken part.

  • 2 weeks later...
  • Author

I sent one of the cache pool drives back to Samsung, and they sent a new(?) refurbished replacement. I added the replacement drive to the pool, and Unraid did a parity check and found no errors. Is that all I need to do before RMA'ing the other original cache drive? I've never done this procedure before and want to make sure it's safe to remove the other faulty drive. Is "no balance found" relevant?

[Screenshot attached]

  • Community Expert
8 hours ago, nukeman said:

did a parity check and found no errors. 

A parity check is for the array; you can run a scrub on the pool.
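A btrfs scrub reads the pool's data and metadata and verifies checksums, repairing from the other copy where the profile is redundant (e.g. RAID1). The sketch below wraps the `btrfs scrub start` command from btrfs-progs in Python. It assumes the pool is mounted at `/mnt/cache` (Unraid's usual mount point for a pool named "cache"), that it runs as root, and that `btrfs` is on the PATH; `scrub_command` and `run_scrub` are hypothetical helper names. The `-B` flag keeps the scrub in the foreground and prints statistics when it finishes.

```python
import shutil
import subprocess


def scrub_command(mount_point: str, foreground: bool = True) -> list:
    """Build the btrfs scrub command line for a mounted pool."""
    cmd = ["btrfs", "scrub", "start"]
    if foreground:
        cmd.append("-B")  # wait for completion and print statistics
    cmd.append(mount_point)
    return cmd


def run_scrub(mount_point: str = "/mnt/cache") -> int:
    """Run a scrub (needs root and btrfs-progs) and return the command's exit code."""
    if shutil.which("btrfs") is None:
        raise RuntimeError("btrfs-progs not found on PATH")
    result = subprocess.run(scrub_command(mount_point), capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:
        # A nonzero exit means something needs a look; check stderr and `btrfs scrub status`.
        print(result.stderr)
    return result.returncode
```

After it finishes, `btrfs scrub status /mnt/cache` shows the summary again, including any errors found or corrected.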
