SSD Lifespan on dashboard and warranty info



I'm curious whether it would be possible to store a maximum TBW for SSDs alongside the warranty information in the Identity drive info, then keep a running comparison against what smartctl reports for NVMe drives and SSDs to show how close you are to that maximum, so you'd know when to prepare for a replacement. In the output of smartctl -a /dev/nvme0n1 below, you'll see I have a "Data Units Written" of 9.67 TB. This unit has a maximum TBW of 1800 TB. Now, this isn't my cache drive; it's the drive in my desktop. But if you're using an SSD as a cache drive, I'm sure you can see how the SSD would quickly deteriorate and fail. The cache SSD in my server is currently at 169 TBW, with a maximum of 530 TBW before failure. Having this SSD lifespan viewable from the dashboard would be very helpful. The SSD in my server is only 1 year old, but it's used heavily for an open source project.

 

 

jcfrosty@Zero ~ $ sudo smartctl -a /dev/nvme0n1
Password: 
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.0-sabayon] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Sabrent Rocket 4.0 1TB
Serial Number:                      03F10797054463199045
Firmware Version:                   EGFM11.1
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            6479a7 2220653435
Local Time is:                      Sat Apr 17 11:32:39 2021 CDT
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d):     Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     70 Celsius
Critical Comp. Temp. Threshold:     90 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    10.73W       -        -    0  0  0  0        0       0
 1 +     7.69W       -        -    1  1  1  1        0       0
 2 +     6.18W       -        -    2  2  2  2        0       0
 3 -   0.0490W       -        -    3  3  3  3     2000    2000
 4 -   0.0018W       -        -    4  4  4  4    25000   25000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        45 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    1%
Data Units Read:                    7,506,169 [3.84 TB]
Data Units Written:                 18,893,007 [9.67 TB]
Host Read Commands:                 56,347,067
Host Write Commands:                289,751,028
Controller Busy Time:               583
Power Cycles:                       118
Power On Hours:                     14,438
Unsafe Shutdowns:                   55
Media and Data Integrity Errors:    0
Error Information Log Entries:      271
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, max 63 entries)
No Errors Logged
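
As a rough sketch of the comparison being asked for, the "Data Units Written" counter can be converted to terabytes and set against a rated TBW. This assumes the NVMe convention that one data unit is 1000 blocks of 512 bytes, which matches the 9.67 TB that smartctl prints above. The function names are made up for illustration, and the rated TBW has to come from the manufacturer's spec sheet because smartctl does not report it.

```python
# One NVMe "data unit" is 1000 blocks of 512 bytes, i.e. 512,000 bytes.
BYTES_PER_DATA_UNIT = 512 * 1000


def data_units_to_tb(data_units: int) -> float:
    """Convert an NVMe data-unit count to decimal terabytes (10**12 bytes)."""
    return data_units * BYTES_PER_DATA_UNIT / 1e12


def tbw_used_percent(data_units: int, rated_tbw_tb: float) -> float:
    """Share of the rated TBW already written, as a percentage.

    rated_tbw_tb is a value the user supplies (hypothetical input here);
    it is not read from the drive.
    """
    return 100.0 * data_units_to_tb(data_units) / rated_tbw_tb


# Values taken from the smartctl output above and the stated 1800 TB rating.
written = 18_893_007
print(f"{data_units_to_tb(written):.2f} TB written")
print(f"{tbw_used_percent(written, 1800):.2f}% of rated TBW")
```

For the desktop drive above this works out to about 9.67 TB written, or roughly half a percent of the 1800 TB rating.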

 

 

15 hours ago, Darksurf said:

with a maximum of 530 TBW before failure.

TBW is partly the expected life but mostly the limit for the device to remain within warranty; it doesn't mean the SSD is going to fail when you reach it. For example, the cache device in one of my servers has a TBW of 500 TB, is currently at 847 TB, and is still going strong. It's still a good idea to monitor it, though. On NVMe devices you just need to monitor this:

 

15 hours ago, Darksurf said:

Percentage Used: 1%

This is the drive's own estimate of how much of its life has been used, as a percentage.
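
A minimal sketch of monitoring that field, assuming the plain-text layout of `smartctl -a` shown earlier in the thread. The helper names and the 80% warning threshold are hypothetical choices for illustration, not anything the dashboard or smartctl provides.

```python
import re
from typing import Optional

# Excerpt of the smartctl output posted above, used here as sample input.
SAMPLE = """\
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    1%
"""


def parse_percentage_used(smartctl_output: str) -> Optional[int]:
    """Return the NVMe 'Percentage Used' value, or None if the line is absent."""
    match = re.search(r"^Percentage Used:\s+(\d+)%", smartctl_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def should_warn(percentage_used: int, warn_at: int = 80) -> bool:
    """True once wear reaches the (assumed) warning threshold."""
    return percentage_used >= warn_at


used = parse_percentage_used(SAMPLE)
if used is not None:
    print(f"Percentage Used: {used}%, warn: {should_warn(used)}")
```

Picking the threshold is a judgment call: it only needs to leave enough lead time to order and receive a replacement drive.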


That's awesome! It would be nice if we could get a lifespan meter somewhere in the open (it seems my method may be inaccurate and yours would be better). I want to make sure my server uptime doesn't take a bad turn when I need to order an SSD and it takes a week to get here. I'd like some pre-emptive warning/monitoring so I can plan accordingly rather than have items live on a shelf for years.

 

Thanks for the correction! I'm learning something new every day.
