RockDawg, posted August 1, 2020

I have a Crucial MX500 250GB SSD that I've been using as my cache drive. I installed it about a year ago and unRAID is reporting that it is failing. SMART data being reported:

Power on hours - 8646
Total LBAs Written - 158.34 TB
Percent Lifetime Remain - 99 (Failing Now)

I do not use the cache drive for writing files to the array (no mover). I simply use it to run Dockers. I do run quite a few of them, and I have tried to troubleshoot the particular container causing the issue, but I just can't seem to find it. The containers I'm running are:

Krusader
Bitwarden
Cloudflare-DDNS
Deluge
DiskSpeed
Emby
LetsEncrypt
MariaDb
Netdata
Nextcloud
NzbGet
Ombi
Radarr
Roon
Sonarr

I have tried stopping them all and starting each one by itself to see if one is the main culprit, but they all perform writes at least every minute or so (or more often). It just seems as though all of them running at the same time adds up. All of the media that any of my containers download goes straight to another drive. But there has to be a way to keep them from killing an SSD in about a year, right?

My cache is a single drive (XFS) and I am running 6.8.3. Any ideas?

JorgeB, posted August 2, 2020

15 hours ago, RockDawg said: "My cache is a single drive (XFS) and I am running 6.8.3 Any ideas."

The new beta fixes this issue, but it mostly occurred when using btrfs. How many GB is it writing per day?

RockDawg, posted August 2, 2020

Since my OP yesterday it's written ~150GB.

JorgeB, posted August 3, 2020

And you're sure it's xfs? Either way, you can use iotop (install the NerdPack plugin to get it) and then try to find out which docker(s) are writing so much.

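For readers without iotop installed, the same idea can be roughed out in Python by reading the per-process I/O counters that Linux exposes in /proc/<pid>/io. This is only a sketch and not what iotop does internally: the function names are made up for illustration, reading other users' processes normally requires root, and the counters are per process, so mapping a process back to a container is left to the reader.

```python
import glob


def parse_proc_io(text):
    """Parse the text of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value.strip())
    return counters


def top_writers(samples, limit=5):
    """Rank labels by write_bytes; `samples` maps a label to /proc/<pid>/io text."""
    totals = {
        label: parse_proc_io(text).get("write_bytes", 0)
        for label, text in samples.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]


def collect_proc_io():
    """Hypothetical helper: gather io text for every readable process on Linux."""
    samples = {}
    for io_path in glob.glob("/proc/[0-9]*/io"):
        pid = io_path.split("/")[2]
        try:
            with open(f"/proc/{pid}/comm") as comm_file:
                name = comm_file.read().strip()
            with open(io_path) as io_file:
                samples[f"{name} (pid {pid})"] = io_file.read()
        except OSError:
            # Process exited, or we lack permission to read it; skip it.
            continue
    return samples
```

Calling `top_writers(collect_proc_io())` twice, a few minutes apart, and comparing the `write_bytes` totals gives a rough picture of which processes are writing the most in that window.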
RockDawg, posted August 5, 2020

I am a bit confused now. I have the DiskSpeed docker and that's where I got the SMART data in my OP. Right now it says Total LBAs Written - 158.90 TB. But if I look at the SMART data in unRAID it shows Total LBAs Written - 38794827600. Each LBA is 512 bytes, right? So 38,794,827,600 x 512 = 19,862,951,731,200 bytes, which is about 19.86 TB (18.07 TiB), right? Or am I calculating wrong?

So where is DiskSpeed getting its 158.90 TB? I read a review that said the 250 GB drive that I have is rated for 100 TBW. So my calculation is way under that, while DiskSpeed is reporting over 50% more than that rating.

JorgeB, posted August 5, 2020

DiskSpeed had a bug that multiplied the LBAs written by 4K instead of 512 bytes. IIRC this has been fixed in the latest update.

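The arithmetic in the two posts above can be checked with a short Python sketch. It assumes, as the thread does, that each LBA counted by Total_LBAs_Written is 512 bytes; the function and variable names are illustrative only.

```python
def lbas_to_bytes(lba_count, bytes_per_lba=512):
    """Convert a Total_LBAs_Written raw value to bytes."""
    return lba_count * bytes_per_lba


def as_tb(num_bytes):
    """Decimal terabytes (10**12 bytes), the unit used in TBW ratings."""
    return num_bytes / 10**12


def as_tib(num_bytes):
    """Binary tebibytes (2**40 bytes)."""
    return num_bytes / 2**40


raw_lbas = 38_794_827_600  # raw value RockDawg read from the unRAID SMART page

correct = lbas_to_bytes(raw_lbas)       # 512 bytes per LBA
buggy = lbas_to_bytes(raw_lbas, 4096)   # the 4K multiplier described above

print(f"{as_tb(correct):.2f} TB")    # 19.86 TB
print(f"{as_tib(correct):.2f} TiB")  # 18.07 TiB
print(f"{as_tb(buggy):.2f} TB")      # 158.90 TB
```

The 4K multiplier reproduces the 158.90 TB figure exactly, which is eight times the 512-byte result.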
RockDawg, posted August 5, 2020

Indeed! Updated and now it reports 19.88 TB.

So why would the SMART data now be saying that the drive is failing due to Percent Lifetime Remain?

itimpi, posted August 6, 2020

What does the SMART information say when read from within Unraid directly (obtained by clicking on the drive on the Main tab)?

If you post your system diagnostics zip file (obtained via Tools -> Diagnostics), then we could see for ourselves exactly what is being reported, as the SMART information for all drives is part of the content of that zip.

In principle the SMART data is handled by the firmware built into the drive, so if it says the lifetime is running out it is probably true. It sounds as if the drive may not have the TBW value you thought it had.

JorgeB, posted August 6, 2020

On 8/1/2020 at 4:42 PM, RockDawg said: "Percent Lifetime Remain - 99 (Failing Now)"

Is this the normalized value or the raw value? Please post the SMART report.

Mihle, posted August 6, 2020

Does it actually say failing? Or just 99? Usually a SMART stat like this starts at 100 and goes down toward 0 as the drive gets into worse shape.

RockDawg, posted August 6, 2020

Here are 2 screenshots: one shows the warning notification from the Main page, and the other shows the SMART data displayed when clicking on "cache" on the Main page. It's now up to 100.

JorgeB, posted August 7, 2020

Looks like a firmware bug to me. 20 TBW is nowhere close to the expected life for that SSD. As a comparison, here's one from an MX500 500GB; with around 90TB written it's at 50% expected life:

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron MX500 SSDs
Device Model:     CT500MX500SSD1
Serial Number:    1849E1DAED7F

ID# ATTRIBUTE_NAME          FLAGS  VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K 100   100   000    -    0
  5 Reallocate_NAND_Blk_Cnt -O--CK 100   100   010    -    0
  9 Power_On_Hours          -O--CK 100   100   000    -    11725
 12 Power_Cycle_Count       -O--CK 100   100   000    -    38
171 Program_Fail_Count      -O--CK 100   100   000    -    0
172 Erase_Fail_Count        -O--CK 100   100   000    -    0
173 Ave_Block-Erase_Count   -O--CK 050   050   000    -    762
174 Unexpect_Power_Loss_Ct  -O--CK 100   100   000    -    3
180 Unused_Reserve_NAND_Blk PO--CK 000   000   000    -    38
183 SATA_Interfac_Downshift -O--CK 100   100   000    -    0
184 Error_Correction_Count  -O--CK 100   100   000    -    0
187 Reported_Uncorrect      -O--CK 100   100   000    -    0
194 Temperature_Celsius     -O---K 064   038   000    -    36 (Min/Max 0/62)
196 Reallocated_Event_Count -O--CK 100   100   000    -    0
197 Bogus_Current_Pend_Sect -O--CK 100   100   000    -    0
198 Offline_Uncorrectable   ----CK 100   100   000    -    0
199 UDMA_CRC_Error_Count    -O--CK 100   100   000    -    0
202 Percent_Lifetime_Remain ----CK 050   050   001    -    50
206 Write_Error_Rate        -OSR-- 100   100   000    -    0
210 Success_RAIN_Recov_Cnt  -O--CK 100   100   000    -    0
246 Total_LBAs_Written      -O--CK 100   100   000    -    174404886673
247 Host_Program_Page_Count -O--CK 100   100   000    -    3003552877
248 FTL_Program_Page_Count  -O--CK 100   100   000    -    4939020470

I would just ignore that.

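The "around 90TB written" figure can be reproduced from the raw values in that report. The sketch below assumes 512-byte LBAs, as elsewhere in the thread. The write amplification estimate uses the formula (attribute 247 + attribute 248) / attribute 247, which to my understanding is how Micron describes it for these drives; treat that as an assumption rather than something stated in this thread.

```python
# Raw values copied from the MX500 500GB report above.
total_lbas_written = 174_404_886_673   # attribute 246
host_program_pages = 3_003_552_877     # attribute 247
ftl_program_pages = 4_939_020_470      # attribute 248
lifetime_used_pct = 50                 # attribute 202, raw value

BYTES_PER_LBA = 512  # assumption: 512-byte logical sectors

# Host writes in decimal terabytes.
tb_written = total_lbas_written * BYTES_PER_LBA / 10**12

# Assumed formula: NAND pages programmed in total per page the host asked for.
write_amplification = (host_program_pages + ftl_program_pages) / host_program_pages

# Host writes this drive would reach at 100% used, if wear stayed linear.
projected_total_tb = tb_written / (lifetime_used_pct / 100)

print(f"Host writes:         {tb_written:.2f} TB")
print(f"Write amplification: {write_amplification:.2f}")
print(f"Projected at 100%:   {projected_total_tb:.1f} TB")
```

On these numbers the drive has taken roughly 89.3 TB of host writes for 50% of its rated life, which is why about 20 TB at the same wear state on the 250GB model looks out of line.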
itimpi, posted August 7, 2020

It is also very strange that it says there is 100% life remaining and that it has also been flagged as FAILING NOW.

JorgeB, posted August 7, 2020

24 minutes ago, itimpi said: "It is also very strange that it says there is 100% life remaining and that it has also been flagged as FAILING NOW."

That part is OK: the normalized value is remaining life, the raw value is spent life. This is from an almost new MX500:

ID# ATTRIBUTE_NAME          FLAGS  VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K 100   100   000    -    0
  5 Reallocate_NAND_Blk_Cnt -O--CK 100   100   010    -    0
  9 Power_On_Hours          -O--CK 100   100   000    -    267
 12 Power_Cycle_Count       -O--CK 100   100   000    -    4
171 Program_Fail_Count      -O--CK 100   100   000    -    0
172 Erase_Fail_Count        -O--CK 100   100   000    -    0
173 Ave_Block-Erase_Count   -O--CK 100   100   000    -    4
174 Unexpect_Power_Loss_Ct  -O--CK 100   100   000    -    0
180 Unused_Reserve_NAND_Blk PO--CK 000   000   000    -    28
183 SATA_Interfac_Downshift -O--CK 100   100   000    -    0
184 Error_Correction_Count  -O--CK 100   100   000    -    0
187 Reported_Uncorrect      -O--CK 100   100   000    -    0
194 Temperature_Celsius     -O---K 068   053   000    -    32 (Min/Max 0/47)
196 Reallocated_Event_Count -O--CK 100   100   000    -    0
197 Bogus_Current_Pend_Sect -O--CK 100   100   000    -    0
198 Offline_Uncorrectable   ----CK 100   100   000    -    0
199 UDMA_CRC_Error_Count    -O--CK 100   100   000    -    0
202 Percent_Lifetime_Remain ----CK 100   100   001    -    0
206 Write_Error_Rate        -OSR-- 100   100   000    -    0
210 Success_RAIN_Recov_Cnt  -O--CK 100   100   000    -    0
246 Total_LBAs_Written      -O--CK 100   100   000    -    2476107088
247 Host_Program_Page_Count -O--CK 100   100   000    -    19671852
248 FTL_Program_Page_Count  -O--CK 100   100   000    -    33437715
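To make the normalized-versus-raw distinction concrete, here is a small Python sketch that parses one row of an attribute table in the column layout shown in this thread and reads attribute 202 the way JorgeB describes. The class and function names are made up for illustration. The failure check (normalized value at or below the threshold) follows my understanding of how smartmontools flags an attribute and is an assumption here.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    flags: str
    value: int    # normalized value
    worst: int
    thresh: int
    fail: str
    raw: str      # raw value, kept as text because it may contain extra notes


def parse_attribute(line):
    """Parse one row: ID# NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE."""
    parts = line.split(None, 7)  # the raw column may itself contain spaces
    if len(parts) != 8:
        raise ValueError(f"unexpected attribute row: {line!r}")
    attr_id, name, flags, value, worst, thresh, fail, raw = parts
    return SmartAttribute(
        int(attr_id), name, flags, int(value), int(worst), int(thresh), fail, raw.strip()
    )


def lifetime_summary(attr):
    """Interpret attribute 202: normalized = life remaining, raw = life used."""
    return {
        "remaining_pct": attr.value,
        "used_pct": int(attr.raw),
        "at_or_below_threshold": attr.value <= attr.thresh,
    }


worn = parse_attribute("202 Percent_Lifetime_Remain ----CK 050   050   001    -    50")
fresh = parse_attribute("202 Percent_Lifetime_Remain ----CK 100   100   001    -    0")
print(lifetime_summary(worn))
print(lifetime_summary(fresh))
```

Read this way, a report showing "100" for Percent_Lifetime_Remain is only alarming if that 100 is the raw (used) value rather than the normalized (remaining) one.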