WARNING: Crucial MX500 SSDs world of pain, stay away from these



Posting this here in case anyone else runs into these issues, hopefully it will save some time.

 

TLDR: Avoid using Crucial SSDs in your Unraid system. If you are using them, back up all your data immediately and consider replacing them, or at the very least check your firmware version and update to the latest (M3CR046) ASAP.

 

I had a cache pool using 2x Crucial MX500 1TB SSDs. They worked fine for about a year, but this past week I suddenly started getting all kinds of BTRFS errors and other storage-related write error messages in the syslog. Examples below.

 

The only thing that resolved this and stabilized my cache pool was updating the SSDs' firmware to the latest version available, M3CR046 at the time of this post. This update is not available for direct download through the Crucial support site; you must use the Crucial Storage Executive software, which only runs on Windows. Also, the firmware update only works if you are actively writing to the disk (lol), so this required mounting the BTRFS filesystem in Windows using WinBtrfs and writing to it while running the firmware update in the Crucial software.

 

I will never buy Crucial SSDs again, and am looking to replace these with a more reliable brand.

 

Feb  7 01:20:52 darktower kernel: I/O error, dev loop2, sector 887200 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
Feb  7 01:21:10 darktower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 13, rd 1644, flush 0, corrupt 0, gen 0
Feb  7 01:21:10 darktower kernel: BTRFS warning (device sdc1: state EA): direct IO failed ino 109014 rw 0,0 sector 0x578abf30 len 0 err no 10
Feb  7 01:21:10 darktower kernel: BTRFS warning (device sdc1: state EA): direct IO failed ino 109014 rw 0,0 sector 0x578abf38 len 0 err no 10
Feb  7 04:40:04 darktower root: Fix Common Problems: Error: Unable to write to Docker Image
Feb  7 08:39:38 darktower kernel: I/O error, dev sdc, sector 212606944 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
Feb  7 08:39:38 darktower kernel: I/O error, dev loop3, sector 78080 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
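
For anyone who wants to check their own syslog for the same pattern, here is a rough Python sketch. It only counts kernel "I/O error" lines per device and pulls out the most recent BTRFS per-device error counters. The function name summarize_syslog and the regexes are illustrative assumptions based on the message formats in the excerpt above, not an official Unraid or btrfs tool.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this post (an assumption:
# other kernel versions may word these messages differently).
IO_ERROR = re.compile(r"I/O error, dev (\S+), sector (\d+)")
BTRFS_ERRS = re.compile(
    r"bdev (\S+) errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)
COUNTER_NAMES = ("wr", "rd", "flush", "corrupt", "gen")


def summarize_syslog(lines):
    """Return (I/O error count per device, latest BTRFS counters per device)."""
    io_errors = Counter()
    btrfs_counters = {}
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            io_errors[match.group(1)] += 1
            continue
        match = BTRFS_ERRS.search(line)
        if match:
            values = [int(v) for v in match.groups()[1:]]
            # Later lines overwrite earlier ones, so the last reading wins.
            btrfs_counters[match.group(1)] = dict(zip(COUNTER_NAMES, values))
    return io_errors, btrfs_counters


# A few of the lines from the excerpt above, used as sample input.
SAMPLE = [
    "Feb  7 01:20:52 darktower kernel: I/O error, dev loop2, sector 887200 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0",
    "Feb  7 01:21:10 darktower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 13, rd 1644, flush 0, corrupt 0, gen 0",
    "Feb  7 08:39:38 darktower kernel: I/O error, dev sdc, sector 212606944 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0",
    "Feb  7 08:39:38 darktower kernel: I/O error, dev loop3, sector 78080 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0",
]

if __name__ == "__main__":
    errors, counters = summarize_syslog(SAMPLE)
    print(dict(errors))
    print(counters)
```

On a live system the lines would come from the syslog file instead of a hard-coded list. I/O errors on the physical device (sdc here) alongside the loop devices are what point at the SSD and not just the Docker image.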

 

Edited by ceddybu
Link to comment
3 hours ago, ceddybu said:

TLDR: Avoid using Crucial SSDs in your Unraid system

I had a Crucial MX500 in my Unraid system for a few years. It had the problem described in the post linked by Trurl: the pending sector count going to 1 and then magically returning to 0. The solution that worked for me was to disable tracking of attribute 197 in the SSD SMART settings in Unraid. No firmware upgrade would address the issue and, in fact, Crucial started calling it "normal" when it started happening in Windows as well as Linux.
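
To make the "goes to 1 and then returns to 0" behaviour concrete, here is a minimal Python sketch that classifies a series of attribute 197 raw readings. The function name, the labels, and the idea of sampling the value over time are assumptions made for illustration; this is not how Unraid itself evaluates SMART data.

```python
def summarize_pending(samples):
    """Classify chronological raw readings of SMART attribute 197.

    Returns (label, spikes), where spikes counts how often the value
    rose from 0 to something non-zero.
    """
    spikes = 0
    previous = 0  # treat the time before the first sample as 0
    for current in samples:
        if previous == 0 and current > 0:
            spikes += 1
        previous = current

    if spikes == 0:
        label = "clean"        # never left 0
    elif samples[-1] == 0:
        label = "transient"    # went non-zero but came back to 0
    else:
        label = "pending"      # still non-zero at the last reading
    return label, spikes


if __name__ == "__main__":
    # Made-up readings showing the flapping pattern described above.
    print(summarize_pending([0, 1, 0, 0, 1, 0]))
```

A "transient" result with many spikes matches the nuisance alerts described here, while a value that stays non-zero would be worth a closer look before ignoring the attribute.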

 

I have a couple of Crucial SSDs now in Windows machines (one is the former Unraid MX500) and have had no issues there.

Link to comment

Read the errors in the logs I posted, this isn't simply an annoying SMART attribute discrepancy. The BTRFS filesystem will become completely read-only, the drive will (temporarily) stop being detected in the BIOS, and you will potentially lose data.

 

The firmware release notes from Crucial admit this problem exists. They claim it doesn't affect Windows, which is why I specifically mention "Unraid system" in my original post. 

Quote

 

New Version: M3CR046

Release Date: Dec-4-2022

Release Notes: This is an optional update which repairs a hang condition occurring under corner-case workloads. Most Windows desktop and notebook users will be unaffected by this change.

 

 

Link to comment
5 minutes ago, ceddybu said:

Read the errors in the logs I posted, this isn't simply an annoying SMART attribute discrepancy,

I never said it was. 

 

I was just pointing out these SSDs do have issues in Linux but they have been OK for me in Windows.  I wouldn't recommend the MX500 for Linux/Unraid either.

Link to comment

Useful information and a data point but without a clear trend of failures it may be excessive to write off the MX500 so completely.

Perhaps it is an issue with a specific firmware version that only showed up, as they say, in an 'edge case'.

 

My 500GB drive in the cache is over 2 years old, with no issues apart from the nuisance alerts for 'pending sector', which I disabled.

When I look at the SMART data, no sectors or NAND blocks have actually been reallocated, so it is just the way the drive reports rather than any indication of reliability or pending failure.
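
As a rough illustration of that check, the Python sketch below pulls the raw values for attributes 5, 196 and 197 out of text in the attribute-table layout printed by smartctl -A. The sample rows, including the attribute names and values, are made up for illustration (names vary with the smartmontools drive database), and raw_values is a hypothetical helper name.

```python
def raw_values(smartctl_attr_text, wanted_ids=(5, 196, 197)):
    """Map attribute ID -> (name, raw value) for the wanted IDs."""
    result = {}
    for line in smartctl_attr_text.splitlines():
        # Ten columns; the raw value is last and may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or unrelated line
        attr_id = int(fields[0])
        if attr_id in wanted_ids:
            result[attr_id] = (fields[1], fields[9].strip())
    return result


# Illustrative sample only, NOT real output from any drive in this thread.
SAMPLE_TABLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       17520
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       1
"""

if __name__ == "__main__":
    for attr_id, (name, raw) in sorted(raw_values(SAMPLE_TABLE).items()):
        print(attr_id, name, raw)
```

In this made-up sample the pending count is 1 while both reallocation counters are 0, which is the situation described above.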

 

The other (mirror) cache drive is a different brand to split the risk of any systemic failure. I'd always recommend splitting the risk in a pool in such a way. My main array uses a deliberate mix of drive models and purchase dates.

 

I have around 10 MX500s around the house (PC, Xbox, PS4, set-top box), as they are one of the SSDs that still have some DRAM. Some of these are up to 4 years old and run 24/7, and I've yet to have an issue with any of them. They are also installed in PCs I've upgraded for friends and family over the last few years (guessing 30+), again with no reported failures or issues. TBH I usually pick up a couple in the Prime sales so I have a drive or two on hand.

 

 

Link to comment
2 hours ago, Decto said:

Useful information and a data point but without a clear trend of failures it may be excessive to write off the MX500 so completely.

Perhaps it is an issue with a specific firmware version that only showed up, as they say, in an 'edge case'.

 

Great idea about using two different make/model drives for RAID1 cache pool. 

 

And you are probably right about me catastrophizing; we need more data points. Crucial's release notes are very opaque and give no details about what the actual "edge case" is, so customers have no idea whether they are potentially affected. Their firmware update process is also a complete joke, and their support all around seems lacking. 🤷‍♂️

Link to comment
  • 4 weeks later...

I am just learning of this issue with MX500 drives. Checked my drive and sure enough it has M3CR043 firmware. I have it formatted as an XFS cache drive. It only has appdata and system shares configured to PREFER. I have had it running for about 8 months with no issues (maybe because I'm using XFS?).
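
For checking the firmware, here is a minimal Python sketch that reads the "Device Model" and "Firmware Version" lines from text in the format printed by smartctl -i and compares the version against M3CR046, the latest version named in this thread. The helper names and the sample text are illustrative, and comparing the trailing digits numerically is an assumption that may not hold for every hardware revision of the drive.

```python
import re

LATEST_IN_THREAD = "M3CR046"  # latest version mentioned in this thread


def parse_identity(smartctl_info_text):
    """Return (device model, firmware version) from `smartctl -i` style text."""
    fields = {}
    for line in smartctl_info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields.get("Device Model"), fields.get("Firmware Version")


def firmware_number(version):
    """Numeric part of an M3CRxxx version string, or None if it looks different."""
    match = re.fullmatch(r"M3CR(\d{3})", version or "")
    return int(match.group(1)) if match else None


def needs_update(version, target=LATEST_IN_THREAD):
    """True/False if comparable, None if either version is not M3CRxxx."""
    current, wanted = firmware_number(version), firmware_number(target)
    if current is None or wanted is None:
        return None
    return current < wanted


# Illustrative sample text, not copied from a real drive in this thread.
SAMPLE_INFO = """\
Device Model:     CT1000MX500SSD1
Firmware Version: M3CR043
"""

if __name__ == "__main__":
    model, firmware = parse_identity(SAMPLE_INFO)
    print(model, firmware, needs_update(firmware))
```

Returning None for anything that does not look like an M3CRxxx string is deliberate, so that other drives in the system are reported as "unknown" instead of being flagged as outdated.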

 

So what would be the most painless way to update? Would I change my appdata and system shares to YES and have mover move those shares to the array, then power down Unraid, remove the drive, put it in a Windows system, and format it to NTFS (since my drive is XFS, the WinBtrfs method doesn't seem like an option)? Then run Crucial Storage Executive and update the firmware. Should I back up the drive before formatting to NTFS, so that I could restore the XFS filesystem afterwards? I'm not sure how a program like Macrium Reflect would handle backing up an XFS drive.

Edited by mh79
Link to comment
  • 3 weeks later...

Just to add to this, I posted about an M500 throwing up SMART weirdness and was directed here. SMART readout below:

 

[Image: screenshot of the drive's SMART readout]

 

I'm finding the reallocated NAND block count very odd.

Also of note, this drive was pulled from my main PC after a series of weird errors where SATA drives weren't being recognised. I had a collection of drives from the past 20 years in there and so I didn't bother to troubleshoot and just bought a new NVMe SSD to consolidate. I then tried this drive (being the largest and newest that wasn't in use) as the cache drive in my Unraid server.

 

The server is still in testing, but suffice to say this drive has a date with the hammer. Or I may subject it to a worse fate and practice my awful SMD rework skills on it...

Link to comment
  • 4 weeks later...
  • 1 month later...
  • 3 weeks later...
  • 8 months later...

Hi,

Does the problem still exist?

Got a brand new MX500 1TB with the 046 firmware; preclear fails after a few seconds of writing.

OK, maybe a defective unit. Sent it back and got a new one. Same behavior: preclear fails at the beginning of writing. Also 046 firmware.

Tried it under Windows. Formatted the drive as MBR/NTFS and copied some big files. After a few minutes the drive disappeared :D

Downgraded to the 045 firmware manually and then updated to 046. Currently, Windows is copying...

Let's see what preclear says.

 

BR

 

 

Link to comment

Yes, it's mainly used for HDDs, but I thought it would be a quick and easy way to generate some write load on the SSD.

And the SSD is performing well after the firmware downgrade and upgrade. No errors in preclear. Maybe they have different versions of 046? Or they used the wrong file at the factory :D

 

Let's see how long it will last.

 

Link to comment
