coreylane Posted February 12, 2023

Posting this here in case anyone else runs into these issues; hopefully it will save some time.

TLDR: Avoid using Crucial SSDs in your Unraid system. If you are using them, back up all the data immediately and consider replacing them, or at the very least check your firmware version and update to the latest (M3CR046) ASAP.

I had a cache pool using 2x Crucial MX500 1TB SSDs. They worked fine for about a year, but this past week I suddenly started getting all kinds of BTRFS errors and other storage-related write error messages in the syslog. Examples below.

The only thing that ended up resolving this and stabilizing my cache pool was updating the SSDs' firmware to the latest version available, M3CR046 at the time of this post. This update is not available for direct download through the Crucial support site; you must use the Crucial Storage Executive software, which only runs on Windows. Also, the firmware update only works if you are actively writing to the disk (lol)... so this required mounting BTRFS in Windows using WinBtrfs and writing to the filesystem while executing the firmware update in the Crucial software.

I will never buy Crucial SSDs again, and am looking to replace these with a more reliable brand.
Feb 7 01:20:52 darktower kernel: I/O error, dev loop2, sector 887200 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
Feb 7 01:21:10 darktower kernel: BTRFS error (device loop2: state EA): bdev /dev/loop2 errs: wr 13, rd 1644, flush 0, corrupt 0, gen 0
Feb 7 01:21:10 darktower kernel: BTRFS warning (device sdc1: state EA): direct IO failed ino 109014 rw 0,0 sector 0x578abf30 len 0 err no 10
Feb 7 01:21:10 darktower kernel: BTRFS warning (device sdc1: state EA): direct IO failed ino 109014 rw 0,0 sector 0x578abf38 len 0 err no 10
Feb 7 04:40:04 darktower root: Fix Common Problems: Error: Unable to write to Docker Image
Feb 7 08:39:38 darktower kernel: I/O error, dev sdc, sector 212606944 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
Feb 7 08:39:38 darktower kernel: I/O error, dev loop3, sector 78080 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0

Edited February 12, 2023 by ceddybu
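For anyone who wants to watch for the same symptoms, the per-device counters in the BTRFS error line (wr, rd, flush, corrupt, gen) can be pulled out of a syslog with a short script. This is only a minimal Python sketch: the regular expression is written against the exact line format shown in the log excerpt, and the function name `parse_btrfs_errs` is made up for illustration.

```python
import re

# Matches the counter summary the kernel logs for a BTRFS device, e.g.
# "bdev /dev/loop2 errs: wr 13, rd 1644, flush 0, corrupt 0, gen 0"
BTRFS_ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(lines):
    """Return {device: {counter: value}}, keeping the last counters seen per device."""
    latest = {}
    for line in lines:
        match = BTRFS_ERRS.search(line)
        if match:
            fields = match.groupdict()
            device = fields.pop("dev")
            latest[device] = {name: int(value) for name, value in fields.items()}
    return latest


# Two lines copied from the log excerpt in the post; only the first has counters.
sample = [
    "Feb 7 01:21:10 darktower kernel: BTRFS error (device loop2: state EA): "
    "bdev /dev/loop2 errs: wr 13, rd 1644, flush 0, corrupt 0, gen 0",
    "Feb 7 08:39:38 darktower kernel: I/O error, dev sdc, sector 212606944 "
    "op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0",
]

for device, counters in parse_btrfs_errs(sample).items():
    print(device, counters)
```

To run it against a real log, pass an open file instead of `sample` (for example `open("/var/log/syslog")`, assuming that is where your system writes its syslog).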
trurl Posted February 12, 2023

Another thread about those
Hoopster Posted February 12, 2023

3 hours ago, ceddybu said: TLDR: Avoid using Crucial SSDs in your Unraid system

I had a Crucial MX500 in my Unraid system for a few years. It had the problem described in the post linked by trurl: the pending sector count going to 1 and then magically returning to 0. The solution that worked for me was to disable tracking of attribute 197 in the SSD SMART settings in Unraid. No firmware upgrade would address the issue and, in fact, Crucial started calling it "normal" when it started happening in Windows as well as Linux.

I have a couple of Crucial SSDs now in Windows machines (one is the former Unraid MX500) and have had no issues there.
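To see what attribute 197 is actually reporting before deciding to stop tracking it, the attribute table printed by `smartctl -A` can be parsed by ID. The following is a rough Python sketch only: the sample table is hypothetical (typed in the usual smartctl column layout, not captured from a real drive), and `parse_smart_attributes` is an illustrative name.

```python
def parse_smart_attributes(text):
    """Parse `smartctl -A` style rows into {id: {"name": ..., "raw": ...}}.

    Assumes the usual ten-column layout, with RAW_VALUE as the last column.
    """
    attributes = {}
    for line in text.splitlines():
        # Split into at most 10 fields so a raw value with spaces stays whole.
        parts = line.split(None, 9)
        if len(parts) == 10 and parts[0].isdigit():
            attributes[int(parts[0])] = {"name": parts[1], "raw": parts[9].strip()}
    return attributes


# Hypothetical sample, NOT taken from any drive in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       1
"""

attrs = parse_smart_attributes(SAMPLE)
print("attribute 197 raw value:", attrs[197]["raw"])
print("attribute 5 raw value:", attrs[5]["raw"])
```

A pending count that flips between 1 and 0 while attribute 5 stays at 0 would match the behavior described in the post above; it is not proof that the drive is healthy.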
coreylane (Author) Posted February 12, 2023

Read the errors in the logs I posted. This isn't simply an annoying SMART attribute discrepancy: the BTRFS filesystem will become completely read-only, the drive will (temporarily) stop being detected in BIOS, and you will potentially lose data.

The firmware release notes from Crucial admit this problem exists. They claim it doesn't affect Windows, which is why I specifically mention "Unraid system" in my original post.

Quote:
New Version: M3CR046
Release Date: Dec-4-2022
Release Notes: This is an optional update which repairs a hang condition occurring under corner-case workloads. Most Windows desktop and notebook users will be unaffected by this change.
Hoopster Posted February 12, 2023

5 minutes ago, ceddybu said: Read the errors in the logs I posted, this isn't simply an annoying SMART attribute discrepancy,

I never said it was. I was just pointing out that these SSDs do have issues in Linux, but they have been OK for me in Windows. I wouldn't recommend the MX500 for Linux/Unraid either.
Decto Posted February 12, 2023

Useful information and a data point, but without a clear trend of failures it may be excessive to write off the MX500 so completely. Perhaps it is an issue with a specific firmware version that only showed up, as they say, in an 'edge case'.

My 500GB drive in the cache is over 2 years old, with no issues apart from the nuisance alerts for 'pending sector', which I disabled. When I look at the SMART data, no sectors or NAND blocks have actually been reallocated, so it is just the way the drive reports rather than any indication of reliability or pending failure.

The other (mirror) cache drive is a different brand to split the risk of any systemic failure. I'd always recommend splitting the risk in a pool in such a way. My main array uses a deliberate mix of drive models and purchase dates.

I have around 10 MX500s around the house (PC, Xbox, PS4, set-top box), as they are one of the SSDs that still have some DRAM, and while some of these are up to 4 years old with 24/7 running, I'm yet to have an issue with any one of them. They are also widely installed in (guessing 30+) PCs I've updated for friends and family over the last few years, again with no reported failures or issues.

TBH I usually pick up a couple in the Prime sales so I have a drive or 2 on hand.
JorgeB Posted February 12, 2023

Thanks for the info. I've been using 8 MX500 SSDs with Unraid for several years, with no issues so far except for the known pending sector attribute, and one of them has been at 0% life remaining for a while now.
coreylane (Author) Posted February 12, 2023

2 hours ago, Decto said: Useful information and a data point but without a clear trend of failures it may be excessive to write off the MX500 so completely. Perhaps an issue with a specific firmware version that only showed up, as they say in an 'edge case'.

Great idea about using two different make/model drives for a RAID1 cache pool. And you are probably right about me catastrophizing; we need more data points.

Crucial's release notes are very opaque and give no details about what the actual "edge case" is, so customers have no idea whether they are potentially affected. Their firmware update process is also a complete joke, and their support all around seems lacking. 🤷
mh79 Posted March 11, 2023

I am just learning of this issue with MX500 drives. I checked my drive and sure enough it has the M3CR043 firmware. I have it formatted as an XFS cache drive. It only has the appdata and system shares configured to PREFER. I have had it running for about 8 months with no issues (maybe because I'm using XFS?).

So what would be the most painless way to update? Would I change my appdata and system shares to YES and then have mover move those shares to the array? Then power down Unraid, remove the drive, put it in a Windows system and format it to NTFS (since my drive is XFS, the WinBtrfs method doesn't seem like an option), then run Crucial Storage Executive and update the firmware?

Should I maybe do a backup of the drive prior to formatting to NTFS, so that I could just restore the XFS format on the drive when done? I'm not sure how a program like Macrium Reflect would work with an XFS drive backup.

Edited March 11, 2023 by mh79
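The firmware check itself can be scripted from the Unraid console. Below is a minimal Python sketch, assuming `smartctl` (from smartmontools) is installed and prints a `Firmware Version:` line in its `-i` output. The helper names and the sample text are hypothetical, and treating a lower trailing number as "older" is an assumption about Crucial's version naming, not something confirmed in this thread.

```python
import re
import subprocess

FIXED_FIRMWARE = "M3CR046"  # the version this thread reports as the fix


def firmware_from_smartctl(info_text):
    """Return the value of the 'Firmware Version' line from `smartctl -i`, or None."""
    match = re.search(r"^Firmware Version:\s*(\S+)", info_text, re.MULTILINE)
    return match.group(1) if match else None


def firmware_number(version):
    """Trailing digits of a version string, e.g. 'M3CR043' -> 43."""
    match = re.search(r"(\d+)$", version)
    if match is None:
        raise ValueError(f"unexpected firmware string: {version!r}")
    return int(match.group(1))


def is_older_than_fixed(version, fixed=FIXED_FIRMWARE):
    # Assumption: a smaller trailing number means an older release.
    return firmware_number(version) < firmware_number(fixed)


def read_firmware(device):
    """Run `smartctl -i` on a device such as /dev/sdc (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-i", device], capture_output=True, text=True, check=False
    )
    return firmware_from_smartctl(result.stdout)


# Hypothetical excerpt of `smartctl -i` output, for illustration only.
SAMPLE_INFO = """\
Device Model:     CT1000MX500SSD1
Firmware Version: M3CR043
"""

version = firmware_from_smartctl(SAMPLE_INFO)
print(version, "older than", FIXED_FIRMWARE, "->", is_older_than_fixed(version))
```

On a live system, `read_firmware("/dev/sdX")` with the real device name would replace the sample text.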
philehidiot Posted April 1, 2023

Just to add to this, I posted about an M500 throwing up SMART weirdness and was directed here. SMART readout below:

I'm finding the reallocated NAND block count very odd. Also of note, this drive was pulled from my main PC after a series of weird errors where SATA drives weren't being recognised. I had a collection of drives from the past 20 years in there, so I didn't bother to troubleshoot and just bought a new NVMe SSD to consolidate. I then tried this drive (being the largest and newest that wasn't in use) as the cache drive in my Unraid server.

The server is still in testing, but suffice to say this drive has a date with the hammer. Or I may subject it to a worse fate and practice my awful SMD rework skills on it...
andrut Posted April 25, 2023

After updating to firmware version M3CR046, all of the problems I had with two MX500s dropping offline seem to be gone. I had to reboot and scrub the drives every 2-3 days, but the server has now been working without any problems for the last 27 days.
snolly Posted May 27, 2023

Please see this post if you need further help, or if you want to update the affected drives from within Unraid itself.
ronmcmxci Posted June 14, 2023

As an additional data point, my 1-month-old 4TB MX500 has been having this issue. Glad I found an explanation, but I wish I had found it before I bought the drive :).
Commander_Alpha Posted February 27

Hi, does the problem still exist?

I got a brand new MX500 1TB with the 046 firmware; preclear fails after a few seconds of writing. OK, maybe a broken product. I sent it back and got a new one. Same behavior: preclear fails at the beginning of writing. Also 046 firmware.

I gave it a try under Windows: formatted the drive with MBR NTFS and copied some big files. After some minutes the drive disappeared.

I downgraded to the 045 firmware manually and then updated to 046. Currently, Windows is copying... Let's see what preclear says.

BR
ChatNoir Posted February 27

You are not supposed to preclear SSDs. The tool is only used for HDDs.
Commander_Alpha Posted February 28

Yes, it's mainly used for HDDs, but I thought it was a quick and easy way to generate some write load on the SSD. And the SSD is performing well after the firmware downgrade and upgrade. No errors in preclear.

Maybe they have different versions of the 046 firmware? Or they use the wrong file at the factory.

Let's see how long it will last.