
Intel DC SSD Cache Drive showing as unmountable wrong/no filesystem after power failure


Solved by JorgeB


The good ol' electric company shut off power to the street for a few minutes this morning, and I'm just now getting home to assess the damage. This machine has had unclean shutdowns a couple of times in the past, and there were no problems once it booted back up and ran a parity check. This time, however, the cache drive fails to mount, saying "wrong/no file system". I have VMs and Docker data on that cache drive and really need to recover it if possible, but this is all uncharted territory for me. I attached the diagnostics; fingers crossed there's something there that indicates I can recover the data, format the drive, and copy the data back.

thinkserver-diagnostics-20230608-1816.zip

  • Solution
Jun  8 15:01:15 ThinkServer kernel: BTRFS error (device sdb1): parent transid verify failed on 238370816 wanted 773399 found 771340

This error is fatal. It means some writes were lost, which can happen if a storage device lies about flushing its write cache. This is usually a drive (or controller) firmware problem.
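To make the numbers in that log line concrete, here is a minimal Python sketch that pulls out the `wanted` and `found` transaction IDs and shows how far the on-disk metadata lags behind what the filesystem expected. The regular expression is written against the exact message quoted above and is an assumption; other kernel versions may word the message differently. The function name `parse_transid_error` is made up for illustration.

```python
import re

# Pattern matched against the kernel message quoted in this thread.
# Assumption: other kernel versions may format the message differently.
TRANSID_RE = re.compile(
    r"BTRFS error \(device (?P<device>[^)]+)\): "
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def parse_transid_error(line):
    """Return the fields of a 'parent transid verify failed' line, or None."""
    match = TRANSID_RE.search(line)
    if match is None:
        return None
    wanted = int(match["wanted"])
    found = int(match["found"])
    return {
        "device": match["device"],
        "bytenr": int(match["bytenr"]),
        "wanted": wanted,
        "found": found,
        # How many transactions older the block on disk is than expected.
        "generations_behind": wanted - found,
    }


line = (
    "Jun  8 15:01:15 ThinkServer kernel: BTRFS error (device sdb1): "
    "parent transid verify failed on 238370816 wanted 773399 found 771340"
)
info = parse_transid_error(line)
print(info["device"], info["generations_behind"])
```

For the line from these diagnostics the gap is 2059 transactions, which fits the "lost writes" reading: the metadata block that reached the disk is much older than the one the rest of the tree points to.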

 

Btrfs restore (option #2 here) is the best bet to try to recover some data in this case. After that, the device will need to be formatted and the data restored.
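As a rough illustration of the restore step, the Python sketch below only assembles a `btrfs restore` command line and prints it; it does not run anything. The device `/dev/sdb1` is taken from the log line above and may change between boots, and the destination `/mnt/disk1/restore` is a hypothetical path that would need to be on a different device with enough free space. The helper name `build_restore_command` is made up for this example.

```python
import shlex


def build_restore_command(device, destination, dry_run=True, ignore_errors=False):
    """Build a `btrfs restore` argument list. Nothing is executed here."""
    cmd = ["btrfs", "restore", "-v"]  # -v: verbose output
    if dry_run:
        cmd.append("-D")  # --dry-run: only list what would be restored
    if ignore_errors:
        cmd.append("-i")  # --ignore-errors: keep going past unreadable files
    cmd += [device, destination]
    return cmd


# Assumed values: device from the log in this thread, destination is hypothetical.
cmd = build_restore_command("/dev/sdb1", "/mnt/disk1/restore")
print(shlex.join(cmd))
```

Starting with the dry run shows whether anything is readable at all before committing to a full copy; passing `dry_run=False` drops the `-D` flag, and the resulting list could then be handed to `subprocess.run(cmd, check=True)`.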


Unfortunately, I was unable to recover anything. Fortunately, I can get most of the VMs up and running again pretty quickly. Jellyfin is behaving oddly after an uninstall and reinstall, though.

I pulled the cache drive in question and checked it: the Intel software reports 95% health and says the firmware is current. I will probably need to look into a better HBA card though. I'm currently using a MegaRAID card that is not in IT Mode, but keeping the stock firmware and making each drive connected to it as its own Raid 0 array. These pass through to unRAID fine, but I can't see the SMART data for anything connected to the card.

I did add a second cache drive. My understanding is that the pair will then serve as a RAID1 for redundancy. They're 1.8TB, so that's way more than enough for cache. I'm currently trying to pick a UPS for the server so this won't occur again, but am I correct in assuming that IF something were to happen again with the drive, having the second cache drive would allow me to keep going?

3 hours ago, Nocturnal4Life said:

I'm currently using a MegaRAID card that is not in IT Mode, but keeping the stock firmware and making each drive connected to it as its own Raid 0 array.

This is not recommended and might have caused the issue.

 

3 hours ago, Nocturnal4Life said:

but am I correct in assuming that IF something were to happen again with the drive, having the second cache drive would allow me to keep going?

If the device fails it will help, but it won't help with an issue like this one.


So I would be better off then to swap to an IT mode HBA card? I was looking at moving to a card that supports more drives, a 16 device card. Currently maxed out my normal 8 device card and I'd like to expand, or at least have the option at some point. 

 

UnRAID goes by hard drive device ID, correct? So I could change out cards and it wouldn't affect the array?

 

I guess it makes sense that the mirror would protect against the cache drive failing but not against a data error, since the bad data would be written to both drives.

 

So currently I'm going to invest in a UPS for the server and a more appropriate HBA card. The cache drive is already mirrored in case of the unlikely event of the DC drive failing. I saw there are some ways to automate backing up the cache data to the hard drive array. I might look into that as well.

1 hour ago, Nocturnal4Life said:

UnRAID goes by hard drive device ID, correct? So I could change out cards and it wouldn't affect the array?

This is true if both the old and new cards present the serial number information the same way. I'm not sure, though, whether that holds when moving from the MegaRAID card to an HBA, as the MegaRAID card may not be passing the drives through transparently (although if that happens it can be worked around).
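One way to check this before rebuilding anything is to note down the disk identifiers seen with the old card (for example the names listed under `/dev/disk/by-id` on Linux) and compare them with what the new card presents. The Python sketch below does only the comparison; the identifier strings are invented placeholders, not real values from this server, and `diff_disk_ids` is a made-up helper name.

```python
def diff_disk_ids(before, after):
    """Compare identifier lists recorded before and after a controller swap."""
    before, after = set(before), set(after)
    return {
        "unchanged": sorted(before & after),  # presented identically by both cards
        "missing": sorted(before - after),    # seen only with the old card
        "new": sorted(after - before),        # seen only with the new card
    }


# Hypothetical identifiers, for illustration only.
old_ids = ["EXAMPLE_DISK_SERIAL0001", "EXAMPLE_RAID_VOLUME_00a1"]
new_ids = ["EXAMPLE_DISK_SERIAL0001", "EXAMPLE_DISK_SERIAL0002"]

report = diff_disk_ids(old_ids, new_ids)
print(report)
```

Anything that ends up under `missing` or `new` is a disk whose identifier changed with the controller, so its assignment would need to be redone by hand.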

2 hours ago, Nocturnal4Life said:

So I would be better off then to swap to an IT mode HBA card?

Yep, but as mentioned the IDs will change. That part is very easy to deal with: just do a new config. Some RAID controllers also mess with the partition, though, and that is usually solvable but more complicated.
