Cache Drive Forced Read-Only, would like second opinion


Go to solution Solved by itimpi,


For the past week, I've had my cache drive forced into Read-Only mode about 3 times. Looking at the logs, I'm wondering if it's an issue with my RAM, as I see
 

Quote

BTRFS error (device loop2: state EA): parent transid verify failed on logical 278478848 mirror 1 wanted 1567253 found 1556606

 

But I can't be quite sure, as that appears after the cache drive is forced into Read-Only mode. I thought maybe the SSD had failed, but the SMART tests aren't showing any problems. The time in between being forced into Read-Only mode also varies, though it mostly appears to be happening overnight.
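
For anyone comparing against their own syslog, here is a minimal Python sketch for pulling the numbers out of a line like the one quoted above. As far as I understand it, `wanted` and `found` are btrfs generation (transaction) numbers, so `found` being lower than `wanted` means the block that was read is older than its parent expects. Note also that `loop2` is a loop device, which is typically a file-backed image sitting on some other file system rather than the SSD partition itself. The regex and the function name `parse_transid_errors` are my own and only assume the message format shown in the quote.

```python
import re

# Pattern written against the single message quoted in this thread;
# other kernel versions may word the line differently (assumption).
TRANSID_RE = re.compile(
    r"BTRFS error \(device (?P<device>[^):\s]+)[^)]*\): "
    r"parent transid verify failed on (?:logical )?(?P<logical>\d+) "
    r"mirror (?P<mirror>\d+) wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def parse_transid_errors(lines):
    """Return one dict per 'parent transid verify failed' line found."""
    results = []
    for line in lines:
        match = TRANSID_RE.search(line)
        if match is None:
            continue  # not a transid error, skip it
        wanted = int(match["wanted"])
        found = int(match["found"])
        results.append(
            {
                "device": match["device"],
                "logical": int(match["logical"]),
                "mirror": int(match["mirror"]),
                "wanted": wanted,
                "found": found,
                # Positive gap: the block on disk is older than expected.
                "gap": wanted - found,
            }
        )
    return results


if __name__ == "__main__":
    sample = (
        "BTRFS error (device loop2: state EA): parent transid verify failed "
        "on logical 278478848 mirror 1 wanted 1567253 found 1556606"
    )
    for err in parse_transid_errors([sample]):
        print(err["device"], "is behind by", err["gap"], "generations")
```

Run against the quoted line, this reports `loop2` as being 10647 generations behind what was wanted.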

alexandria-diagnostics-20240325-1729.zip

Link to comment
28 minutes ago, Varean said:

performed a memtest but I received a Pass.

 

The 'problem' with memtest is that it is only definitive if you get a failure. If you continue to get new failures that look like they could be RAM related, it can be worth running with fewer sticks of RAM to lighten the load on the memory controller.

Link to comment

Understood. To provide additional context, this system will have been running for two years in the next two months, so I would presume that if there were an issue with the memory controller it would have reared its ugly head sooner rather than later. Part of me thinks that even though the cache drive passed its SMART test, it might be related? I know it all started when my CA Backup/Restore Appdata plugin tried to run overnight and I woke up to a bunch of errors related to the cache drive being full; about a week or two later these issues started popping up.

Link to comment
  • Solution

With these types of problems it can be difficult to pin down the culprit. 😖 You might want to consider backing up the cache drive, reformatting it to get a clean file system, and then restoring its contents. Docker containers can also start to play up after they have had an upgrade.
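
The back up, reformat, restore sequence above only helps if the backup is actually complete before the drive is wiped. Below is a rough Python sketch of copying a folder and then comparing SHA-256 hashes of every file. The paths in the commented example (`/mnt/cache/appdata`, `/mnt/disk1/cache_backup/appdata`) and the function names are placeholders for illustration, not something taken from the diagnostics, and it assumes nothing is writing to the source during the copy (so Docker would need to be stopped first).

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks to keep memory use low."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source: Path, dest: Path) -> List[str]:
    """Relative paths of regular files missing from dest or differing in content."""
    mismatches = []
    for src_file in sorted(source.rglob("*")):
        # Only compare regular files; symlinks are copied as links, not hashed.
        if src_file.is_symlink() or not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        copy = dest / rel
        if not copy.is_file() or file_digest(src_file) != file_digest(copy):
            mismatches.append(rel.as_posix())
    return mismatches


def backup_and_verify(source: Path, dest: Path) -> List[str]:
    """Copy source into dest, then report any files that did not copy cleanly."""
    shutil.copytree(source, dest, symlinks=True, dirs_exist_ok=True)
    return find_mismatches(source, dest)


# Example with hypothetical paths (adjust to your own shares):
#   bad = backup_and_verify(Path("/mnt/cache/appdata"),
#                           Path("/mnt/disk1/cache_backup/appdata"))
#   print("safe to reformat" if not bad else f"do not reformat yet: {bad}")
```

An empty list back means every regular file on the source has an identical copy in the backup; anything listed should be looked at before reformatting.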

 

I mentioned RAM again because it is definitely possible for motherboard components to gradually degrade over time so that a system that has been stable for ages suddenly starts getting unpredictable errors.

Link to comment

I agree, I spent about a week trying to find a solution on my own before posting. Luckily, the hardware was all brand new when I bought it (mobo is an Asus Prime Z690). I know the order of operations was:

Backup of my AppData failed, and Plex told me I had a corrupted database > recreated my Plex database > updated my Unraid version > had to perform a force update on each docker container as they wouldn't launch > after about a week and a half I started getting errors and was unable to play back media in Plex. I tried restarting containers and received an execution error because my cache drive was in Read-Only mode.

Link to comment
I'm going to try two different things today; worst case is they don't work. I've moved everything off my cache drive to my array, then reformatted it from btrfs to xfs. Now I am moving everything that was on it back. Even after formatting, Unraid was still reporting about 3GB of usage on it? Not sure if that indicates an issue.

Then once I get the chance this evening, I'll power it down and pull one of the RAM sticks to see. A friend suggested re-seating the CPU, but I don't think that could really be the issue. In the morning I'll see if the dockers remain functional or if the cache drive goes to Read-Only again.

Link to comment
  • 2 weeks later...
I think all of these issues came about after the upgrade from 6.9.2 to 6.12.8. I did this because some of the plugins couldn't update until I had a higher version of Unraid, and also because my Plex database got corrupted after a failed appdata backup procedure.

Since reformatting my cache drive, it hasn't been forced into Read-Only mode so far, but it hadn't been running long before I started getting other errors, which were fixed when JorgeB suggested I switch my docker from macvlan to ipvlan. So far 3 1/2 days of uptime, so I am hopeful.

Link to comment
