BTRFS Cache Drives Randomly Go "Read Only"

January 8, 2023

I had a problem with my Samsung SSD cache drive going read-only. I was a bit suspicious, as the drive is only 6 months old, but since its contents weren't super important I just copied them to the hard drives and disabled it for the time being, because we needed to get the NAS up and running so we could get work done. After a few hours, the other SSD, which we use for our VMs, also went read-only. The only solution is to restart the NAS, but the last time I restarted it the filesystem on my cache SSD disappeared, and I fear the same could happen to my VM SSD. A full backup of the NAS, while possible, would be quite difficult: there is 18TB of content on it, we don't have the best upload speeds, and we don't have 18TB of spare storage on hand at the moment (we are working on setting up a 3-2-1 backup solution, but the NAS is fairly new). Another strange thing is that when Unraid decides to go "read-only", it won't let me read the files either; it spits out the same error as before. Attached are the system logs that could be relevant.

I'm also having some issues with the VMs once they're up and running, though I'm not sure whether they're related. Whenever I get a Linux VM running, certain Python packages get corrupted halfway through installation via pip. It happens specifically with larger libraries like PyTorch. I assume the cause could be the same filesystem issues.

hypercloud-syslog-20230108-1556.zip hypercloud-diagnostics-20230108-1102.zip

January 9, 2023

Community Expert

Jan  8 07:48:50 HyperCloud kernel: BTRFS error (device nvme1n1p1): block=8617803776 write time tree block corruption detected

This means data corruption was detected during a write, usually due to bad RAM. Start by running memtest.
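
To see how many of these errors there are, and on which devices, a small script can tally the BTRFS error lines in the attached syslog. This is a minimal sketch that assumes kernel messages follow the format of the line quoted above (`BTRFS error (device ...): ...`) and that failures are logged at "error" or "critical" level; the helper name `scan_btrfs_errors` is hypothetical and not part of Unraid or btrfs-progs.

```python
import re
from collections import Counter

# Assumed format of kernel BTRFS messages, e.g.
#   "... kernel: BTRFS error (device nvme1n1p1): block=8617803776 write time tree block corruption detected"
# Newer kernels may append state info inside the parentheses ("device sdb1: state EA"),
# so only the device name before any ':' or whitespace is captured.
BTRFS_PROBLEM = re.compile(
    r"BTRFS (?:error|critical) \(device (?P<dev>[^):\s]+)[^)]*\): (?P<msg>.*)"
)


def scan_btrfs_errors(lines):
    """Tally BTRFS error/critical messages per device (hypothetical helper).

    Returns a Counter of device -> message count, plus the full text of any
    "write time tree block corruption" lines, i.e. metadata the kernel found
    to be corrupt before writing it to disk.
    """
    per_device = Counter()
    write_time = []
    for line in lines:
        match = BTRFS_PROBLEM.search(line)
        if match is None:
            continue  # not a BTRFS error/critical line
        per_device[match.group("dev")] += 1
        if "write time tree block corruption" in match.group("msg"):
            write_time.append(line.rstrip("\n"))
    return per_device, write_time


if __name__ == "__main__":
    sample = [
        "Jan  8 07:48:50 HyperCloud kernel: BTRFS error (device nvme1n1p1): "
        "block=8617803776 write time tree block corruption detected",
    ]
    counts, corrupt = scan_btrfs_errors(sample)
    print(dict(counts))
    for line in corrupt:
        print(line)
```

To run it on a real log, pass an open file instead of the sample list, e.g. `with open("syslog.txt", errors="replace") as f: counts, corrupt = scan_btrfs_errors(f)`, where `syslog.txt` is a placeholder for the file inside the unzipped syslog archive.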

January 9, 2023

Author

Okay there's something seriously wrong with my hardware.

I let it get to about 200k errors before I pulled the plug. I'm going to first try removing all the RAM and running memtest one stick at a time; hopefully this is as simple as some loose RAM. If that fails, we'll try reseating the CPU. If that doesn't work, do I have any options other than some form of RMA?

January 9, 2023

Community Expert

Just now, chand1012 said:

let it get to about 200k errors

more than zero is too many

You don't even want to attempt to run any computer unless its RAM is working perfectly. Everything goes through RAM: the OS and other executable code, your data, everything. The CPU can't work on anything until it has been loaded into RAM.

January 9, 2023

Community Expert

Are you overclocking? Don't

January 9, 2023

Author

Just now, trurl said:

more than zero is too many

I am aware of this; I was just hoping that only one of my sticks was bad and that the first one tested happened to be the bad one, but obviously that's not the case. I assume this is probably related to a badly seated CPU; however, I'm going to try reseating the RAM first (and testing each stick individually), as that requires less effort on my part.

2 minutes ago, trurl said:

Are you overclocking? Don't

The CPU is at stock frequency.

January 9, 2023

Community Expert

40 minutes ago, chand1012 said:

stock frequency

Not always clear what that means. Did you check the link?
