Cache drive suddenly read-only, causing many issues

September 17, 2024

I've admittedly been having a lot of trouble with my server ever since my flash drive died and I rebuilt a new one. I've been able to get past most things, but about once a week some container or issue pops up that needs a restart.

Today, however, the Fix Common Problems app is saying that my cache drive is read-only, out of nowhere. This actually happened once before, after rebuilding my flash drive, but it was resolved when I ran a scrub on the cache drive. This time, though, Unraid doesn't even let me attempt a scrub: when I hit the Scrub button it just shows 'aborted' without running.

I tried swapping the SATA cable, as I read that could be the problem even though I'm not getting any UDMA errors, but that didn't help either. My logs show a handful of concerning-looking errors, but as I'm fairly new to the whole server and networking world, I don't know where to go from here.

I've attached diagnostics and logs. Some of the things I'm seeing:

This error is spamming the log quite often on startup. It doesn't appear to be the cause of the suddenly read-only drive, but maybe it's related.

Sep 17 10:47:00 ANJNAS nginx: 2024/09/17 10:47:00 [error] 8183#8183: *3471 limiting requests, excess: 20.987 by zone "authlimit", client: 192.168.1.23, server: , request: "GET /login HTTP/1.1", host: "192.168.1.9", referrer: "http://192.168.1.9/Main/Settings/Device?name=cache"

Here is where the drive seems to suddenly switch to read-only:

Sep 17 10:44:43 ANJNAS kernel: Call Trace:
Sep 17 10:44:43 ANJNAS kernel: <TASK>
Sep 17 10:44:43 ANJNAS kernel: ? __warn+0xab/0x122
Sep 17 10:44:43 ANJNAS kernel: ? report_bug+0x109/0x17e
Sep 17 10:44:43 ANJNAS kernel: ? __btrfs_free_extent+0x4cf/0xc02
Sep 17 10:44:43 ANJNAS kernel: ? handle_bug+0x41/0x6f
Sep 17 10:44:43 ANJNAS kernel: ? exc_invalid_op+0x13/0x60
Sep 17 10:44:43 ANJNAS kernel: ? asm_exc_invalid_op+0x16/0x20
Sep 17 10:44:43 ANJNAS kernel: ? __btrfs_free_extent+0x4cf/0xc02
Sep 17 10:44:43 ANJNAS kernel: ? _raw_read_trylock+0x36/0x5c
Sep 17 10:44:43 ANJNAS kernel: ? btrfs_merge_delayed_refs+0x66/0x16e
Sep 17 10:44:43 ANJNAS kernel: __btrfs_run_delayed_refs+0x698/0xbe2
Sep 17 10:44:43 ANJNAS kernel: btrfs_run_delayed_refs+0x65/0x146
Sep 17 10:44:43 ANJNAS kernel: ? start_transaction+0x1fe/0x44d
Sep 17 10:44:43 ANJNAS kernel: btrfs_commit_transaction+0x76/0xa79
Sep 17 10:44:43 ANJNAS kernel: ? start_transaction+0x3dd/0x44d
Sep 17 10:44:43 ANJNAS kernel: ? schedule_timeout+0x5a/0xd7
Sep 17 10:44:43 ANJNAS kernel: transaction_kthread+0x105/0x17b
Sep 17 10:44:43 ANJNAS kernel: ? btrfs_cleanup_transaction.isra.0+0x3cc/0x3cc
Sep 17 10:44:43 ANJNAS kernel: kthread+0xe4/0xef
Sep 17 10:44:43 ANJNAS kernel: ? kthread_complete_and_exit+0x1b/0x1b
Sep 17 10:44:43 ANJNAS kernel: ret_from_fork+0x1f/0x30
Sep 17 10:44:43 ANJNAS kernel: </TASK>
Sep 17 10:44:43 ANJNAS kernel: ---[ end trace 0000000000000000 ]---
Sep 17 10:44:43 ANJNAS kernel: BTRFS: error (device sdb1: state A) in __btrfs_free_extent:3072: errno=-2 No such entry
Sep 17 10:44:43 ANJNAS kernel: BTRFS info (device sdb1: state EA): forced readonly
Sep 17 10:44:43 ANJNAS kernel: BTRFS error (device sdb1: state EA): failed to run delayed ref for logical 208224256 num_bytes 16384 type 176 action 2 ref_mod 1: -2
Sep 17 10:44:43 ANJNAS kernel: BTRFS: error (device sdb1: state EA) in btrfs_run_delayed_refs:2149: errno=-2 No such entry

I don't know what to make of these logs, but they show the point where the drive is suddenly "forced readonly". Before that point, running Fix Common Problems wouldn't show the unable-to-write error; after it, it does.
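For anyone digging through a similar syslog, here is a minimal Python sketch that pulls out the kernel BTRFS messages and reports the first "forced readonly" event. It assumes the line layout shown above (timestamp, hostname, then `kernel: BTRFS ...`); the regex and the names `btrfs_events` and `first_forced_readonly` are hypothetical helpers, not part of Unraid or any btrfs tool.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumption: lines follow the syslog layout quoted in this thread, e.g.
# "Sep 17 10:44:43 ANJNAS kernel: BTRFS info (device sdb1: state EA): forced readonly"
BTRFS_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"BTRFS:? (?P<level>\w+) \(device (?P<dev>[^:)]+)(?:: state (?P<state>\w+))?\):? ?(?P<msg>.*)$"
)


class BtrfsEvent(NamedTuple):
    ts: str                 # syslog timestamp, e.g. "Sep 17 10:44:43"
    device: str             # block device named by the kernel, e.g. "sdb1"
    level: str              # "error", "info", ...
    state: Optional[str]    # state letters such as "A" or "EA", if present
    message: str            # remainder of the line


def btrfs_events(lines: Iterable[str]) -> list[BtrfsEvent]:
    """Collect every kernel BTRFS message; call-trace and other lines are skipped."""
    events = []
    for line in lines:
        m = BTRFS_RE.match(line.strip())
        if m:
            events.append(BtrfsEvent(m["ts"], m["dev"], m["level"], m["state"], m["msg"]))
    return events


def first_forced_readonly(events: list[BtrfsEvent]) -> Optional[BtrfsEvent]:
    """Return the first event where btrfs switched the filesystem to read-only."""
    for ev in events:
        if "forced readonly" in ev.message:
            return ev
    return None
```

Usage would be something like `first_forced_readonly(btrfs_events(open("syslog.txt")))`, where `syslog.txt` is a placeholder for the extracted syslog file; the returned event's `ts` and `device` point at the moment and disk where the pool went read-only.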

I've allocated 30GB for Docker and only 17GB of that is being used, so the Docker image itself doesn't appear to be full. I'm debating rebuilding the Docker image if that's recommended, but I'm a bit afraid to go that route before trying anything else, as it feels a bit nuclear and getting everything back up and running again could be a pain.

I'm hoping someone can point me in the right direction here; I'm at a loss. Most of my containers are having issues because of this.

Also, a random bonus error I don't really care much about: Fix Common Problems lists write cache as disabled on disk1. Even after I check the disk or turn write cache on with "hdparm -W 1 /dev/(diskID)", Fix Common Problems still says it's disabled. I can ignore this for now, though, as I obviously have bigger fish to fry.
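To check the drive's write-cache flag directly rather than through Fix Common Problems, a minimal Python sketch could run `hdparm -W` with no value (which queries the setting instead of changing it) and parse the `write-caching = ...` line. The exact output wording is an assumption based on typical hdparm versions, `/dev/sdX` is a placeholder device, and `parse_write_cache`/`query_write_cache` are hypothetical helper names; running the query needs root and hdparm installed.

```python
import re
import subprocess
from typing import Optional

# Assumed hdparm output line, e.g. " write-caching =  1 (on)"
WRITE_CACHE_RE = re.compile(r"write-caching\s*=\s*(\d)")


def parse_write_cache(hdparm_output: str) -> Optional[bool]:
    """Return True/False for the write-caching flag, or None if it isn't reported."""
    m = WRITE_CACHE_RE.search(hdparm_output)
    if m is None:
        return None  # e.g. "not supported" or unexpected output
    return m.group(1) == "1"


def query_write_cache(device: str) -> Optional[bool]:
    """Run `hdparm -W <device>` (query only, no value given) and parse the result."""
    result = subprocess.run(
        ["hdparm", "-W", device], capture_output=True, text=True, check=False
    )
    return parse_write_cache(result.stdout)
```

For example, `query_write_cache("/dev/sdX")` returning `True` would suggest the drive itself reports write cache as enabled, which would point at the check rather than the disk.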

anjnas-diagnostics-20240917-1050.zip anjnas-syslog-20240917-1750.zip

Edited September 17, 2024 by ANJ_

September 17, 2024

Community Expert
Solution

With this type of error with btrfs, I always recommend backing up the pool, then re-formatting.

September 17, 2024

Author

Just now, JorgeB said:

With this type of error with btrfs, I always recommend backing up the pool, then re-formatting.

I've read that some people recommend switching to XFS, specifically if you only have one cache drive, which I do. Do you recommend this as well?

I'm also not sure of the best way to back up the pool. I have the Appdata Backup plugin, but wouldn't I still need to recreate each individual container (I guess from its template) and then run the Appdata Backup restore?

I imagine I'm going to have to do a lot of reconfiguring, which of course is a pain.

September 17, 2024

Community Expert

With a single device, and if you don't care about checksums or snapshots, XFS can be a good option; it's generally more robust than btrfs, especially with marginal hardware.

September 17, 2024

Author

2 minutes ago, JorgeB said:

With a single device, and if you don't care about checksums or snapshots, XFS can be a good option; it's generally more robust than btrfs, especially with marginal hardware.

Ok, noted, thank you.

What about backing up the pool? Is the method I mentioned earlier the way to go, or is there an easier approach I may be unaware of? I'd appreciate any input.

Edited September 17, 2024 by ANJ_

September 18, 2024

Author

Is there a prerequisite to changing the file system of the cache? The file system type isn't selectable for me.

[Screenshot: cache pool settings with the file system type not selectable]

September 18, 2024

Is the array stopped?

September 18, 2024

Community Expert

You may need to click the "erase" button for that pool first, on the same page where you select the filesystem.

September 18, 2024

Author

10 hours ago, JorgeB said:

You may need to click the "erase" button for that pool first, on the same page where you select the filesystem.

That was the ticket, thank you.

Solved by JorgeB
