Possibly failing cache drive? "Your drive is either completely full or mounted read-only" (it's not full)

s449 · March 7, 2023

This is the second time in a few days that I've hit this error. Fix Common Problems alerts me that there are errors. I get two: "Your drive is either completely full or mounted read-only", although my drives are not full, and something about my Docker.img being full, which it isn't.

Both times my Docker service will fail and on my Docker tab I'll see:

Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (Connection refused) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 712
Couldn't create socket: [111] Connection refused
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 898

Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (Connection refused) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 712
Couldn't create socket: [111] Connection refused
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 967

I've attached the diagnostics after the 2nd time this error has happened.

Also, both times I tried to stop the array and it got stuck on "Retry unmounting user share(s)...". No amount of trying to umount it myself or finding and killing processes fixes it. The only way I can get it unstuck is to run "reboot" in the console, which gets detected as an unclean shutdown.

My best guess is one of my cache drives is dying. One of them is an older one that has 161 TB written (347424289620 total lbas). The other is only around 30 TB.

apollo-diagnostics-20230307-0858.zip

itimpi · March 7, 2023

You have got this in your syslog:

Mar  6 18:40:40 Apollo kernel: BTRFS critical (device sdb1): corrupt leaf: block=4926130962432 slot=84 extent bytenr=4950498230272 len=49152 unknown inline ref type: 129

which indicates corruption on the 'cache' pool. However, the SMART information for that drive does not indicate an issue.
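
If you want to check whether there are more messages like this one, here is a rough Python sketch that pulls "BTRFS critical" lines out of syslog text. The regex, the function name `find_btrfs_critical`, and the second sample line are made up for illustration; only the first sample line comes from your diagnostics.

```python
import re

# Hypothetical pattern: matches kernel lines of the form
# "... kernel: BTRFS critical (device <dev>): <message>"
BTRFS_CRITICAL = re.compile(
    r"kernel: BTRFS critical \(device (?P<device>\w+)\): (?P<message>.*)"
)

def find_btrfs_critical(lines):
    """Return (device, message) pairs for every BTRFS critical line."""
    hits = []
    for line in lines:
        match = BTRFS_CRITICAL.search(line)
        if match:
            hits.append((match.group("device"), match.group("message")))
    return hits

sample = [
    # Taken from the syslog in the diagnostics.
    "Mar  6 18:40:40 Apollo kernel: BTRFS critical (device sdb1): "
    "corrupt leaf: block=4926130962432 slot=84 extent "
    "bytenr=4950498230272 len=49152 unknown inline ref type: 129",
    # Invented filler line, only here to show that non-matching lines are skipped.
    "Mar  6 18:40:41 Apollo kernel: unrelated message",
]

hits = find_btrfs_critical(sample)
for device, message in hits:
    print(device, "->", message)
```

To run it against a real log, pass the lines of the syslog file from the diagnostics zip instead of `sample`.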

s449 · March 7, 2023

22 minutes ago, itimpi said:
You have got this in your syslog:
Mar  6 18:40:40 Apollo kernel: BTRFS critical (device sdb1): corrupt leaf: block=4926130962432 slot=84 extent bytenr=4950498230272 len=49152 unknown inline ref type: 129
which indicates corruption on the 'cache' pool. However, the SMART information for that drive does not indicate an issue.

I saw that, and yeah I ran short SMART tests with no error. Attributes all look fine except the excessive lbas written. I'm seeing a brand new replacement SSD would only be $65 so I'll probably just replace it anyway. But I am curious:

Can an SSD be dying and not report any SMART/Attribute errors?
Is it possible it's not dying and my btrfs pool just needs to be re-balanced or something?

But I'm also not convinced 161 TBW is enough to kill a drive when I'm reading on Samsung's site "600 TBW for 1 TB model" (My cache is two Samsung 860 EVO 1TB).
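
For what it's worth, here is a quick sanity check of that arithmetic in Python. It assumes the SMART "total LBAs written" attribute counts 512-byte units (an assumption on my part, it may differ by drive) and uses the 600 TBW figure quoted above as the rated endurance.

```python
LBA_SIZE_BYTES = 512  # assumption: each LBA written counts as 512 bytes
RATED_TBW = 600       # rated endurance in TB, as quoted for the 1 TB model

total_lbas = 347424289620  # total LBAs written on the older cache drive

written_bytes = total_lbas * LBA_SIZE_BYTES
written_tb = written_bytes / 10**12   # decimal terabytes
written_tib = written_bytes / 2**40   # binary tebibytes
percent_of_rating = written_tb / RATED_TBW * 100

print(f"{written_tb:.1f} TB ({written_tib:.1f} TiB) written")
print(f"{percent_of_rating:.1f}% of the rated {RATED_TBW} TBW")
```

Under those assumptions the "161 TB" figure is really about 161.8 TiB, or roughly 177.9 decimal TB, which is still under a third of the rated endurance.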

Edited March 7, 2023 by s449
