
[SOLVED] "read-only file system" after crash of cache pool


PzrrL
Solved by JorgeB


This turned out to be worse than I thought. I tried to follow a few posts for solutions, but one problem led to another, so I need help from here please. Here is the history and my configuration:

 

Version: 6.9.2

Current status of the server: powered on, with the array stopped.

Cache drives: 2 of them, one Intel and one Samsung (for easier reference).

- The Intel one will probably die sooner or later

 

  1. The dashboard reported the log as 100% full
    1. I checked /var/log; it seems docker.log.1 (I don't remember the exact name, but definitely not docker.log) had taken up most of the log space, with a very large file size (more than 100MB)
    2. I cat'ed the log and used the container ID in it to find that influxDB (one of my containers) was the one unable to write its log to /var/log. I suspected that, because the cache drive was almost full (only around 1-3GB left), influxDB could not write to it (see the command sketch after this list)
  2. I restarted unRaid; the logs were all gone and /var/log was back to normal at that point.
    1. But the Intel drive was reported missing, not even showing in Unassigned Devices
    2. The Samsung one, as I recall, had the error "Unmountable: No file system" (shown as "Not installed")
  3. I restarted again; now the Intel drive showed up as an unassigned device
    1. the cache pool showed a message somewhat like: "one or more devices are missing"
  4. I followed a post's instructions to do the following:
    1. stop the array
    2. unassign all cache drives
    3. start the array in maintenance mode (to make unRaid forget the cache config settings)
    4. stop the array
    5. re-assign the Intel and Samsung drives in the exact order of the original cache pool
    6. start the array
  5. Both cache drives showed up in the pool normally, and I thought everything was fine, but that was not the case
    1. I was not able to access the Docker tab, and it showed the following:
    2. Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (No such file or directory) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 682
      Couldn't create socket: [2] No such file or directory
      Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 866
    3. a random drive (which I don't own) showed up in Unassigned Devices as Dev 4
      1. SandForce_200026BB__1 (sdm) (screenshot attached)
  6. I checked the docker log using tail -100 /var/log/docker.log (around 100 lines, shown below)
    1. time="2022-10-08T03:15:16.940514627+08:00" level=error msg="failed to process event" container=25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f error="open /var/lib/docker/containers/25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f/.tmp-config.v2.json332376362: read-only file system" event=exit event-info="{25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f 25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f 20805 137 2022-10-07 19:15:16.923667307 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:16.944050238+08:00" level=error msg="failed to delete bundle" error="rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60: read-only file system" id=551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60
      time="2022-10-08T03:15:16.953931831+08:00" level=error msg="failed to delete bundle" error="rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a: read-only file system" id=a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a
      time="2022-10-08T03:15:16.953935188+08:00" level=error msg="failed to delete bundle" error="rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27: read-only file system" id=a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27
      time="2022-10-08T03:15:16.965584229+08:00" level=error msg="1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:16.965657691+08:00" level=error msg="failed to process event" container=1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 error="open /var/lib/docker/containers/1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3/.tmp-config.v2.json155562625: read-only file system" event=exit event-info="{1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 31310 137 2022-10-07 19:15:16.912567685 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:16.967509988+08:00" level=error msg="failed to delete bundle" error="rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4: read-only file system" id=2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4
      time="2022-10-08T03:15:17.011648984+08:00" level=error msg="12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:17.011715497+08:00" level=error msg="failed to process event" container=12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 error="open /var/lib/docker/containers/12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54/.tmp-config.v2.json277975020: read-only file system" event=exit event-info="{12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 25318 137 2022-10-07 19:15:16.903805833 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:17.021399629+08:00" level=error msg="551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:17.021464447+08:00" level=error msg="failed to process event" container=551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 error="open /var/lib/docker/containers/551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60/.tmp-config.v2.json765806171: read-only file system" event=exit event-info="{551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 25119 137 2022-10-07 19:15:16.933217459 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:17.049419838+08:00" level=error msg="a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:17.049488151+08:00" level=error msg="failed to process event" container=a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 error="open /var/lib/docker/containers/a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27/.tmp-config.v2.json021408766: read-only file system" event=exit event-info="{a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 30098 137 2022-10-07 19:15:16.940755467 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:17.073814246+08:00" level=error msg="a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:17.073891074+08:00" level=error msg="failed to process event" container=a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a error="open /var/lib/docker/containers/a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a/.tmp-config.v2.json305009989: read-only file system" event=exit event-info="{a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a 27381 137 2022-10-07 19:15:16.9424702 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:17.155270727+08:00" level=error msg="2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:17.155823584+08:00" level=error msg="failed to process event" container=2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 error="open /var/lib/docker/containers/2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4/.tmp-config.v2.json940720096: read-only file system" event=exit event-info="{2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 29046 137 2022-10-07 19:15:16.95590133 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:17.198219596+08:00" level=error msg="Handler for DELETE /v1.41/networks/br0 returned error: error while removing network: error marking network br0 (dd15b5594b2eea1ab0adc8ca4bc416bbb40f2d574e51a41b016d3686d3d0d120) for deletion: failed to update store for object type *libnetwork.network: open /var/lib/docker/network/files/local-kv.db: read-only file system"
      time="2022-10-08T03:15:17.226223431+08:00" level=error msg="Handler for DELETE /v1.41/networks/br0.30 returned error: error while removing network: error marking network br0.30 (2cebd4600022db1d6224d4e1ba5a4a0830a3bb5162bb0d6509e3a41f691576cb) for deletion: failed to update store for object type *libnetwork.network: open /var/lib/docker/network/files/local-kv.db: read-only file system"
  7. I am now not able to use any Docker containers, and at the moment I am afraid of doing any more configuration until further advice is given. Please help, thanks a lot!
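
For anyone hitting the same full-log symptom, commands along these lines should confirm which container owns the oversized log (a sketch only: the container ID below is illustrative, taken from the docker.log excerpt above, and the docker command requires the docker service to still be running):

du -ah /var/log | sort -rh | head -n 10
docker ps -a --no-trunc --format '{{.ID}}  {{.Names}}' | grep 25a67c14c4c2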

 

P.S. I am attaching the diagnostics.

uncrate-diagnostics-20221010-1819.zip


Pool was working correctly but one of the devices dropped offline:

 

Oct  8 02:41:06 UnCRATE kernel: ata1.00: revalidation failed (errno=-19)
Oct  8 02:41:06 UnCRATE kernel: ata1.00: disabled
Oct  8 02:41:06 UnCRATE kernel: sd 1:0:0:0: rejecting I/O to offline device
Oct  8 02:41:06 UnCRATE kernel: blk_update_request: I/O error, dev sdi, sector 71914232 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Oct  8 02:41:06 UnCRATE kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Oct  8 02:41:06 UnCRATE kernel: blk_update_request: I/O error, dev sdi, sector 71914248 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Oct  8 02:41:06 UnCRATE kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Oct  8 02:41:12 UnCRATE kernel: ata1: hard resetting link

 

Check/replace the cables for sdi, then post new diags.
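
If the pool mounts again, the per-device btrfs error counters are also worth checking, e.g. (assuming the pool is mounted at /mnt/cache):

btrfs dev stats /mnt/cache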


@JorgeB sorry, forgot to post it. Apart from the above, I am having a full log disk again.

 

ls -al in /var/log shows that docker.log.1 is 128MB, filled with lines like:

time="2022-10-11T09:28:54.292768994+08:00" level=error msg="Failed to log msg \"\" for logger json-file: error writing log entry: write /var/lib/docker/containers/56144929

 

I checked and it is influxDB again, possibly due to Telegraf writing lots of data to it while the cache pool was almost full. But I actually still have 31GB left on the cache pool. Would you know why Telegraf is not able to write to influxDB? Thanks
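
In case it helps, a quick way to check whether btrfs has flipped the pool to read-only (which would explain writes failing despite free space) is to look at the mount options, assuming the default cache mount point:

grep /mnt/cache /proc/mounts

A "ro" instead of "rw" in the options column means the filesystem went read-only.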

uncrate-diagnostics-20221011-2137.zip


@JorgeB since the var log filled up again, I had to restart. After restarting without auto-starting the array, the Intel cache drive is missing again, so I don't think it is the SATA cable. The current status is the same as step 2 in my original post. I am not sure what the next step is.

 

Do you think I have a faulty SSD? If yes, I might want to replace it. Please advise on the steps, thanks!
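
For reference, the drive's SMART report should show whether it is actually failing; something like this, with the device name adjusted to whatever the Intel SSD currently enumerates as:

smartctl -a /dev/sdi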

 

uncrate-diagnostics-20221011-2358.zip


@JorgeB I am back to the step where the 2 cache drives are finally mounted as a pool. However, the Docker service is not available when I click on the Docker tab; here is the error:

(screenshot of the Docker tab error attached)

 

I tried to run the command you provided before:

btrfs balance start -mconvert=raid1 /mnt/cache

Output:
ERROR: error during balancing '/mnt/cache': Read-only file system
There may be more info in syslog - try dmesg | tail

 

What should be my next step? Million thanks!

uncrate-diagnostics-20221012-2328.zip


The balance is failing and the pool is going read-only because there's no unallocated space on one of the devices. The easiest way out of this would be to back up and re-format the pool. If for some reason that's not practical, I can give step-by-step instructions to try and fix the current pool, but it's an involved process and it won't work if a device keeps dropping, so you decide.
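
A minimal backup sketch, assuming there is an array disk with enough free space (the paths are illustrative):

mkdir -p /mnt/disk1/cache_backup
rsync -avX /mnt/cache/ /mnt/disk1/cache_backup/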

6 minutes ago, PzrrL said:

do you mean that my Intel drive possibly has no more writable sectors because it has already reallocated lots of bad sectors?

No, it's a btrfs allocation issue. Try this:

 

- if array autostart is enabled, disable it

- reboot the server, since there was already a btrfs crash

- check that the cache device is still sdi; if it changed, adjust the commands accordingly, then with the array stopped type:
 

mkdir /temp
mount -t btrfs -o skip_balance /dev/sdi1 /temp
btrfs balance start -dusage=50 /temp

If the balance completes successfully, post the output of:

btrfs fi usage -T /temp
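
(For context: the -dusage=50 filter only rebalances data chunks that are at most 50% full, returning their space to unallocated, so it is much faster than a full balance.)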

 


@JorgeB there you go, thanks!

root@unRAID:/# mkdir /temp
root@unRAID:/# mount -t btrfs -o skip_balance /dev/sdi1 /temp
root@unRAID:/# btrfs balance cancel /temp
root@unRAID:/# btrfs balance start -dusage=50 /temp
Done, had to relocate 23 out of 237 chunks
root@unRAID:/# btrfs fi usage -T /temp
Overall:
    Device size:                 680.02GiB
    Device allocated:            424.06GiB
    Device unallocated:          255.95GiB
    Device missing:                  0.00B
    Used:                        408.39GiB
    Free (estimated):            134.99GiB      (min: 134.99GiB)
    Free (statfs, df):            28.90GiB
    Data ratio:                       2.00
    Metadata ratio:                   1.00
    Global reserve:              109.30MiB      (used: 0.00B)
    Multiple profiles:                  no

             Data      Metadata  System              
Id Path      RAID1     single    single   Unallocated
-- --------- --------- --------- -------- -----------
 1 /dev/sdi1 211.00GiB   2.00GiB 64.00MiB   234.07GiB
 2 /dev/sdj1 211.00GiB         -        -    21.89GiB
-- --------- --------- --------- -------- -----------
   Total     211.00GiB   2.00GiB 64.00MiB   255.95GiB
   Used      203.98GiB 433.94MiB 48.00KiB            
root@unRAID:/# 
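
(Note: the usage table above shows Metadata is still the single profile and lives only on /dev/sdi1, so the pool metadata has no redundancy at this point; converting it to RAID1 is the next step below.)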

 


@JorgeB output as follows, thanks:

root@unRAID:/# btrfs balance start -mconvert=raid1 /temp
Done, had to relocate 4 out of 215 chunks
root@unRAID:/# btrfs fi usage -T /temp
Overall:
    Device size:                 680.02GiB
    Device allocated:            426.19GiB
    Device unallocated:          253.83GiB
    Device missing:                  0.00B
    Used:                        408.82GiB
    Free (estimated):            133.93GiB      (min: 133.93GiB)
    Free (statfs, df):            26.81GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              108.70MiB      (used: 0.00B)
    Multiple profiles:                  no

             Data      Metadata  System              
Id Path      RAID1     RAID1     RAID1    Unallocated
-- --------- --------- --------- -------- -----------
 1 /dev/sdi1 211.00GiB   2.00GiB 96.00MiB   234.04GiB
 2 /dev/sdj1 211.00GiB   2.00GiB 96.00MiB    19.79GiB
-- --------- --------- --------- -------- -----------
   Total     211.00GiB   2.00GiB 96.00MiB   253.83GiB
   Used      203.98GiB 433.06MiB 48.00KiB            
root@unRAID:/# 
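
(Data, Metadata and System now all show RAID1 on both devices and the metadata ratio is 2.00, so the pool is redundant again.)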

 

