
[SOLVED] "read-only file system" after crash of cache pool


PzrrL
Solved by JorgeB


This turned out to be worse than I thought. I tried to follow a few posts for solutions, but one problem led to another, so I need help from here please. Here is the history and my configuration:

 

Version: 6.9.2

Current status of the server: powered on, with the array stopped.

Cache drives: 2 of them, one Intel and one Samsung (for easier reference).

- The Intel one will probably die sooner or later

 

  1. The dashboard reported the log as 100% full
    1. I checked /var/log; it seems docker.log.1 (I don't remember the exact name, but definitely not docker.log) had taken up most of the log space, with a very large file size (more than 100MB)
    2. I cat'ed the log and used the container ID in it to find that influxDB (one of my containers) was the one unable to write its log to /var/log. I suspected that, because the cache drive was almost full (only around 1-3GB left), influxDB could not write to it (see the command sketch after this list)
  2. I restarted unRaid; the logs were all gone and /var/log was back to normal at that point.
    1. But the Intel drive was reported missing, not even showing in Unassigned Devices
    2. The Samsung one, as I recall, had the error "Unmountable: No file system" (shown as "Not installed")
  3. I restarted again; now the Intel drive showed up as an unassigned device
    1. the cache pool showed a message somewhat like: "one or more devices are missing"
  4. I followed a post's instructions to do the following:
    1. stop the array
    2. unassign all cache drives
    3. start the array in maintenance mode (to make unRaid forget the cache config settings)
    4. stop the array
    5. re-assign the Intel and Samsung drives in the exact order of the original cache pool
    6. start the array
  5. Both cache drives showed up in the pool normally, and I thought everything was fine, but that was not the case
    1. I was not able to access the Docker tab, and it showed the following:
    2. Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (No such file or directory) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 682
      Couldn't create socket: [2] No such file or directory
      Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 866
    3. a random drive (which I don't own) showed up in Unassigned Devices as Dev 4
      1. SandForce_200026BB__1 (sdm) (screenshot attached)
  6. I checked the docker log using tail -100 /var/log/docker.log (around 100 lines, shown below)
    1. time="2022-10-08T03:15:16.940514627+08:00" level=error msg="failed to process event" container=25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f error="open /var/lib/docker/containers/25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f/.tmp-config.v2.json332376362: read-only file system" event=exit event-info="{25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f 25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f 20805 137 2022-10-07 19:15:16.923667307 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:16.944050238+08:00" level=error msg="failed to delete bundle" error="rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60: read-only file system" id=551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60
      time="2022-10-08T03:15:16.953931831+08:00" level=error msg="failed to delete bundle" error="rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a: read-only file system" id=a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a
      time="2022-10-08T03:15:16.953935188+08:00" level=error msg="failed to delete bundle" error="rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27: read-only file system" id=a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27
      time="2022-10-08T03:15:16.965584229+08:00" level=error msg="1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:16.965657691+08:00" level=error msg="failed to process event" container=1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 error="open /var/lib/docker/containers/1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3/.tmp-config.v2.json155562625: read-only file system" event=exit event-info="{1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 31310 137 2022-10-07 19:15:16.912567685 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:16.967509988+08:00" level=error msg="failed to delete bundle" error="rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4: read-only file system" id=2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4
      time="2022-10-08T03:15:17.011648984+08:00" level=error msg="12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:17.011715497+08:00" level=error msg="failed to process event" container=12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 error="open /var/lib/docker/containers/12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54/.tmp-config.v2.json277975020: read-only file system" event=exit event-info="{12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 25318 137 2022-10-07 19:15:16.903805833 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:17.021399629+08:00" level=error msg="551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:17.021464447+08:00" level=error msg="failed to process event" container=551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 error="open /var/lib/docker/containers/551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60/.tmp-config.v2.json765806171: read-only file system" event=exit event-info="{551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 25119 137 2022-10-07 19:15:16.933217459 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:17.049419838+08:00" level=error msg="a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:17.049488151+08:00" level=error msg="failed to process event" container=a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 error="open /var/lib/docker/containers/a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27/.tmp-config.v2.json021408766: read-only file system" event=exit event-info="{a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 30098 137 2022-10-07 19:15:16.940755467 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:17.073814246+08:00" level=error msg="a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:17.073891074+08:00" level=error msg="failed to process event" container=a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a error="open /var/lib/docker/containers/a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a/.tmp-config.v2.json305009989: read-only file system" event=exit event-info="{a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a 27381 137 2022-10-07 19:15:16.9424702 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:17.155270727+08:00" level=error msg="2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown"
      time="2022-10-08T03:15:17.155823584+08:00" level=error msg="failed to process event" container=2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 error="open /var/lib/docker/containers/2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4/.tmp-config.v2.json940720096: read-only file system" event=exit event-info="{2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 29046 137 2022-10-07 19:15:16.95590133 +0000 UTC false <nil>}" module=libcontainerd namespace=moby
      time="2022-10-08T03:15:17.198219596+08:00" level=error msg="Handler for DELETE /v1.41/networks/br0 returned error: error while removing network: error marking network br0 (dd15b5594b2eea1ab0adc8ca4bc416bbb40f2d574e51a41b016d3686d3d0d120) for deletion: failed to update store for object type *libnetwork.network: open /var/lib/docker/network/files/local-kv.db: read-only file system"
      time="2022-10-08T03:15:17.226223431+08:00" level=error msg="Handler for DELETE /v1.41/networks/br0.30 returned error: error while removing network: error marking network br0.30 (2cebd4600022db1d6224d4e1ba5a4a0830a3bb5162bb0d6509e3a41f691576cb) for deletion: failed to update store for object type *libnetwork.network: open /var/lib/docker/network/files/local-kv.db: read-only file system"
  7. I am now not able to use any Docker containers, and at the moment I am afraid of doing any more configuration until further advice is given. Please help, thanks a lot!
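
For anyone hitting the same full-log symptom, commands along these lines should confirm which container owns the oversized log (a sketch only: the container ID below is illustrative, taken from the docker.log excerpt above, and the docker command requires the docker service to still be running):

du -ah /var/log | sort -rh | head -n 10
docker ps -a --no-trunc --format '{{.ID}}  {{.Names}}' | grep 25a67c14c4c2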

 

P.S. I am attaching the diagnostics.

uncrate-diagnostics-20221010-1819.zip


Pool was working correctly but one of the devices dropped offline:

 

Oct  8 02:41:06 UnCRATE kernel: ata1.00: revalidation failed (errno=-19)
Oct  8 02:41:06 UnCRATE kernel: ata1.00: disabled
Oct  8 02:41:06 UnCRATE kernel: sd 1:0:0:0: rejecting I/O to offline device
Oct  8 02:41:06 UnCRATE kernel: blk_update_request: I/O error, dev sdi, sector 71914232 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Oct  8 02:41:06 UnCRATE kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Oct  8 02:41:06 UnCRATE kernel: blk_update_request: I/O error, dev sdi, sector 71914248 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Oct  8 02:41:06 UnCRATE kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Oct  8 02:41:12 UnCRATE kernel: ata1: hard resetting link

 

Check/replace the cables for sdi, then post new diags.
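
If the pool mounts again, the per-device btrfs error counters are also worth checking, e.g. (assuming the pool is mounted at /mnt/cache):

btrfs dev stats /mnt/cache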


@JorgeB sorry, forgot to post it. Apart from the above, I am having a full log disk again.

 

ls -al in /var/log shows that docker.log.1 is 128MB, filled with lines like:

time="2022-10-11T09:28:54.292768994+08:00" level=error msg="Failed to log msg \"\" for logger json-file: error writing log entry: write /var/lib/docker/containers/56144929

 

I checked and it is influxDB again, possibly due to Telegraf writing lots of data to it while the cache pool was almost full. But I actually still have 31GB left on the cache pool. Would you know why Telegraf is not able to write to influxDB? Thanks
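
In case it helps, a quick way to check whether btrfs has flipped the pool to read-only (which would explain writes failing despite free space) is to look at the mount options, assuming the default cache mount point:

grep /mnt/cache /proc/mounts

A "ro" instead of "rw" in the options column means the filesystem went read-only.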

uncrate-diagnostics-20221011-2137.zip


@JorgeB since the var log filled up again, I had to restart. After restarting without auto-starting the array, the Intel cache drive is missing again, so I don't think it is the SATA cable. The current status is the same as step 2 in my original post. I am not sure what the next step is.

 

Do you think I have a faulty SSD? If yes, I might want to replace it. Please advise on the steps, thanks!
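
For reference, the drive's SMART report should show whether it is actually failing; something like this, with the device name adjusted to whatever the Intel SSD currently enumerates as:

smartctl -a /dev/sdi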

 

uncrate-diagnostics-20221011-2358.zip


@JorgeB I am back to the step where the 2 cache drives are finally mounted as a pool. However, the Docker service is not available when I click on the Docker tab; here is the error:

(screenshot of the Docker tab error attached)

 

I tried to run the command you provided before:

btrfs balance start -mconvert=raid1 /mnt/cache

Output:
ERROR: error during balancing '/mnt/cache': Read-only file system
There may be more info in syslog - try dmesg | tail

 

What should be my next step? Million thanks!

uncrate-diagnostics-20221012-2328.zip


The balance is failing and the pool is going read-only because there's no unallocated space on one of the devices. The easiest way out of this would be to back up and re-format the pool. If for some reason that's not practical, I can give step-by-step instructions to try and fix the current pool, but it's an involved process and it won't work if a device keeps dropping, so you decide.
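
A minimal backup sketch, assuming there is an array disk with enough free space (the paths are illustrative):

mkdir -p /mnt/disk1/cache_backup
rsync -avX /mnt/cache/ /mnt/disk1/cache_backup/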

6 minutes ago, PzrrL said:

do you mean that my Intel drive possibly has no more writable sectors because it has already reallocated lots of bad sectors?

No, it's a btrfs allocation issue. Try this:

 

- if array autostart is enabled, disable it

- reboot the server, since there was already a btrfs crash

- check that the cache device is still sdi; if it changed, adjust the commands accordingly, then with the array stopped type:
 

mkdir /temp
mount -t btrfs -o skip_balance /dev/sdi1 /temp
btrfs balance start -dusage=50 /temp

If the balance completes successfully, post the output of:

btrfs fi usage -T /temp
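
(For context: the -dusage=50 filter only rebalances data chunks that are at most 50% full, returning their space to unallocated, so it is much faster than a full balance.)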

 


@JorgeB there you go, thanks!

root@unRAID:/# mkdir /temp
root@unRAID:/# mount -t btrfs -o skip_balance /dev/sdi1 /temp
root@unRAID:/# btrfs balance cancel /temp
root@unRAID:/# btrfs balance start -dusage=50 /temp
Done, had to relocate 23 out of 237 chunks
root@unRAID:/# btrfs fi usage -T /temp
Overall:
    Device size:                 680.02GiB
    Device allocated:            424.06GiB
    Device unallocated:          255.95GiB
    Device missing:                  0.00B
    Used:                        408.39GiB
    Free (estimated):            134.99GiB      (min: 134.99GiB)
    Free (statfs, df):            28.90GiB
    Data ratio:                       2.00
    Metadata ratio:                   1.00
    Global reserve:              109.30MiB      (used: 0.00B)
    Multiple profiles:                  no

             Data      Metadata  System              
Id Path      RAID1     single    single   Unallocated
-- --------- --------- --------- -------- -----------
 1 /dev/sdi1 211.00GiB   2.00GiB 64.00MiB   234.07GiB
 2 /dev/sdj1 211.00GiB         -        -    21.89GiB
-- --------- --------- --------- -------- -----------
   Total     211.00GiB   2.00GiB 64.00MiB   255.95GiB
   Used      203.98GiB 433.94MiB 48.00KiB            
root@unRAID:/# 
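
(Note: the usage table above shows Metadata is still the single profile and lives only on /dev/sdi1, so the pool metadata has no redundancy at this point; converting it to RAID1 is the next step below.)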

 


@JorgeB output as follows, thanks:

root@unRAID:/# btrfs balance start -mconvert=raid1 /temp
Done, had to relocate 4 out of 215 chunks
root@unRAID:/# btrfs fi usage -T /temp
Overall:
    Device size:                 680.02GiB
    Device allocated:            426.19GiB
    Device unallocated:          253.83GiB
    Device missing:                  0.00B
    Used:                        408.82GiB
    Free (estimated):            133.93GiB      (min: 133.93GiB)
    Free (statfs, df):            26.81GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              108.70MiB      (used: 0.00B)
    Multiple profiles:                  no

             Data      Metadata  System              
Id Path      RAID1     RAID1     RAID1    Unallocated
-- --------- --------- --------- -------- -----------
 1 /dev/sdi1 211.00GiB   2.00GiB 96.00MiB   234.04GiB
 2 /dev/sdj1 211.00GiB   2.00GiB 96.00MiB    19.79GiB
-- --------- --------- --------- -------- -----------
   Total     211.00GiB   2.00GiB 96.00MiB   253.83GiB
   Used      203.98GiB 433.06MiB 48.00KiB            
root@unRAID:/# 
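
(Data, Metadata and System now all show RAID1 on both devices and the metadata ratio is 2.00, so the pool is redundant again.)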

 

