PzrrL Posted October 10, 2022 (edited)

This turned out to be worse than I thought. I tried to follow a few existing posts for solutions, but one problem came after another, so I need help here please. Here is the history and my configuration:

Version: 6.9.2
Current status of the Unraid server: powered on, with the array stopped.
Cache drives: 2 of them, one Intel and one Samsung (for easier reference). The Intel one will probably die sooner or later.

- The dashboard reported the log as 100% full.
- I checked /var/log; docker.log.1 (I don't remember the exact name, but definitely not docker.log) had taken up most of the log space, with a very large file size (more than 100MB).
- I cat'ed the log and used the container ID in it to find out the entries came from influxDB (one of my containers) failing to write, which was filling /var/log. I suspected that because the cache drive was nearly full (only around 1-3GB left), influxDB could not write to it.
- I restarted Unraid; the logs were cleared and /var/log went back to normal at that point.
- But the Intel drive was then reported missing, not even showing under unassigned devices.
- The Samsung one, as I recall, showed the error "unraid cache Unmountable: No file system 'Not installed'".
- I restarted again; now the Intel one showed up as an unassigned device, and the cache pool showed a message along the lines of "one or more missing drives".
- I followed a post's instructions to do the following: stop the array; unassign all cache drives; start the array in maintenance mode (to make Unraid forget the cache pool configuration); stop the array; assign the Intel and Samsung drives in the exact original order of the cache pool; start the array.
- Both cache drives now show in the pool normally, and I thought everything was fine, but that is not the case.
- I am not able to access the Docker tab; it shows the following:

Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (No such file or directory) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 682
Couldn't create socket: [2] No such file or directory
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 866

- A drive I don't actually have shows up under unassigned devices as Dev 4 SandForce_200026BB__1 (sdm).
- I checked the Docker log using tail -100 /var/log/docker.log (around 100 lines, shown below):

time="2022-10-08T03:15:16.940514627+08:00" level=error msg="failed to process event" container=25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f error="open /var/lib/docker/containers/25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f/.tmp-config.v2.json332376362: read-only file system" event=exit event-info="{25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f 25a67c14c4c2caf4740ec4cb6370e1cdee2d35802288965347be260c1895083f 20805 137 2022-10-07 19:15:16.923667307 +0000 UTC false <nil>}" module=libcontainerd namespace=moby time="2022-10-08T03:15:16.944050238+08:00" level=error msg="failed to delete bundle" error="rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60: read-only file system" id=551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 time="2022-10-08T03:15:16.953931831+08:00" level=error msg="failed to delete bundle" error="rename 
/var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a: read-only file system" id=a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a time="2022-10-08T03:15:16.953935188+08:00" level=error msg="failed to delete bundle" error="rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27: read-only file system" id=a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 time="2022-10-08T03:15:16.965584229+08:00" level=error msg="1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown" time="2022-10-08T03:15:16.965657691+08:00" level=error msg="failed to process event" container=1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 error="open /var/lib/docker/containers/1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3/.tmp-config.v2.json155562625: read-only file system" event=exit event-info="{1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 1b7506adbd32fed6be08d8c9f3516d551d54c57d58bef311a8d73fe561b38cf3 31310 137 2022-10-07 19:15:16.912567685 +0000 UTC false <nil>}" module=libcontainerd namespace=moby time="2022-10-08T03:15:16.967509988+08:00" level=error msg="failed to delete bundle" error="rename /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 /var/lib/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/.2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4: read-only file system" id=2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 time="2022-10-08T03:15:17.011648984+08:00" level=error msg="12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown" time="2022-10-08T03:15:17.011715497+08:00" level=error msg="failed to process event" container=12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 error="open /var/lib/docker/containers/12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54/.tmp-config.v2.json277975020: read-only file system" event=exit event-info="{12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 12d44ece00eae334580d12ee84b44a94ac64b54b24320a9d5c05fafee84e5b54 25318 137 2022-10-07 19:15:16.903805833 +0000 UTC false <nil>}" module=libcontainerd namespace=moby time="2022-10-08T03:15:17.021399629+08:00" level=error msg="551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown" time="2022-10-08T03:15:17.021464447+08:00" level=error msg="failed to process event" container=551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 error="open 
/var/lib/docker/containers/551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60/.tmp-config.v2.json765806171: read-only file system" event=exit event-info="{551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 551333d7f5f2fdf9033aeeb6eb7975d4366378105f4d4e209ca4d59057fc7b60 25119 137 2022-10-07 19:15:16.933217459 +0000 UTC false <nil>}" module=libcontainerd namespace=moby time="2022-10-08T03:15:17.049419838+08:00" level=error msg="a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown" time="2022-10-08T03:15:17.049488151+08:00" level=error msg="failed to process event" container=a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 error="open /var/lib/docker/containers/a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27/.tmp-config.v2.json021408766: read-only file system" event=exit event-info="{a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 a1b55414c6cc17ed410e799edfb7295d29560f37ca1529e51252bd5a62fcde27 30098 137 2022-10-07 19:15:16.940755467 +0000 UTC false <nil>}" module=libcontainerd namespace=moby time="2022-10-08T03:15:17.073814246+08:00" level=error msg="a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown" time="2022-10-08T03:15:17.073891074+08:00" level=error msg="failed to process event" container=a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a error="open /var/lib/docker/containers/a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a/.tmp-config.v2.json305009989: read-only file system" event=exit event-info="{a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a a5b46d07d977ed6c6091ddd79f94d5f3d109cb0294285fb14d10ee786521c83a 27381 137 2022-10-07 19:15:16.9424702 +0000 UTC false <nil>}" module=libcontainerd namespace=moby time="2022-10-08T03:15:17.155270727+08:00" level=error msg="2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 cleanup: failed to delete container from containerd: write /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db: read-only file system: unknown" time="2022-10-08T03:15:17.155823584+08:00" level=error msg="failed to process event" container=2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 error="open /var/lib/docker/containers/2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4/.tmp-config.v2.json940720096: read-only file system" event=exit event-info="{2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 2cbfe90016db41cec98f5b3f21f529b25373926bffa71c4557027d3cef6db1c4 29046 137 2022-10-07 19:15:16.95590133 +0000 UTC false <nil>}" module=libcontainerd namespace=moby time="2022-10-08T03:15:17.198219596+08:00" level=error msg="Handler for DELETE /v1.41/networks/br0 returned error: error while removing network: error marking network br0 (dd15b5594b2eea1ab0adc8ca4bc416bbb40f2d574e51a41b016d3686d3d0d120) for deletion: failed to update store for object type *libnetwork.network: open /var/lib/docker/network/files/local-kv.db: read-only file system" time="2022-10-08T03:15:17.226223431+08:00" level=error msg="Handler for DELETE /v1.41/networks/br0.30 returned error: error while removing network: error marking network br0.30 
(2cebd4600022db1d6224d4e1ba5a4a0830a3bb5162bb0d6509e3a41f691576cb) for deletion: failed to update store for object type *libnetwork.network: open /var/lib/docker/network/files/local-kv.db: read-only file system"

I am now unable to use any of my Docker containers, and for the moment I am afraid to change any more configuration until I get further advice. Please help, thanks a lot!

P.S. I am attaching the diagnostics. uncrate-diagnostics-20221010-1819.zip

Edited October 14, 2022 by PzrrL: add more info
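(For reference, on a stock Unraid install /var/log is a small RAM-backed filesystem, so a runaway log fills it quickly; a quick way to see how full it is and which file is responsible, using only standard commands, is:

df -h /var/log                          # how full the log filesystem is
du -ah /var/log | sort -h | tail -n 10  # the ten largest entries in it

Nothing here is specific to this system; adjust paths as needed.)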
JorgeB Posted October 10, 2022

The pool was working correctly, but one of the devices dropped offline:

Oct 8 02:41:06 UnCRATE kernel: ata1.00: revalidation failed (errno=-19)
Oct 8 02:41:06 UnCRATE kernel: ata1.00: disabled
Oct 8 02:41:06 UnCRATE kernel: sd 1:0:0:0: rejecting I/O to offline device
Oct 8 02:41:06 UnCRATE kernel: blk_update_request: I/O error, dev sdi, sector 71914232 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Oct 8 02:41:06 UnCRATE kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Oct 8 02:41:06 UnCRATE kernel: blk_update_request: I/O error, dev sdi, sector 71914248 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Oct 8 02:41:06 UnCRATE kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Oct 8 02:41:12 UnCRATE kernel: ata1: hard resetting link

Check/replace the cables for sdi, then post new diags.
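(Side note: btrfs keeps per-device error counters that can be read directly once the pool is mounted; assuming the default /mnt/cache mount point:

btrfs device stats /mnt/cache    # write/read/flush/corruption/generation error counts per device

The counters are cumulative until cleared with the -z flag, so they are handy for confirming whether errors keep growing after a cable swap.)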
PzrrL Posted October 10, 2022

Hi @JorgeB, cable replaced. Please take a look, thanks! uncrate-diagnostics-20221010-2113.zip
JorgeB Posted October 10, 2022

Everything looks good so far, but the pool is not redundant, likely due to a bug in the older Unraid release in use when the pool was created. Type this in the console:

btrfs balance start -mconvert=raid1 /mnt/cache

Then run a scrub.
PzrrL Posted October 10, 2022

@JorgeB thanks. This is the output I got:

Segmentation fault

May I know what a scrub is?
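(For context: a btrfs scrub reads every allocated block in the pool and verifies it against its checksum, repairing bad copies from the good one where the data is redundant. Unraid exposes it in the GUI on the pool's page, but it can also be run from the console; assuming the pool is mounted at /mnt/cache:

btrfs scrub start /mnt/cache     # starts the scrub in the background
btrfs scrub status /mnt/cache    # progress and any errors found so far

Both are standard btrfs-progs commands.)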
JorgeB Posted October 10, 2022

Please post new diags.
PzrrL Posted October 11, 2022 (edited)

@JorgeB sorry, I forgot to post it. On top of the above, the log disk is full again. ls -al in /var/log shows docker.log.1 at 128MB, filled with lines like:

time="2022-10-11T09:28:54.292768994+08:00" level=error msg="Failed to log msg \"\" for logger json-file: error writing log entry: write /var/lib/docker/containers/56144929

I checked and that is influxDB again, possibly because Telegraf writes a lot of data to it while the cache pool is almost full. But I actually still have 31GB left on the cache pool. Would you know why Telegraf is not able to write to influxDB? Thanks. uncrate-diagnostics-20221011-2137.zip

Edited October 11, 2022 by PzrrL: more info on influxDB
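(One note for anyone hitting the same symptom: when btrfs runs into a serious error it typically forces the filesystem read-only, and it stays that way until a remount or reboot regardless of how much free space is reported, so every container write fails even with space left. Whether the pool is currently mounted read-only can be checked from the console; assuming the usual /mnt/cache mount point:

grep /mnt/cache /proc/mounts    # look for "ro" or "rw" among the mount options

This is generic Linux, not anything Unraid-specific.)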
JorgeB Posted October 11, 2022

Try updating to v6.11.1 and running the balance again; if it still crashes you'll need to back up and re-format the pool.
PzrrL Posted October 11, 2022 (edited)

@JorgeB since /var/log was full again, I had to restart. After restarting without auto-starting the array, the Intel cache drive is missing again, so I don't think it is the SATA cable. The current status is the same as step 2 in my original post, and I am not sure what the next step is. Do you think I have a faulty SSD? If so, I would rather swap it out. Please advise on the steps, thanks! uncrate-diagnostics-20221011-2358.zip

Edited October 11, 2022 by PzrrL: add question
JorgeB Posted October 11, 2022

33 minutes ago, PzrrL said: Do you think I have a faulty SSD?

It's possible, but try replacing/swapping the cables first; since the pool is not redundant you need both devices.
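(If the drive keeps dropping even on a known-good cable and port, its SMART report is worth a look while the drive is still detected; assuming it currently appears as /dev/sdX, where the device letter may differ on each boot:

smartctl -a /dev/sdX    # full SMART report: reallocated sectors, wear indicators, device error log

Unraid shows the same attributes in the GUI when the device is assigned, so the command mainly helps while it sits unassigned.)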
PzrrL Posted October 11, 2022

@JorgeB I already swapped the cables, but I still have the same situation. I am currently stuck at step 2 again. Do I repeat steps 3 and 4 now? Thanks
JorgeB Posted October 11, 2022

Without the missing device it's not possible to mount the pool normally.
PzrrL Posted October 12, 2022 (edited)

@JorgeB I am back to the step where the 2 cache drives are finally mounted as a pool. However, the Docker service is not available when I click on the Docker tab; here is the error it shows:

I tried to run the command you provided before:

btrfs balance start -mconvert=raid1 /mnt/cache

Output:

ERROR: error during balancing '/mnt/cache': Read-only file system
There may be more info in syslog - try dmesg | tail

What should my next step be? A million thanks! uncrate-diagnostics-20221012-2328.zip

Edited October 12, 2022 by PzrrL: add details
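(The "Read-only file system" error means btrfs has already hit a problem and forced the pool read-only, so the balance cannot run until the cause is addressed. The kernel messages leading up to it can usually be isolated with:

dmesg | grep -i btrfs | tail -n 30    # recent btrfs kernel errors

The same messages should also be in the syslog inside the diagnostics zip.)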
JorgeB Posted October 12, 2022

The balance is failing and the pool is going read-only because there's no unallocated space on one of the devices. The easiest way out of this would be to back up and re-format the pool; if for some reason that's not practical I can give step-by-step instructions to try to fix the current pool, but it's an involved process and it won't work if a device keeps dropping, so you decide.
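(A backup in this situation is normally just a file-level copy to the array while the pool still mounts; one minimal sketch, assuming an array disk with enough free space and with Docker/VM services stopped so nothing changes mid-copy:

mkdir -p /mnt/disk1/cache_backup
rsync -avh /mnt/cache/ /mnt/disk1/cache_backup/    # copy everything, preserving attributes

The disk1 path is only an example; any array disk or share with room for the pool's data would do.)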
PzrrL Posted October 12, 2022

@JorgeB I already bought a new drive, ready to swap out the faulty Intel one. I am just not sure how to do the backup/restore of the pool right now. I need to retain all the data on the cache drive, and I don't mind a long process. Please advise, thanks!
PzrrL Posted October 12, 2022

3 minutes ago, JorgeB said: no unallocated space

Do you mean that my Intel drive possibly has no more writable sectors because it has already reallocated a lot of bad sectors?
JorgeB Posted October 12, 2022

6 minutes ago, PzrrL said: Do you mean that my Intel drive possibly has no more writable sectors because it has already reallocated a lot of bad sectors?

No, it's a btrfs allocation issue. Try this:

- if array autostart is enabled, disable it
- reboot the server, since there was already a btrfs crash
- check that the cache device is still sdi; if it changed, adjust the commands accordingly, then with the array stopped type:

mkdir /temp
mount -t btrfs -o skip_balance /dev/sdi1 /temp
btrfs balance start -dusage=50 /temp

If the balance completes successfully, post the output of:

btrfs fi usage -T /temp
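(For anyone following along: skip_balance mounts the pool without resuming any interrupted balance, and -dusage=50 rewrites only data chunks that are at most 50% full, compacting them and returning space to the unallocated pool, which is what the later raid1 conversion needs. Progress can be watched from a second console with:

btrfs balance status /temp    # shows how many chunks remain to be relocated

Both options are standard btrfs features rather than anything Unraid-specific.)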
PzrrL Posted October 12, 2022

@JorgeB Just to confirm, do I run all these commands while the array is stopped, after a restart/shutdown? (Somehow I need to do a full shutdown instead of a restart in order for my faulty SSD to show up.)
PzrrL Posted October 12, 2022

When I run:

btrfs balance start -dusage=50 /temp

Output:

ERROR: unable to start balance, another exclusive operation 'balance' in progress

I am sure I am not running anything else. After shutting down and powering up, I ran the first two commands directly, but the third one fails.
JorgeB Posted October 12, 2022

3 minutes ago, PzrrL said: ERROR: unable to start balance, another exclusive operation 'balance' in progress

Yes, my bad. First type:

btrfs balance cancel /temp

then:

btrfs balance start -dusage=50 /temp
PzrrL Posted October 12, 2022

@JorgeB there you go, thanks!

root@unRAID:/# mkdir /temp
root@unRAID:/# mount -t btrfs -o skip_balance /dev/sdi1 /temp
root@unRAID:/# btrfs balance cancel /temp
root@unRAID:/# btrfs balance start -dusage=50 /temp
Done, had to relocate 23 out of 237 chunks
root@unRAID:/# btrfs fi usage -T /temp
Overall:
    Device size:         680.02GiB
    Device allocated:    424.06GiB
    Device unallocated:  255.95GiB
    Device missing:          0.00B
    Used:                408.39GiB
    Free (estimated):    134.99GiB  (min: 134.99GiB)
    Free (statfs, df):    28.90GiB
    Data ratio:               2.00
    Metadata ratio:           1.00
    Global reserve:      109.30MiB  (used: 0.00B)
    Multiple profiles:          no

             Data       Metadata   System
Id Path      RAID1      single     single    Unallocated
-- --------- ---------  ---------  --------  -----------
 1 /dev/sdi1 211.00GiB    2.00GiB  64.00MiB    234.07GiB
 2 /dev/sdj1 211.00GiB          -         -     21.89GiB
-- --------- ---------  ---------  --------  -----------
   Total     211.00GiB    2.00GiB  64.00MiB    255.95GiB
   Used      203.98GiB  433.94MiB  48.00KiB
root@unRAID:/#
JorgeB Posted October 12, 2022

Good, now type:

btrfs balance start -mconvert=raid1 /temp

then again:

btrfs fi usage -T /temp
PzrrL Posted October 12, 2022

@JorgeB output as follows, thanks:

root@unRAID:/# btrfs balance start -mconvert=raid1 /temp
Done, had to relocate 4 out of 215 chunks
root@unRAID:/# btrfs fi usage -T /temp
Overall:
    Device size:         680.02GiB
    Device allocated:    426.19GiB
    Device unallocated:  253.83GiB
    Device missing:          0.00B
    Used:                408.82GiB
    Free (estimated):    133.93GiB  (min: 133.93GiB)
    Free (statfs, df):    26.81GiB
    Data ratio:               2.00
    Metadata ratio:           2.00
    Global reserve:      108.70MiB  (used: 0.00B)
    Multiple profiles:          no

             Data       Metadata   System
Id Path      RAID1      RAID1      RAID1     Unallocated
-- --------- ---------  ---------  --------  -----------
 1 /dev/sdi1 211.00GiB    2.00GiB  96.00MiB    234.04GiB
 2 /dev/sdj1 211.00GiB    2.00GiB  96.00MiB     19.79GiB
-- --------- ---------  ---------  --------  -----------
   Total     211.00GiB    2.00GiB  96.00MiB    253.83GiB
   Used      203.98GiB  433.06MiB  48.00KiB
root@unRAID:/#
JorgeB Posted October 12, 2022

Great, the pool is redundant now. Type:

umount /temp

and you can start the array, then run a correcting scrub. To do that, click on the cache pool and scroll down to the scrub section; when it's done, post the result.
PzrrL Posted October 12, 2022 (edited)

15 minutes ago, JorgeB said: run a correcting scrub

@JorgeB do I need to check the "Repair corrupted blocks" box?

Edited October 12, 2022 by PzrrL