spalmisano Posted July 11, 2022

This morning the cache drive started throwing 'read-only file system' errors. It wasn't full, and research pointed to a corrupted docker.img as a possible culprit. I backed up that file and stopped the Docker service, hoping to recreate the image. Because the cache drive was read-only, the file couldn't be deleted, and although the UI showed the Docker service restarting, clicking the Docker tab gives the 'Docker service failed to re-start.' error.

I've attached the latest diagnostics. I can still SSH in and read/copy the cache drive, and I do have a CA backup of appdata. Is this just a matter of reformatting (switching to xfs?), restoring the appdata backup, and starting again? Anything else? Is it better to replace the drive instead? Thanks for the help.

media-diagnostics-20220711-1303.zip
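For readers hitting the same 'read-only file system' symptom, here is a minimal Python sketch of one way to confirm which mounts the kernel currently reports as read-only. It is not from the thread: the function names are hypothetical, and it assumes a Linux host where the mount table is readable at /proc/mounts and where the cache is mounted somewhere under /mnt/ (for example /mnt/cache). A filesystem that was forced read-only after an error will usually, though not necessarily always, show the `ro` option there.

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: List[str]


def parse_mounts(text: str) -> List[Mount]:
    """Parse text in /proc/mounts format: device mountpoint fstype options dump pass."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, opts = fields[:4]
        mounts.append(Mount(device, mountpoint, fstype, opts.split(",")))
    return mounts


def read_only_mounts(text: str, prefix: str = "/mnt/") -> List[Mount]:
    """Return mounts under `prefix` whose option list contains 'ro'.

    The '/mnt/' default is an assumption about where the cache is mounted.
    """
    return [
        m for m in parse_mounts(text)
        if m.mountpoint.startswith(prefix) and "ro" in m.options
    ]
```

On the server itself, something like `read_only_mounts(open("/proc/mounts").read())` would list the affected mounts; an empty list means none of the mounts under /mnt/ are currently flagged read-only.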
JorgeB Posted July 11, 2022

15 minutes ago, spalmisano said: "Is this just a matter of reformatting (switching to xfs?), restoring the appdata backup, and starting again?"

Yes, doesn't look like a device problem, just filesystem corruption.
spalmisano Posted July 13, 2022 (Author)

I was able to reformat the cache drive. The tar from the appdata backup turned out to be corrupted as well (it was on the array, and I'm guessing it was just bad luck that it wasn't viable either), so I re-added the containers and reconfigured everything on the cache drive by hand.

The system then halted sometime overnight. The part of the trace I saw read ‘sdb1 corruption’ (the cache is sdb) and said to run an xfs repair on it. While typing this I'm also seeing the ‘your flash drive is corrupted or offline’ message, and /boot is empty.

So now I need to replace the USB drive, and also the cache SSD? Is it worth setting up a cache pool if it will make SSD failure or corruption easier to weather?

Diagnostics attached. Is there anything in here to suggest that corruption on both devices so close together is more than coincidence?

media-diagnostics-20220713-0936.zip
JorgeB Posted July 13, 2022

12 minutes ago, spalmisano said: "The system had halted sometime overnight and the part of the trace I saw read ‘sdb1 corruption’ (the cache is sdb) and to run an xfs repair on it."

Filesystem corruption again, and so soon after a format, might indicate a hardware issue. Try with another device if you have one. Pools won't help with filesystem corruption.
spalmisano Posted July 13, 2022 (Author)

17 minutes ago, JorgeB said: "Filesystem corruption again and so soon after a format might indicate a hardware issue. Try with another device if you have one, pools won't help with filesystem corruption."

Does a cache pool provide any redundancy? If one device in the pool fails, will it still operate as a cache of one, or is a pool simply additional storage?
JorgeB Posted July 13, 2022

A redundant pool, similar to parity in this case, provides redundancy for a failed device, not for filesystem corruption.
spalmisano Posted July 13, 2022 (Author)

I should have been clearer; that's what I was referring to. Thanks for the clarification. I'll replace the drive with two new ones, and it looks like I need to replace the USB drive as well. Appreciate the help.