suspecting corrupted cache/docker.img

hi guys, 

 

I keep losing the ability to write to my docker image file. After seeing the below in my logs, I suspect the cache or the image file is corrupted, and I was hoping to get some guidance on what I should do next.

 

Jan 25 17:29:04 Tower kernel: ---[ end trace 0000000000000000 ]---
Jan 25 17:29:04 Tower kernel: BTRFS: error (device loop2: state A) in __btrfs_free_extent:3079: errno=-2 No such entry
Jan 25 17:29:04 Tower kernel: BTRFS info (device loop2: state EA): forced readonly
Jan 25 17:29:04 Tower kernel: BTRFS: error (device loop2: state EA) in btrfs_run_delayed_refs:2157: errno=-2 No such entry
Jan 25 17:29:14 Tower flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update

 

Should I recreate the docker.img, or nuke the whole cache pool and reformat it?

 

thanks in advance for reading, 

 

 

 

tower-diagnostics-20230125-1719.zip

  • Community Expert
Jan 21 20:50:26 Tower kernel: BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 3359, gen 0
Jan 21 20:50:26 Tower kernel: BTRFS info (device sdb1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 2188, gen 0

 

Btrfs is detecting data corruption on both pool members; you should run memtest first.
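
To check whether those corruption counters move between boots, a minimal Python sketch along these lines could pull them out of saved syslog lines. The function name `parse_btrfs_dev_errs` and the regular expression are illustrative assumptions, not part of Unraid or btrfs-progs; the sample input is the two log lines quoted above.

```python
import re

# Matches kernel lines such as:
#   BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 3359, gen 0
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_dev_errs(lines):
    """Return {device: {counter_name: value}} for every matching line.

    Hypothetical helper: a later line for the same device overwrites
    an earlier one, so the result holds the most recent counters seen.
    """
    stats = {}
    for line in lines:
        m = ERRS_RE.search(line)
        if m:
            fields = m.groupdict()
            dev = fields.pop("dev")
            stats[dev] = {name: int(value) for name, value in fields.items()}
    return stats


sample = [
    "Jan 21 20:50:26 Tower kernel: BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 3359, gen 0",
    "Jan 21 20:50:26 Tower kernel: BTRFS info (device sdb1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 2188, gen 0",
]

stats = parse_btrfs_dev_errs(sample)
for dev, counters in stats.items():
    print(dev, "corrupt =", counters["corrupt"])
```

Running the same function over the syslog from two different dates and comparing the `corrupt` values would show whether new corruption is still being detected.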

  • Author

Thanks for the reply, 

 

Yes, I am aware, but this corruption has been on the cache pool for some time and the count has not changed.

 

I will run another memtest, but I still need to deal with the read-only docker image.

  • Community Expert
21 minutes ago, daan_SVK said:

but I still need to deal with the read-only docker image.

Delete and recreate.

  • Author
1 hour ago, JorgeB said:

 

just to be clear, are you saying recreating the docker image is a better approach than reformatting the cache pool? 

 

I'd like to address the possible cause of the corrupted image as well. 

  • Community Expert

For now I don't see any issues with the pool filesystem, besides the already mentioned data corruption errors, only the docker image went read-only.

  • Author
On 1/26/2023 at 8:09 AM, JorgeB said:

For now I don't see any issues with the pool filesystem, besides the already mentioned data corruption errors, only the docker image went read-only.

 

I deleted and re-created the docker image and reinstalled all my containers, but immediately saw more btrfs errors in the log:

 

Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088
Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088
Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088
Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088

 

I ran a scrub on the cache pool and no errors were reported. Are the errors coming from inside the docker image? How do I resolve this for good?

 

I'm rebalancing the pool now, as I saw a thread where a fully allocated filesystem caused the same error on the cache.
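
One way to see where the errors are coming from is to group the BTRFS kernel messages by the device named in each line. The Python sketch below is a rough illustration only: `summarize_btrfs_errors` is a made-up helper, and reading `loop2` as the loop device behind docker.img is an assumption based on how this thread treats it; it may differ on another system.

```python
import re
from collections import Counter

# Device name inside "(device loop2: state EA)" or "(device sdb1)".
DEV_RE = re.compile(r"BTRFS:? (?P<level>\w+) \(device (?P<dev>[^:)\s]+)")
# Details of a "parent transid verify failed" message.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def summarize_btrfs_errors(lines):
    """Count error-level BTRFS lines per device and collect distinct
    (bytenr, wanted, found) triples from transid failures.
    Hypothetical helper for illustration."""
    per_device = Counter()
    transid_failures = set()
    for line in lines:
        dev_match = DEV_RE.search(line)
        if not dev_match or dev_match.group("level") != "error":
            continue
        per_device[dev_match.group("dev")] += 1
        t = TRANSID_RE.search(line)
        if t:
            transid_failures.add(
                (int(t.group("bytenr")), int(t.group("wanted")), int(t.group("found")))
            )
    return per_device, transid_failures


line = (
    "Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): "
    "parent transid verify failed on 335167488 wanted 2298130 found 2298088"
)
sample = [line] * 4  # the message is repeated four times in the log above

per_device, transid_failures = summarize_btrfs_errors(sample)
print(dict(per_device))
for bytenr, wanted, found in transid_failures:
    print("block", bytenr, "is", wanted - found, "generations behind")
```

If every error line names a loop device and none name the pool members (sdb1 and sdc1 here), that would be consistent with a clean scrub on the pool itself.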


  • Author
54 minutes ago, trurl said:

attach diagnostics to your NEXT post in this thread

sure, please see attached. 

 

the pool was rebalanced and scrubbed after the docker image was recreated. 

tower-diagnostics-20230127-1717.zip

  • Community Expert

Those errors are still about the docker image; reboot and post new diags after array start.

  • 2 weeks later...
  • Author
On 1/28/2023 at 12:56 AM, JorgeB said:

Those errors are still about the docker image; reboot and post new diags after array start.

 

I did as you suggested, rebooted and the btrfs errors cleared. I was hoping that was the end of it but the server locked up with a Kernel Panic two days later. Rebooted with a successful parity check, the server ran OK for another two or three days. Last night it locked up again with Kernel Panic. After it was rebooted, a disk was disabled during the parity check which never happened before. 

 

I have a spare disk that I can use to replace the faulty one, if it is indeed faulty. However, I cannot stop the array because the server reports that the parity check is running. That does not appear to be the case, as all the disks are spun down. Pressing the Cancel or Resume Parity check button does not re-enable the Stop array button, so I'm not sure how to proceed.

 

The latest diagnostics are below; what's my best course of action here?

 

thanks in advance!

 

 

tower-diagnostics-20230205-1050.zip

  • Community Expert

Type reboot on the console; if it doesn't work after a few minutes, you'll need to force it. Ideally you'd fix the crashing problem first; see this if you haven't yet.

  • Author
13 hours ago, JorgeB said:

Type reboot on the console; if it doesn't work after a few minutes, you'll need to force it. Ideally you'd fix the crashing problem first; see this if you haven't yet.

I replaced the RAM and disabled C-states, I can't believe they were enabled. 

 

The drive is rebuilding now onto itself, I will report back. 

 

thanks again. 

  • Author

The disk rebuild failed with read errors again on the same drive, so I replaced it, and the replacement drive is rebuilding now. However, I now see this in the log:

Feb  7 17:51:28 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Feb  7 17:51:28 Tower kernel: ata9.00: irq_stat 0x08000000, interface fatal error
Feb  7 17:51:28 Tower kernel: ata9: SError: { UnrecovData 10B8B BadCRC }
Feb  7 17:51:28 Tower kernel: ata9.00: failed command: READ DMA EXT
Feb  7 17:51:28 Tower kernel: ata9.00: cmd 25/00:40:68:1e:da/00:05:4b:00:00/e0 tag 4 dma 688128 in
Feb  7 17:51:28 Tower kernel:         res 50/00:00:67:1e:da/00:00:4b:00:00/e0 Emask 0x10 (ATA bus error)
Feb  7 17:51:28 Tower kernel: ata9.00: status: { DRDY }
Feb  7 17:51:28 Tower kernel: ata9: hard resetting link
Feb  7 17:51:28 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb  7 17:51:28 Tower kernel: ata9.00: configured for UDMA/133
Feb  7 17:51:28 Tower kernel: ata9: EH complete

 

Also, my parity drive's UDMA CRC error count just went from 0 to 1.

 

I was originally planning to replace the SATA cable on the disabled drive, but now, with the CRC error on the parity drive, I'm wondering if I should just abandon the motherboard controller and move all the drives onto an LSI card.
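
To see which ATA ports are logging link-level errors before deciding between a cable swap and a controller swap, a small Python sketch like this could collect the SError flags per port from the syslog. The helper name `serror_flags_by_port` is hypothetical, and mapping a port such as ata9 back to a physical drive still has to be done separately (for example from the diagnostics).

```python
import re

# Matches kernel lines such as:
#   ata9: SError: { UnrecovData 10B8B BadCRC }
SERROR_RE = re.compile(r"(?P<port>ata\d+(?:\.\d+)?): SError: \{(?P<flags>[^}]*)\}")


def serror_flags_by_port(lines):
    """Return {port: set of SError flag names} seen in the given lines.
    Hypothetical helper for illustration."""
    flags_by_port = {}
    for line in lines:
        m = SERROR_RE.search(line)
        if m:
            flags_by_port.setdefault(m.group("port"), set()).update(
                m.group("flags").split()
            )
    return flags_by_port


sample = [
    "Feb  7 17:51:28 Tower kernel: ata9.00: irq_stat 0x08000000, interface fatal error",
    "Feb  7 17:51:28 Tower kernel: ata9: SError: { UnrecovData 10B8B BadCRC }",
    "Feb  7 17:51:28 Tower kernel: ata9.00: failed command: READ DMA EXT",
]

flags_by_port = serror_flags_by_port(sample)
crc_ports = sorted(port for port, flags in flags_by_port.items() if "BadCRC" in flags)
print("ports reporting BadCRC:", crc_ports)
```

If BadCRC only ever shows up on one or two ports, that would point more toward the individual links than toward the whole controller, though that is an inference and not something the log proves by itself.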

 

 

  • Community Expert

Bad connection on parity; check the connections at both ends, power and SATA, including any splitters.

  • Community Expert
6 hours ago, daan_SVK said:
BadCRC }

This is usually a bad SATA cable.

  • Author
6 hours ago, JorgeB said:

This is usually a bad SATA cable.

I will replace the cable; it's just weird that the server started having all these odd issues all of a sudden.

 

 
