daan_SVK

Members

Joined
June 1, 2012 (14 yr)
Last visited
June 24, 2025 (1 yr)


Apprentice

Current rank (3/14)

Gender
Undisclosed


BTRFS errors and corruption

daan_SVK replied to daan_SVK's topic in General Support

OK, so I replaced the drive that was reported as MISSING in the log, and the cache is rebuilding now. The array started normally. It's been a bit of a monologue here, but all in all, I guess I'm good.
- April 5, 2025 (1 yr)
- 3 replies
BTRFS errors and corruption

daan_SVK replied to daan_SVK's topic in General Support

So, just looking at the BTRFS config log, the drive actually shows as missing:
Total devices 2 FS bytes used 200.12GiB
devid 1 size 953.87GiB used 229.06GiB path /dev/nvme0n1p1
devid 2 size 0 used 0 path /dev/nvme1n1p1 MISSING
I can, however, see the drive as healthy in the GUI, so I wonder if I should just reboot and see if it comes back. Again, I'm not sure what my next step should be.
- April 5, 2025 (1 yr)
- 3 replies
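The missing device in that post is read straight off output shaped like `btrfs filesystem show`. As a minimal sketch, assuming the three-line shape quoted in the post (real output also carries a Label/uuid header, and spacing can vary between btrfs-progs versions), the MISSING flag can be picked out programmatically. `parse_devices` and `missing_paths` are hypothetical helper names, not part of any btrfs tooling.

```python
import re

# Sample text shaped like the `btrfs filesystem show` output quoted in the post.
SAMPLE = """\
Total devices 2 FS bytes used 200.12GiB
devid    1 size 953.87GiB used 229.06GiB path /dev/nvme0n1p1
devid    2 size 0 used 0 path /dev/nvme1n1p1 MISSING
"""

# One device line: devid, size, used, path, plus an optional trailing MISSING flag.
DEVICE_RE = re.compile(
    r"devid\s+(?P<devid>\d+)\s+size\s+(?P<size>\S+)\s+used\s+(?P<used>\S+)"
    r"\s+path\s+(?P<path>\S+)(?P<missing>\s+MISSING)?"
)


def parse_devices(text: str) -> list:
    """Return one dict per 'devid' line found in the output."""
    devices = []
    for match in DEVICE_RE.finditer(text):
        devices.append({
            "devid": int(match.group("devid")),
            "size": match.group("size"),
            "used": match.group("used"),
            "path": match.group("path"),
            "missing": match.group("missing") is not None,
        })
    return devices


def missing_paths(text: str) -> list:
    """Paths of the devices the filesystem reports as MISSING."""
    return [d["path"] for d in parse_devices(text) if d["missing"]]


if __name__ == "__main__":
    print(missing_paths(SAMPLE))  # ['/dev/nvme1n1p1']
```

Note that a device can show as MISSING here while still appearing healthy elsewhere in the GUI, as the post describes; the sketch only reports what the filesystem itself says.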
daan_SVK started following "new install - no webgui" and "BTRFS errors and corruption"
- April 5, 2025 (1 yr)
BTRFS errors and corruption

daan_SVK posted a topic in General Support

Hello, just looking for some guidance on a course of action with my corrupted BTRFS cache pool, since I woke up to a wall of errors in my log. This is a relatively fresh build, but it had been running flawlessly for 17 days, so this came out of the blue. The errors are listed below and the full diagnostics are attached. They should have captured the first instance of the error, as the server hasn't been rebooted since. I ran a full scrub, but all the errors came back as unrecoverable. I did not check the filesystem integrity. Should I just replace the drive?
Apr 5 06:45:59 Tower kernel: btrfs_end_super_write: 11 callbacks suppressed
Apr 5 06:45:59 Tower kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme1n1p1 (-5)
Apr 5 06:45:59 Tower kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 2
Apr 5 06:46:00 Tower kernel: btrfs_dev_stat_inc_and_print: 3 callbacks suppressed
Apr 5 06:46:00 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 127968576, rd 52831663, flush 907641, corrupt 3, gen 0
tower-diagnostics-20250405-0644.zip
- April 5, 2025 (1 yr)
- 3 replies
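The `errs:` line in that log is the per-device error counter summary btrfs prints for a pool member: write, read and flush I/O errors, followed by checksum (`corrupt`) and generation (`gen`) mismatches. A minimal sketch that pulls those counters out of a single kernel log line, assuming exactly the format shown in the post; `parse_dev_errs` and `io_error_total` are hypothetical helpers. On a live system, `btrfs device stats` reports the same counters directly.

```python
import re
from typing import Dict, Optional, Tuple

# Kernel log line copied verbatim from the post.
LINE = (
    "Apr 5 06:46:00 Tower kernel: BTRFS error (device nvme0n1p1): "
    "bdev /dev/nvme1n1p1 errs: wr 127968576, rd 52831663, "
    "flush 907641, corrupt 3, gen 0"
)

# "bdev <path> errs:" followed by comma-separated "<name> <count>" pairs.
ERRS_RE = re.compile(r"bdev (?P<dev>\S+) errs: (?P<stats>.+)$")
PAIR_RE = re.compile(r"(\w+) (\d+)")

# Counters that record failed I/O requests, as opposed to bad data read back.
IO_COUNTERS = ("wr", "rd", "flush")


def parse_dev_errs(line: str) -> Optional[Tuple[str, Dict[str, int]]]:
    """Return (device path, {counter: value}), or None if the line doesn't match."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    counters = {name: int(value) for name, value in PAIR_RE.findall(match.group("stats"))}
    return match.group("dev"), counters


def io_error_total(counters: Dict[str, int]) -> int:
    """Sum of the write, read and flush error counters."""
    return sum(counters.get(name, 0) for name in IO_COUNTERS)


if __name__ == "__main__":
    device, counters = parse_dev_errs(LINE)
    print(device, counters)
    print("I/O errors:", io_error_total(counters), "corrupt:", counters["corrupt"])
```

Keeping the I/O counters separate from `corrupt`/`gen` makes the pattern in the post easy to see: very large write/read/flush counts next to only 3 checksum errors.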
new install - no webgui

daan_SVK replied to daan_SVK's topic in General Support

I appreciate the time you took to respond, but it was indeed an incomplete image on the USB stick. This was a new install in a new environment, so I was suspecting something wrong with the network; I never had issues with the USB creator before. I will stick with manual USB creation; it's how we used to do it in the '90s anyway.
- February 14, 2024 (2 yr)
- 8 replies
new install - no webgui

daan_SVK replied to daan_SVK's topic in General Support

The manual install indeed resolved it. I wish there were a way to tell that the USB creator failed to write the full image to the USB stick. All it says is "Writing done!", which really suggests it completed successfully.
- February 14, 2024 (2 yr)
- 8 replies
new install - no webgui

daan_SVK replied to daan_SVK's topic in General Support

I used the USB creator from the website, with multiple USB keys as well. Should I just extract it manually? Doesn't the creator verify the USB stick once it writes the image?
- February 14, 2024 (2 yr)
- 8 replies
new install - no webgui

daan_SVK replied to daan_SVK's topic in General Support

Here is the zip (tower-diagnostics-20190101-1426.zip); thanks for looking at it.
- February 14, 2024 (2 yr)
- 8 replies
new install - no webgui

daan_SVK posted a topic in General Support

Hi there, I'm just looking for some ideas on how to troubleshoot this further. I'm trying to test a new build, a Lenovo P620 workstation. I imaged a new USB key and booted it up, but I get no webGUI: connection refused.
- I can ping the server by hostname and IP
- the router shows the server by hostname as a connected device
- I get no local GUI because the server has a P2000, which needs a driver first
- I cannot SSH into the server because the root password hasn't been set in the webGUI yet
- I tried different USB sticks and different USB ports; it's always the same
The server runs headless. Is there anything else I can try?
- February 14, 2024 (2 yr)
- 8 replies
suspecting corrupted cache/docker.img

daan_SVK replied to daan_SVK's topic in General Support

I will replace the cable; it's just weird that the server started having all these odd issues all of a sudden.
- February 8, 2023 (3 yr)
- 18 replies
suspecting corrupted cache/docker.img

daan_SVK replied to daan_SVK's topic in General Support

Sure, please see attached tower-diagnostics-20230207-1932.zip
- February 8, 2023 (3 yr)
- 18 replies
suspecting corrupted cache/docker.img

daan_SVK replied to daan_SVK's topic in General Support

The disk rebuild failed with read errors again on the same drive, so I replaced it, and the replacement drive is rebuilding now. However, I now see this in the log:
Feb 7 17:51:28 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Feb 7 17:51:28 Tower kernel: ata9.00: irq_stat 0x08000000, interface fatal error
Feb 7 17:51:28 Tower kernel: ata9: SError: { UnrecovData 10B8B BadCRC }
Feb 7 17:51:28 Tower kernel: ata9.00: failed command: READ DMA EXT
Feb 7 17:51:28 Tower kernel: ata9.00: cmd 25/00:40:68:1e:da/00:05:4b:00:00/e0 tag 4 dma 688128 in
Feb 7 17:51:28 Tower kernel: res 50/00:00:67:1e:da/00:00:4b:00:00/e0 Emask 0x10 (ATA bus error)
Feb 7 17:51:28 Tower kernel: ata9.00: status: { DRDY }
Feb 7 17:51:28 Tower kernel: ata9: hard resetting link
Feb 7 17:51:28 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 7 17:51:28 Tower kernel: ata9.00: configured for UDMA/133
Feb 7 17:51:28 Tower kernel: ata9: EH complete
Also, my parity drive's UDMA CRC error count just went from 0 to 1. I was originally thinking of replacing the SATA cable to the disabled drive, but now, with the CRC error on the parity drive, I'm wondering if I should just abandon the motherboard controller and move all the drives onto an LSI card.
- February 8, 2023 (3 yr)
- 18 replies
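The `SErr 0x280100` value in that log is a bitmask, and the kernel's `SError: { UnrecovData 10B8B BadCRC }` line is its decoding: bits 8, 19 and 21 are set. A small sketch of that decoding follows. The bit-to-name table mirrors the SATA SError register layout as the Linux libata error handler names it; only the three bits present in the post are confirmed by the log itself, so treat the rest of the table as an assumption. `decode_serror` is a hypothetical helper.

```python
# SATA SError bit positions with the short names the Linux libata error
# handler prints. Bits 8, 19 and 21 are confirmed by the log in the post;
# the remaining entries follow the SATA SError register layout.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value: int) -> list:
    """Names of the set SError bits, lowest bit first; unknown bits become 'bitN'."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(SERROR_BITS.get(bit, f"bit{bit}"))
    return names


if __name__ == "__main__":
    print(decode_serror(0x280100))  # ['UnrecovData', '10B8B', 'BadCRC']
```

10B8B and BadCRC are link-layer errors, which is why they are usually read as pointing at the cable, connector or controller port rather than the disk media; that is consistent with the cable/controller suspicion in the post.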
suspecting corrupted cache/docker.img

daan_SVK replied to daan_SVK's topic in General Support

I replaced the RAM and disabled C-states; I can't believe they were enabled. The drive is rebuilding onto itself now, and I will report back. Thanks again.
- February 7, 2023 (3 yr)
- 18 replies
suspecting corrupted cache/docker.img

daan_SVK replied to daan_SVK's topic in General Support

I did as you suggested, rebooted, and the btrfs errors cleared. I was hoping that was the end of it, but the server locked up with a kernel panic two days later. I rebooted, the parity check completed successfully, and the server ran OK for another two or three days. Last night it locked up again with a kernel panic. After the reboot, a disk was disabled during the parity check, which had never happened before. I have a spare disk that I can replace the faulty one with, if it is indeed faulty. However, I cannot stop the array, as the server reports that the parity check is running. It doesn't appear to be, since all the disks are spun down. Pressing the Cancel or Resume Parity Check button does not re-enable the Stop Array button, so I'm not sure how to proceed. The latest diagnostics are below; what's my best course of action here? Thanks in advance! tower-diagnostics-20230205-1050.zip
- February 5, 2023 (3 yr)
- 18 replies
suspecting corrupted cache/docker.img

daan_SVK replied to daan_SVK's topic in General Support

Sure, please see attached. The pool was rebalanced and scrubbed after the docker image was recreated. tower-diagnostics-20230127-1717.zip
- January 28, 2023 (3 yr)
- 18 replies
suspecting corrupted cache/docker.img

daan_SVK replied to daan_SVK's topic in General Support

I deleted and re-created the docker container and reinstalled all my dockers, but immediately saw more btrfs errors in the log:
Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088
Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088
Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088
Jan 27 15:22:40 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on 335167488 wanted 2298130 found 2298088
I ran a scrub on the cache pool, and no errors were reported. Are the errors coming from within the docker container img? How do I resolve this for good? I'm rebalancing the pool now, as I saw a thread where a fully allocated filesystem caused the same error on a cache pool.
- January 27, 2023 (3 yr)
- 18 replies

daan_SVK

Joined

Last visited

Apprentice

Posts

Solutions

Reputation

BTRFS errors and corruption

BTRFS errors and corruption

BTRFS errors and corruption

new install - no webgui

new install - no webgui

new install - no webgui

new install - no webgui

new install - no webgui

suspecting corrupted cache/docker.img

suspecting corrupted cache/docker.img

suspecting corrupted cache/docker.img

suspecting corrupted cache/docker.img

suspecting corrupted cache/docker.img

suspecting corrupted cache/docker.img

suspecting corrupted cache/docker.img

Account

Navigation

Search

Configure browser push notifications

Chrome (Android)

Chrome (Desktop)

Safari (iOS 16.4+)

Safari (macOS)

Edge (Android)

Edge (Desktop)

Firefox (Android)

Firefox (Desktop)