TurkeyPerson

Members

Joined
May 7, 2020
Last visited
July 8, 2024


Noob

Current rank (1/14)

Posts


31
Reputation
Neutral

3


Cascade of issues and now BTRFS errors on cache drive and unable to connect

TurkeyPerson replied to TurkeyPerson's topic in General Support

I thought I had fixed it, then fixed the cache issue, and then disk 6 went offline again. I mentioned it as background in my first post. Ran a memtest for 24h and all good on that front. Just brought things back online, so we'll see in a day or two, I guess.
- July 5, 2024
- 10 replies
Cascade of issues and now BTRFS errors on cache drive and unable to connect

TurkeyPerson replied to TurkeyPerson's topic in General Support

Thanks everyone. Quick follow-up: if this doesn't work, what's the nuclear option? Transfer the files off the drive and format it?
- July 4, 2024
- 10 replies
Cascade of issues and now BTRFS errors on cache drive and unable to connect

TurkeyPerson replied to TurkeyPerson's topic in General Support

Alas, I'm back with new diagnostics and corruption-related errors. The full error is longer, but this is the main part:

XFS (md6p1): Internal error ltbno + ltlen > bno at line 1955 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0xe9/0x6af [xfs]
Jul 3 13:19:57 Tower kernel: CPU: 6 PID: 11591 Comm: shfs Tainted: P O 6.1.79-Unraid #1

One part states:

Jul 3 13:19:57 Tower kernel: XFS (md6p1): Corruption detected. Unmount and run xfs_repair
Jul 3 13:19:57 Tower kernel: XFS (md6p1): Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll+0x479/0x503 [xfs] (fs/xfs/libxfs/xfs_defer.c:573). Shutting down filesystem.
Jul 3 13:19:57 Tower kernel: XFS (md6p1): Please unmount the filesystem and rectify the problem(s)

But I'm not even sure which drive it's talking about. Since it's XFS, I assume it's one of the drives in the array, so I'm checking each. I came across this on Disk 6:

Phase 1 - find and verify superblock...
- reporting progress in intervals of 15 minutes
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

tower-diagnostics-20240703-2044.zip

And... it's unmountable. So I guess I should destroy the log again ... I'm getting the feeling I'm screwed - attaching an updated log and I guess I'll set this to repair. But I did this last time and just ended up back here. Going to run it on each drive (except parity) and then run a scrub on the cache drive for good measure. I can't imagine this will be the solution, but I'm all out of ideas.

tower-diagnostics-20240703-2117.zip
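On the question of which drive the kernel message refers to: the device name appears in parentheses after "XFS". The sketch below is a minimal illustration in Python, not something from the thread. It assumes that on this Unraid release an array device named mdNp1 corresponds to array Disk N (which would make md6p1 Disk 6, matching where xfs_repair complained); the function name `affected_disks` is hypothetical.

```python
import re

# Assumption: Unraid array devices are named md<N>p1, where <N> is the disk slot.
XFS_RE = re.compile(r"XFS \((md(\d+)p\d+)\): (Corruption detected|Internal error)")

def affected_disks(lines):
    """Map each XFS device named in an error line to its assumed array disk number."""
    found = {}
    for line in lines:
        m = XFS_RE.search(line)
        if m:
            found[m.group(1)] = int(m.group(2))
    return found

sample = [
    "XFS (md6p1): Internal error ltbno + ltlen > bno at line 1955 of file fs/xfs/libxfs/xfs_alloc.c.",
    "Jul 3 13:19:57 Tower kernel: XFS (md6p1): Corruption detected. Unmount and run xfs_repair",
    "Jul 3 13:19:57 Tower kernel: CPU: 6 PID: 11591 Comm: shfs Tainted: P O 6.1.79-Unraid #1",
]
disks = affected_disks(sample)
print(disks)
```

Run against a full syslog, this would list every array device that reported XFS corruption, which avoids checking each disk by hand.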
- July 4, 2024
- 10 replies
Cascade of issues and now BTRFS errors on cache drive and unable to connect

TurkeyPerson replied to TurkeyPerson's topic in General Support

Thanks. I think you must be right. From my understanding, each of those is actually connected to a different controller. I will give that a shot once these dockers are done. Really appreciate everything you do around here - have a beer on me. Edit: Done - will report back, but I don't see any issues so far.
- July 1, 2024
- 10 replies
Cascade of issues and now BTRFS errors on cache drive and unable to connect

TurkeyPerson replied to TurkeyPerson's topic in General Support

Thanks JorgeB - I ended up doing that last night and restarting this morning, and then checking your post. It found a ton of errors and fixed them, but unfortunately I didn't save the results. I restarted the server to see what was up, but it looks like I'm still getting errors - and at the very least it looks like my docker image will need remaking (but I imagine that's the least of my concerns).

tower-diagnostics-20240701-0958.zip

EDIT:
- Fixed the network issue (it was an unrelated DNS issue - the pihole backup wasn't configured and, since the dockers were down, there was no DNS).
- Deleted the docker image and reinstalled the dockers.
- Things appear as if they will be OK, but I still have not figured out the root cause. I do have a syslog, but I imagine I shouldn't post it due to personal information?

EDIT2: Still seeing this error - is this the root cause? I see that this is a known issue. Is this the replacement I want? https://www.amazon.ca/LSI-9211-8I-RAID-Controller-Card/dp/B0BVVN66XG/

Jul 1 11:00:22 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x6 frozen
Jul 1 11:00:22 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Jul 1 11:00:22 Tower kernel: ata2.00: cmd 60/20:28:88:a1:1c/00:00:5e:01:00/40 tag 5 ncq dma 16384 in
Jul 1 11:00:22 Tower kernel: res 40/00:28:88:a1:1c/00:00:5e:01:00/40 Emask 0x4 (timeout)
Jul 1 11:00:22 Tower kernel: ata2.00: status: { DRDY }
Jul 1 11:00:22 Tower kernel: ata2: hard resetting link
Jul 1 11:00:22 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x200 SErr 0x0 action 0x6 frozen
Jul 1 11:00:22 Tower kernel: ata8.00: failed command: READ FPDMA QUEUED
Jul 1 11:00:22 Tower kernel: ata8.00: cmd 60/20:48:68:54:51/00:00:5d:01:00/40 tag 9 ncq dma 16384 in
Jul 1 11:00:22 Tower kernel: res 40/00:48:68:54:51/00:00:5d:01:00/40 Emask 0x4 (timeout)
Jul 1 11:00:22 Tower kernel: ata8.00: status: { DRDY }
Jul 1 11:00:22 Tower kernel: ata8: hard resetting link
Jul 1 11:00:25 Tower kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 1 11:00:25 Tower kernel: ata8.00: configured for UDMA/133
Jul 1 11:00:25 Tower kernel: ata8: EH complete
Jul 1 11:00:28 Tower kernel: ata2: link is slow to respond, please be patient (ready=0)
Jul 1 11:00:32 Tower kernel: ata2: COMRESET failed (errno=-16)
Jul 1 11:00:32 Tower kernel: ata2: hard resetting link

Once the dockers are done reinstalling, I'll post a new diagnostic.
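To see whether errors like these cluster on particular ports (which could hint at a shared controller or cable rather than individual disks), the events can be tallied per ATA port. This is only a rough sketch in Python; the function name `tally_ata_events` is made up for illustration, and it matches just the three message types that appear in the excerpt above.

```python
import re
from collections import Counter

# Matches "ataN:" or "ataN.MM:" followed by one of three event types from the excerpt.
PORT_EVENT_RE = re.compile(
    r"kernel: (ata\d+)(?:\.\d+)?: (exception|hard resetting link|COMRESET failed)"
)

def tally_ata_events(lines):
    """Count (port, event) pairs found in kernel log lines."""
    counts = Counter()
    for line in lines:
        m = PORT_EVENT_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

sample = [
    "Jul 1 11:00:22 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x6 frozen",
    "Jul 1 11:00:22 Tower kernel: ata2: hard resetting link",
    "Jul 1 11:00:22 Tower kernel: ata8.00: exception Emask 0x0 SAct 0x200 SErr 0x0 action 0x6 frozen",
    "Jul 1 11:00:22 Tower kernel: ata8: hard resetting link",
    "Jul 1 11:00:32 Tower kernel: ata2: COMRESET failed (errno=-16)",
    "Jul 1 11:00:32 Tower kernel: ata2: hard resetting link",
]
counts = tally_ata_events(sample)
for (port, event), n in sorted(counts.items()):
    print(port, event, n)
```

In the excerpt, both ata2 and ata8 time out in the same second, but only ata2 fails its COMRESET; a tally over the whole syslog would show whether that pattern repeats.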
- July 1, 2024
- 10 replies
Cascade of issues and now BTRFS errors on cache drive and unable to connect

TurkeyPerson posted a topic in General Support

I was out of town and my system randomly went offline. When I returned, one of my array drives was showing as "unmountable disk present" in Unraid. I reseated some SATA cables and ran a repair on the drive in maintenance mode. It took some back and forth, but it eventually started working. I also moved my SATA controller to a different PCI slot. Unfortunately, either I also had cache drive corruption or I created it, and it seems like I now have a ton of issues that have cascaded from it. For example, none of my services seem to be able to connect to Unraid:

NETWORK: getaddrinfo ENOTFOUND mothership.unraid.net
CLOUD: Socket closed

I even tested downloading an update and that didn't work. I tried fixing the corruption. I took some diagnostics throughout, so I'll share them just in case, but the most recent ends in 0209. I'd appreciate some help - I think I may have made things worse by rushing through this and trying to get back online ASAP.

tower-diagnostics-20240701-0209.zip tower-diagnostics-20240701-0152.zip tower-diagnostics-20240701-0148.zip tower-diagnostics-20240701-0025.zip
- July 1, 2024
- 10 replies
var/log filling up - error

TurkeyPerson replied to TurkeyPerson's topic in General Support

So it's my syslog that's filling up that directory - this is what keeps repeating:

Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: shpool alloc failed
Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: nchan: Out of shared memory while allocating message of size 14047. Increase nchan_max_reserved_memory.
Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: *4418775 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devices?buffer_length=1 HTTP/1.>
Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: MEMSTORE:00: can't create shared message for channel /devices
Jan 11 04:40:59 Tower nginx: 2024/01/11 04:40:59 [crit] 7029#7029: ngx_slab_alloc() failed: no memory

My usage habits have not changed, so I'm not leaving more or fewer GUI windows open than before. In fact, I've been making a greater effort to close them, but this has consistently been an issue since it started. Is that really the only possibility? Fresh diagnostics, taken without a reset, are attached.

tower-diagnostics-20240112-0849.zip

Edit: After doing some brief reading, I think this may be caused by NetData. Going to disable it and see if this comes back, unless someone has another idea.
- January 12, 2024
- 4 replies
var/log filling up - error

TurkeyPerson replied to TurkeyPerson's topic in General Support

I already rebooted the server so not sure that output is helpful but good to know if/when it happens again. Thanks. Yeah - I'll do this and see if I run into other issues. Thanks.
- December 22, 2023
- 4 replies
var/log filling up - error

TurkeyPerson posted a topic in General Support

Woke up to this error. The common-issues guidance suggests posting diagnostics and restarting as a temporary fix:

error: Compressing program wrote following message to stderr when compressing log /var/log/nginx/error.log.1:
gzip: stdout: No space left on device
error: failed to compress log /var/log/nginx/error.log.1

Diagnostic is uploaded. tower-diagnostics-20231221-0742.zip
- December 21, 2023
- 4 replies
Log filling up with BTRFS errors/IO errors

TurkeyPerson replied to TurkeyPerson's topic in General Support

This solution appears to be working so far. Thank you!
- May 8, 2023
- 3 replies
Log filling up with BTRFS errors/IO errors

TurkeyPerson replied to TurkeyPerson's topic in General Support

New diagnostic after changing cables and rebooting: tower-diagnostics-20230506-1519.zip
- May 6, 2023
- 3 replies
Log filling up with BTRFS errors/IO errors

TurkeyPerson posted a topic in General Support

Hi all, I'm seeing this in my logs until they fill up. It's happened twice now since I got new cache disks. I've tried adjusting the cable and am about to try replacing it. Could it be something other than a hardware issue? I have attached diagnostics below; here's an excerpt:

May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077984, rd 49379, flush 459477, corrupt 0, gen 0
May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077985, rd 49379, flush 459477, corrupt 0, gen 0
May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#11 CDB: opcode=0x2a 2a 00 00 62 7c 40 00 00 60 00
May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 6454336 op 0x1:(WRITE) flags 0x1800 phys_seg 12 prio class 2
May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#12 CDB: opcode=0x2a 2a 00 00 62 7c e0 00 00 40 00
May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 6454496 op 0x1:(WRITE) flags 0x1800 phys_seg 8 prio class 2
May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077986, rd 49379, flush 459477, corrupt 0, gen 0
May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077987, rd 49379, flush 459477, corrupt 0, gen 0
May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077988, rd 49379, flush 459477, corrupt 0, gen 0
May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#22 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077988, rd 49379, flush 459478, corrupt 0, gen 0
May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#6 CDB: opcode=0x2a 2a 00 00 00 08 80 00 00 08 00
May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 2
May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 2
May 6 08:49:20 Tower kernel: btrfs_end_super_write: 17 callbacks suppressed
May 6 08:49:20 Tower kernel: BTRFS warning (device sdg1): lost page write due to IO error on /dev/sdg1 (-5)
May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077989, rd 49379, flush 459478, corrupt 0, gen 0
May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): error writing primary super block to device 1
May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#7 CDB: opcode=0x2a 2a 00 97 cd a6 a8 00 00 08 00
May 6 08:49:20 Tower kernel: BTRFS warning (device sdg1): lost page write due to IO error on /dev/sdg1 (-5)
May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): error writing primary super block to device 1

I think the issues begin with this error, which may suggest that my PCI SATA controller is the underlying culprit?

May 5 16:18:04 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May 5 16:18:04 Tower kernel: ata6.00: failed command: DATA SET MANAGEMENT
May 5 16:18:04 Tower kernel: ata6.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 21 dma 512 out
May 5 16:18:04 Tower kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
May 5 16:18:04 Tower kernel: ata6.00: status: { DRDY }
May 5 16:18:04 Tower kernel: ata6: hard resetting link
May 5 16:18:10 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
May 5 16:18:14 Tower kernel: ata6: COMRESET failed (errno=-16)
May 5 16:18:14 Tower kernel: ata6: hard resetting link
May 5 16:18:20 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
May 5 16:18:24 Tower kernel: ata6: COMRESET failed (errno=-16)
May 5 16:18:24 Tower kernel: ata6: hard resetting link
May 5 16:18:30 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
May 5 16:18:59 Tower kernel: ata6: COMRESET failed (errno=-16)
May 5 16:18:59 Tower kernel: ata6: limiting SATA link speed to 3.0 Gbps
May 5 16:18:59 Tower kernel: ata6: hard resetting link
May 5 16:19:04 Tower kernel: ata6: COMRESET failed (errno=-16)
May 5 16:19:04 Tower kernel: ata6: reset failed, giving up
May 5 16:19:04 Tower kernel: ata6.00: disable device
May 5 16:19:04 Tower kernel: ata6: EH complete
May 5 16:19:04 Tower kernel: sd 6:0:0:0: [sdg] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=89s
May 5 16:19:04 Tower kernel: sd 6:0:0:0: [sdg] tag#5 CDB: opcode=0x28 28 00 2d a9 29 40 00 00 08 00
May 5 16:19:04 Tower kernel: I/O error, dev sdg, sector 766060864 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 2
May 5 16:19:04 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
May 5 16:19:04 Tower kernel: sd 6:0:0:0: [sdg] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s

Update: I added iommu=pt in case it is my Marvell controller causing issues. Prior to doing that, I switched cables (SATA and SATA power adapter) and rebooted. The logs for each cache drive are littered with errors, so I'm going to keep the array offline until I have a clue what's happening, as I'm concerned about data corruption/loss. Thanks!

tower-diagnostics-20230506-0836.zip
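For anyone reading a log like this, the per-device counters in the "errs:" lines are the useful part: wr, rd and flush count failed I/O operations, while corrupt and gen count checksum and generation mismatches. A rough Python sketch for pulling the latest counters out of a syslog follows; the function name `latest_counters` is invented for illustration, and the reading that "I/O errors with corrupt 0" points at the device dropping off the bus rather than bad data on disk is an interpretation, not something the log itself states.

```python
import re

# Matches the per-device counters printed in "BTRFS error ... errs:" kernel lines.
ERRS_RE = re.compile(
    r"BTRFS error \(device (?P<fs>\S+)\): bdev (?P<dev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)
FIELDS = ("wr", "rd", "flush", "corrupt", "gen")

def latest_counters(lines):
    """Return {device: {counter: value}} taken from the last matching line per device."""
    latest = {}
    for line in lines:
        m = ERRS_RE.search(line)
        if m:
            latest[m["dev"]] = {k: int(m[k]) for k in FIELDS}
    return latest

sample = [
    "May 5 16:19:04 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0",
    "May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077989, rd 49379, flush 459478, corrupt 0, gen 0",
]
counters = latest_counters(sample)
print(counters)
```

On the two sample lines taken from the excerpt, the write counter climbs from 0 to 8077989 in under a day while corrupt stays at 0, which is consistent with the ata6 link resets logged on May 5.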
- May 6, 2023
- 3 replies
New server keeps crashing

TurkeyPerson replied to liquidrt's topic in General Support

Did you get anywhere? Getting similar errors but only managed to get a screenshot of the log. Upgrading off stable and will see if it reappears in the coming days.
- April 29, 2023
- 13 replies
VMs fail / corruption on cache & parity?

TurkeyPerson replied to TurkeyPerson's topic in General Support

Reseated the cables yesterday and ordered new ones to try replacing (will test this evening). Seems odd that a bad cache cable would mess with the array parity, no?
- April 26, 2023
- 3 replies
TurkeyPerson started following VMs fail / corruption on cache & parity?
- April 26, 2023
VMs fail / corruption on cache & parity?

TurkeyPerson posted a topic in General Support

Yesterday my parity drive became corrupt or out of sync (fixed). Now my VMs are dead. I'm seeing errors that appear related to one of my cache drives and intend to try changing the SATA cable (reseated everything yesterday). Anyone got any ideas? Will upload diagnostics. tower-diagnostics-20230426-0733.zip
- April 26, 2023
- 3 replies
