Log Filling and access beyond end of device errors

September 11, 2025

Hi Team,

Hope someone can help. I noticed some of my Docker containers were running a bit slower than normal and found my log usage was at 100%. That seemed strange, so I found a thread about clearing the logs, and when I then looked at my syslog files I found a lot of 'access beyond end of device' errors, which seem to be what is filling them up.

From what I've read, these errors seem to point to a dead drive, but the drive still shows as normal in the system, so I'm not sure whether it's dead or whether something else is causing the issue.

Attached is the diag file if someone can provide some guidance :)

glados-diagnostics-20250911-1039.zip

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#10 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#11 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#12 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#13 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#14 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#15 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#16 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#17 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#18 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#19 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#20 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#21 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#22 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#23 access beyond end of device

Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#31 access beyond end of device
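
To get a feel for how much of the syslog these messages take up, the lines can be counted per device. The following is a minimal Python sketch, not part of the original post: the function name `count_beyond_end` is hypothetical, the sample lines are copied from the excerpt above, and the path `/var/log/syslog` mentioned in the comment is an assumption about where the log lives.

```python
import re
from collections import Counter

# Sample lines copied from the syslog excerpt above.
SAMPLE = """\
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#10 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#11 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#31 access beyond end of device
"""

# Captures the device name in brackets on "access beyond end of device" lines.
PATTERN = re.compile(r"\[(sd[a-z]+)\] tag#\d+ access beyond end of device")


def count_beyond_end(lines):
    """Return a Counter mapping device name -> number of matching lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # For a real log, pass an open file instead (assumed path: /var/log/syslog).
    for device, n in count_beyond_end(SAMPLE.splitlines()).items():
        print(f"{device}: {n} 'access beyond end of device' lines")
```

A count that is large and confined to a single device (here sdd) suggests one device is responsible for the log growth.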

Edited September 11, 2025 by brent3000

September 11, 2025

The syslog has already rotated, but it looks like cache2 dropped offline. Power down, check/replace its cables, and post new diags after array start.

September 14, 2025

Author

I did a reboot and the logs don't seem to be filling up as badly as they were. Both cache drives show as online, but it seems like one just isn't being written to?

Do I need to make them re-sync or something, or should it have self-corrected by now?

September 14, 2025

On 9/11/2025 at 7:41 AM, JorgeB said:
and post new diags after array start.

September 15, 2025

Author

Sorry, missed that portion :)

See attached: glados-diagnostics-20250915-1255.zip

September 15, 2025

Id     Path       Data (RAID1)  Metadata (RAID1)  System (RAID1)  Unallocated  Total    Slack
-----  ---------  ------------  ----------------  --------------  -----------  -------  -----
3      /dev/sdc1  1.48TiB       4.00GiB           64.00MiB        340.95GiB    1.82TiB  -
4      missing    1.48TiB       4.00GiB           64.00MiB        340.95GiB    1.82TiB  -
-----  ---------  ------------  ----------------  --------------  -----------  -------  -----
Total             1.48TiB       4.00GiB           64.00MiB        681.91GiB    3.64TiB  0.00B
Used              1.04TiB       2.67GiB           256.00KiB

Only cache1 is currently part of the pool; reimport the pool with just that device.

on main click on the first device for that pool and then "remove pool"

back on main, create a new pool with the same name and 1 slot

assign the pool device (sdc), leave the filesystem set to auto

start the array to import the pool and post new diags
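
The "missing" row in the usage table earlier in this post is what shows that only one device is still in the pool. As a rough illustration, here is a minimal Python sketch that picks the present and missing members out of such a table. The function name `pool_members` is hypothetical, the sample rows are copied from the table, and it is an assumption that the table comes from something like `btrfs filesystem usage -T /mnt/cache`.

```python
# Device rows copied from the usage table in this post (Id, Path, sizes).
SAMPLE_ROWS = """\
3 /dev/sdc1 1.48TiB 4.00GiB 64.00MiB 340.95GiB 1.82TiB -
4 missing 1.48TiB 4.00GiB 64.00MiB 340.95GiB 1.82TiB -
Total 1.48TiB 4.00GiB 64.00MiB 681.91GiB 3.64TiB 0.00B
"""


def pool_members(table_text):
    """Split device rows into (present, missing) lists of (devid, path)."""
    present, missing = [], []
    for line in table_text.splitlines():
        fields = line.split()
        # Device rows start with a numeric devid; skip headers, separators
        # and the Total/Used summary rows.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        devid, path = int(fields[0]), fields[1]
        (missing if path == "missing" else present).append((devid, path))
    return present, missing


if __name__ == "__main__":
    present, missing = pool_members(SAMPLE_ROWS)
    print("present:", present)
    print("missing:", missing)
```

With the rows above, devid 3 (/dev/sdc1) is reported as present and devid 4 as missing, which matches the advice to reimport the pool with just the one device.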

September 18, 2025

Author

I followed the steps and now have some errors showing: glados-diagnostics-20250918-1348.zip

I did select both drives, with sdc in slot 1 and sdd in slot 2, and then it moved it to unassigned drives.

September 18, 2025

3 hours ago, brent3000 said:
I did select both drives, with sdc in slot 1 and sdd in slot 2, and then it moved it to unassigned drives

That's not what I had asked. In this case it doesn't matter, but please follow the instructions.

With the array running, type:

btrfs balance start -dconvert=single -mconvert=dup /mnt/cache

When that finishes, type:

btrfs device remove missing /mnt/cache

Post new diagnostics after that.
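
For anyone who prefers to script the two steps, the order can be sketched in Python as below. This is only an illustration under stated assumptions: the helper names `recovery_commands` and `run_all` are hypothetical, the commands themselves are copied verbatim from this post, and the script defaults to a dry run that only prints them.

```python
import shlex
import subprocess

MOUNT_POINT = "/mnt/cache"  # pool mount point used in the commands above


def recovery_commands(mount_point=MOUNT_POINT):
    """Return the two btrfs commands from the post, in order, as argv lists."""
    return [
        ["btrfs", "balance", "start", "-dconvert=single", "-mconvert=dup", mount_point],
        ["btrfs", "device", "remove", "missing", mount_point],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Returns the list of exit codes (empty for a dry run).
    """
    exit_codes = []
    for argv in commands:
        print("+", shlex.join(argv))
        if not dry_run:
            exit_codes.append(subprocess.run(argv).returncode)
    return exit_codes


if __name__ == "__main__":
    run_all(recovery_commands())  # dry run: only prints the commands
```

The sketch records exit codes rather than stopping on the first non-zero one; whether to continue after an error is left to whoever runs it.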

September 18, 2025

Author

OK, I got up to the last part and, oddly, with over 50% free space I got the error below. It did most of the work, as the balance was down to the final 1%, and there was still plenty of storage left:

ERROR: error during balancing '/mnt/cache': No space left on device

There may be more info in syslog - try dmesg | tail

I did run the other command though, and it returned without any output, so I assume it completed as intended?

See attached: glados-diagnostics-20250919-0024.zip

September 18, 2025

Yes, pool is now single; you need to reimport it once more:

on main click on the first device for that pool and then "remove pool"

back on main, create a new pool with the same name and 1 slot

assign the pool device (sdc), leave the filesystem set to auto

start the array to import the pool and post new diags

The pool should now import with just one device. If it does, stop the array, change the slots to 2, add the other device, and start the array; it should then create a mirrored pool.

September 18, 2025

Author

I see the part I missed, which was the 1 slot section, my bad.

See attached: glados-diagnostics-20250919-0835.zip

That was with the pool running on 1 drive. I then stopped the array, changed the slots to 2, and added sdd to the pool; the diag below is from after that.

It did say it would wipe the 2nd drive (which I assume is normal when setting it back up in the pool format).

glados-diagnostics-20250919-0839.zip

I also got this error. Do I need to re-balance? It does say a 'BTRFS operation is running', so should I leave it to set itself up again?

Once it's all back online, I assume I can remove the historical unassigned device entry that is showing for the drive?

Adding some more info: the drive seems to be in standby mode and the logs are filling up again. Is there possibly a drive issue?

The logs show the same errors again:

Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#0 access beyond end of device

Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#1 access beyond end of device

Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#2 access beyond end of device

Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#3 access beyond end of device

Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#4 access beyond end of device

Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#5 access beyond end of device

Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#6 access beyond end of device

Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#7 access beyond end of device

Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#8 access beyond end of device

Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#9 access beyond end of device

Edited September 19, 2025 by brent3000

September 19, 2025

Sep 19 08:37:52 GLaDOS kernel: ata2.00: qc timeout after 30000 msecs (cmd 0xec)

Sep 19 08:37:52 GLaDOS kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Sep 19 08:37:52 GLaDOS kernel: ata2.00: revalidation failed (errno=-5)

Sep 19 08:37:52 GLaDOS kernel: ata2.00: disable device

Cache2 dropped offline again. If you haven't yet, replace both cables; if you already did, it may be a bad device, but most likely it's a power/connection issue.
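
The "disable device" line is the point where the kernel gives up on the drive. As an aid for spotting such drops in a long syslog, here is a minimal Python sketch. The function name `find_drops` is hypothetical, the sample lines are copied from the excerpt above, and the regular expression assumes the same timestamp and hostname layout as those lines.

```python
import re

# Sample lines copied from the syslog excerpt above.
SAMPLE = """\
Sep 19 08:37:52 GLaDOS kernel: ata2.00: qc timeout after 30000 msecs (cmd 0xec)
Sep 19 08:37:52 GLaDOS kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 19 08:37:52 GLaDOS kernel: ata2.00: revalidation failed (errno=-5)
Sep 19 08:37:52 GLaDOS kernel: ata2.00: disable device
"""

# Timestamp, hostname, then "<ata port>: disable device" at the end of the line.
DROP = re.compile(
    r"^(?P<when>\w{3}\s+\d+ [\d:]{8}) \S+ kernel: "
    r"(?P<port>ata\d+\.\d+): disable device$"
)


def find_drops(lines):
    """Return (timestamp, ata port) for every 'disable device' line."""
    drops = []
    for line in lines:
        match = DROP.match(line.strip())
        if match:
            drops.append((match["when"], match["port"]))
    return drops


if __name__ == "__main__":
    for when, port in find_drops(SAMPLE.splitlines()):
        print(f"{when}: {port} was disabled by the kernel")
```

Comparing these timestamps with the start of the 'access beyond end of device' flood can help confirm that the two are related.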

September 19, 2025

Author

It was online while I was testing it before. Would there be a SMART test I can try and validate it with?

In the interim, can I do everything you said above up to the part about adding it to the pool again, just to stop the errors?

I.e. this portion?

18 hours ago, JorgeB said:
on main click on the first device for that pool and then "remove pool"
back on main, create a new pool with the same name and 1 slot
assign the pool device (sdc), leave the filesystem set to auto
start the array to import the pool and post new diags

Edited September 19, 2025 by brent3000

September 19, 2025

1 hour ago, brent3000 said:
would there be a SMART test I can try and validate it with?

SMART test won't work once a device drops.

Yes, you should be able to reimport the pool with the remaining device.

September 19, 2025

Author

Thanks for the details. Give me some time to do some more testing. I may take the drive out and see how it runs on a bench system as a quick check of whether the drive is causing the issues; I have moved it around in the system and the port keeps working for other drives, so I'm not sure if it's the port or something else.

Will report back. For now I have rebooted and the drive is back online.

September 21, 2025

Author

Seems like both are online. Does this look right? glados-diagnostics-20250921-1516.zip

Interesting that there isn't any write activity on the other drive yet.

September 21, 2025

The device is currently online, but it's not part of the pool, and there are still errors logged that look like a bad SATA cable:

Sep 21 15:15:33 GLaDOS kernel: ata2: log page 10h reported inactive tag 21

Sep 21 15:15:33 GLaDOS kernel: ata2.00: exception Emask 0x1 SAct 0x1c00000 SErr 0x400001 action 0x6

Sep 21 15:15:33 GLaDOS kernel: ata2.00: irq_stat 0x40000008

Sep 21 15:15:33 GLaDOS kernel: ata2: SError: { RecovData Handshk }

Sep 21 15:15:33 GLaDOS kernel: ata2.00: failed command: READ FPDMA QUEUED

Sep 21 15:15:33 GLaDOS kernel: ata2.00: cmd 60/30:b0:48:08:00/00:00:00:00:00/40 tag 22 ncq dma 24576 in

Sep 21 15:15:33 GLaDOS kernel: res 41/84:01:06:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)

Sep 21 15:15:33 GLaDOS kernel: ata2.00: status: { DRDY ERR }

Sep 21 15:15:33 GLaDOS kernel: ata2.00: error: { ICRC ABRT }

Sep 21 15:15:33 GLaDOS kernel: ata2.00: failed command: READ FPDMA QUEUED

Sep 21 15:15:33 GLaDOS kernel: ata2.00: cmd 60/78:b8:88:08:00/00:00:00:00:00/40 tag 23 ncq dma 61440 in

Sep 21 15:15:33 GLaDOS kernel: res 41/84:01:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)

Sep 21 15:15:33 GLaDOS kernel: ata2.00: status: { DRDY ERR }

Sep 21 15:15:33 GLaDOS kernel: ata2.00: error: { ICRC ABRT }

Sep 21 15:15:33 GLaDOS kernel: ata2.00: failed command: READ FPDMA QUEUED

Sep 21 15:15:33 GLaDOS kernel: ata2.00: cmd 60/f8:c0:08:09:00/00:00:00:00:00/40 tag 24 ncq dma 126976 in

Sep 21 15:15:33 GLaDOS kernel: res 41/84:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)

Sep 21 15:15:33 GLaDOS kernel: ata2.00: status: { DRDY ERR }

Sep 21 15:15:33 GLaDOS kernel: ata2.00: error: { ICRC ABRT }

Replace the SATA cable for cache2 and post new diags after array start, but the pool will need to be fixed again.
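
The hex fields in these lines can be decoded by hand, which helps show why the errors point at the link rather than the disk surface. Below is a minimal Python sketch of that decoding. The function names are hypothetical, and the bit tables are assumptions based on the standard SATA SError layout and the ATA status/error registers as I understand the kernel prints them; only a subset of flags is listed.

```python
# SError bit positions -> names (assumed to match the kernel's output).
SERR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}
# ATA status and error register flags (subset, standard ATA names).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_serr(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


def active_tags(sact):
    """Return the NCQ tag numbers whose bits are set in the SAct mask."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


def decode_flags(value, table):
    """Return the names of the flags in table that are set in value."""
    return [name for mask, name in table.items() if value & mask]


def decode_res(res_field):
    """Decode the leading 'status/error' bytes of a 'res SS/EE:...' field."""
    status_hex, error_hex = res_field.split(":")[0].split("/")
    return (
        decode_flags(int(status_hex, 16), STATUS_BITS),
        decode_flags(int(error_hex, 16), ERROR_BITS),
    )


if __name__ == "__main__":
    print(decode_serr(0x400001))    # SErr value from the log above
    print(active_tags(0x1C00000))   # SAct value from the log above
    print(decode_res("41/84:01:06:4f:c2/00:00:00:00:00/00"))
```

Run on the values from the log, this gives RecovData and Handshk for SErr, tags 22 to 24 for SAct, and DRDY ERR with ICRC ABRT for the result field, matching what the kernel printed. ICRC is an interface CRC error, meaning data was corrupted in transit between the drive and the controller, which is why cabling is the usual first suspect.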

November 2, 2025

Author

@JorgeB I finally got around to replacing the cable now that I've had the time.

The server is showing the drive and is also showing a missing drive on the cache.

I don't see similar errors, but I could just be looking at the wrong spot in the logs. Can you confirm the logs look correct, and what's next to clear this error and rebuild the cache pool?

glados-diagnostics-20251102-1717.zip

November 2, 2025

With the array running, type:

btrfs balance start -f -dconvert=single -mconvert=dup /mnt/cache

then

btrfs device remove missing /mnt/cache

then post new diags.

November 2, 2025

Author

@JorgeB I ran both. The balance did give an error, but I continued as there is currently plenty of space on the drive:

btrfs balance start -f -dconvert=single -mconvert=dup /mnt/cache
ERROR: error during balancing '/mnt/cache': No space left on device
There may be more info in syslog - try dmesg | tail
btrfs device remove missing /mnt/cache

See attached updated diag also

glados-diagnostics-20251103-0959.zip

November 3, 2025

Solution

Now reimport the pool as a single device:

on main click on the first device for that pool and then "remove pool"

back on main, create a new pool with the same name and 1 slot

assign the pool device, leave the filesystem set to auto

start the array to import the pool

You can then add a second drive to create a mirror if that is the intention.

November 3, 2025

Author

Looks like we are finally back in business! It seems to be running a balance, but most importantly it's writing to the drive again.

Goes to show the cables I had in there weren't as long-lasting as I would have hoped. Can't say I've had a SATA cable die on me before :/

Thanks again @JorgeB. I'll let it do its thing and then I can finally get back to updating it from 7 to the latest; I wanted this fixed first before heading down the update route.

Edited November 3, 2025 by brent3000

Solved by JorgeB
