September 11, 2025
Hi Team,
Hope someone can help me. I noticed some of my Docker containers were running a bit slower than normal, and then found my logs were at 100%. That was strange, so I found a thread about clearing them. When I looked at my syslog files, I found a lot of 'access beyond end of device' messages that seem to be filling them up. Some of the issues I read about point to a dead drive, but the drive still shows as normal in the system, so I'm not sure whether it's dead or something else is causing the problem.
Attached is the diagnostics file if someone can provide some guidance :)
glados-diagnostics-20250911-1039.zip
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#10 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#11 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#12 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#13 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#14 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#15 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#16 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#17 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#18 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#19 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#20 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#21 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#22 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#23 access beyond end of device
Sep 11 08:20:32 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#31 access beyond end of device
(Edited September 11, 2025 by brent3000)
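To see how quickly these messages are piling up, a small POSIX shell helper can tally them per disk from a saved syslog excerpt. This is a sketch, not an Unraid tool: the function name `count_eod_errors` is hypothetical, and it assumes the kernel line format quoted above (`sd H:C:T:L: [sdX] tag#N access beyond end of device`).

```shell
# Hypothetical helper: tally "access beyond end of device" kernel
# messages per disk. Reads syslog text on stdin and prints one
# "<disk> <count>" line per affected disk, sorted by disk name.
count_eod_errors() {
  grep 'access beyond end of device' |
    # Keep only the disk name between the square brackets, e.g. "sdd".
    sed -n 's/.*\[\(sd[a-z]*\)].*/\1/p' |
    sort | uniq -c |
    # uniq -c prints "<count> <disk>"; swap to "<disk> <count>".
    awk '{ print $2, $1 }'
}
```

Assuming the live log is at `/var/log/syslog`, running `count_eod_errors < /var/log/syslog` would print something like `sdd 15` for the excerpt above.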
September 11, 2025
The syslog has already rotated, but it looks like cache2 dropped offline. Power down, check/replace its cables, and post new diagnostics after the array starts.
September 14, 2025 (Author)
I did a reboot and the logs aren't filling up as badly as they were. Both cache drives show as online, but it seems like one just isn't being written to. Do I need to make them re-sync or something, or should it have self-corrected by now?
September 15, 2025 (Author)
Sorry, missed that portion :) See attached:
glados-diagnostics-20250915-1255.zip
September 15, 2025
             Data    Metadata System
Id Path      RAID1   RAID1    RAID1     Unallocated Total   Slack
-- --------- ------- -------- --------- ----------- ------- -----
 3 /dev/sdc1 1.48TiB  4.00GiB  64.00MiB   340.95GiB 1.82TiB     -
 4 missing   1.48TiB  4.00GiB  64.00MiB   340.95GiB 1.82TiB     -
-- --------- ------- -------- --------- ----------- ------- -----
   Total     1.48TiB  4.00GiB  64.00MiB   681.91GiB 3.64TiB 0.00B
   Used      1.04TiB  2.67GiB 256.00KiB
Only cache1 is currently part of the pool; reimport the pool with just that device:
- On main, click on the first device for that pool and then "remove pool".
- Back on main, create a new pool with the same name and 1 slot.
- Assign the pool device (sdc) and leave the filesystem set to auto.
- Start the array to import the pool, and post new diags.
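The `missing` row in the table above is the telltale sign that one pool member is gone. As a hedged sketch, assuming the table is the tabular output of `btrfs filesystem usage -T` (the post does not say which command produced it), a one-line awk filter can pull out the IDs of missing members. The name `missing_device_ids` is hypothetical.

```shell
# Hypothetical helper: print the device ID of every pool member whose
# Path column reads "missing" in a btrfs per-device usage table.
# Expects the table on stdin; column 1 is the ID, column 2 the path.
missing_device_ids() {
  awk '$2 == "missing" { print $1 }'
}
```

Usage would be along the lines of `btrfs filesystem usage -T /mnt/cache | missing_device_ids`, which for this pool should print `4`.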
September 18, 2025 (Author)
I followed the steps and am now getting errors like this: glados-diagnostics-20250918-1348.zip
I did select both drives, with sdc in slot 1 and sdd in slot 2, and then it moved it to unassigned drives.
September 18, 2025
3 hours ago, brent3000 said:
I did select both drives, with sdc in slot 1 and sdd in slot 2, and then it moved it to unassigned drives.
That's not what I asked. In this case it doesn't matter, but please follow the instructions. With the array running, type:
btrfs balance start -dconvert=single -mconvert=dup /mnt/cache
When that finishes, type:
btrfs device remove missing /mnt/cache
Post new diagnostics after that.
September 18, 2025 (Author)
OK, I got up to the last part and, oddly, with over 50% free space I got this error. It did most of the work, though: the balance was down to the final 1% and there was still plenty of storage left.
ERROR: error during balancing '/mnt/cache': No space left on device
There may be more info in syslog - try dmesg | tail
I did run the other command, though, and got no response back after running it, so I assume it completed as intended?
See attached:
glados-diagnostics-20250919-0024.zip
September 18, 2025
Yes, the pool is now single; you need to reimport it once more:
- On main, click on the first device for that pool and then "remove pool".
- Back on main, create a new pool with the same name and 1 slot.
- Assign the pool device (sdc) and leave the filesystem set to auto.
- Start the array to import the pool, and post new diags.
The pool should now import with just one device. If it does, stop the array, change the slots to 2, add the other device, and start the array; it should create a mirrored pool.
September 18, 2025 (Author)
I see the part I missed, which was the 1 slot section. My bad. See attached:
glados-diagnostics-20250919-0835.zip
That was running with 1 drive. I then stopped the array, changed to 2 slots, and selected sdd into the pool; the diag below is from that. It did say it would wipe the 2nd drive (which I assume is normal when setting it back up in the pool format).
glados-diagnostics-20250919-0839.zip
I also got this error. Do I need to re-balance? It does say there is a 'BTRFS operation is running', so should I leave it to set itself up again?
Once it's all back online, I assume I can remove the historical unassigned device entry that is showing for the drive?
Adding some more info: the drive seems to be in standby mode and the logs are filling up again. Is there a possible drive issue? The logs show the same errors again:
Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#0 access beyond end of device
Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#1 access beyond end of device
Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#2 access beyond end of device
Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#3 access beyond end of device
Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#4 access beyond end of device
Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#5 access beyond end of device
Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#6 access beyond end of device
Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#7 access beyond end of device
Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#8 access beyond end of device
Sep 19 08:52:50 GLaDOS kernel: sd 3:0:0:0: [sdd] tag#9 access beyond end of device
(Edited September 19, 2025 by brent3000)
September 19, 2025
Sep 19 08:37:52 GLaDOS kernel: ata2.00: qc timeout after 30000 msecs (cmd 0xec)
Sep 19 08:37:52 GLaDOS kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 19 08:37:52 GLaDOS kernel: ata2.00: revalidation failed (errno=-5)
Sep 19 08:37:52 GLaDOS kernel: ata2.00: disable device
Cache2 dropped offline again. If you haven't yet, replace both cables; if you did, it may be a bad device, but most likely it's a power/connection issue.
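The `disable device` message after the failed IDENTIFY is what marks the drop-out. A minimal sketch for spotting it in a saved log, assuming the message format quoted above; `disabled_ata_ports` is a hypothetical name, and it reports only the ATA port (e.g. `ata2`), not the disk name behind it.

```shell
# Hypothetical helper: list ATA ports whose device the kernel gave up on
# ("ataN.00: disable device"). Reads syslog text on stdin and prints
# each affected port once, e.g. "ata2".
disabled_ata_ports() {
  sed -n 's/.*kernel: \(ata[0-9]*\)\.[0-9]*: disable device.*/\1/p' | sort -u
}
```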
September 19, 2025 (Author)
It was online while I was testing it before. Is there a SMART test I can run to validate it?
In the interim, can I do everything you said above up to the part of adding it back into the pool, just to stop the errors? That is, this portion:
18 hours ago, JorgeB said:
- On main, click on the first device for that pool and then "remove pool".
- Back on main, create a new pool with the same name and 1 slot.
- Assign the pool device (sdc) and leave the filesystem set to auto.
- Start the array to import the pool, and post new diags.
(Edited September 19, 2025 by brent3000)
September 19, 2025
1 hour ago, brent3000 said:
Is there a SMART test I can run to validate it?
A SMART test won't work once a device drops.
Yes, you should be able to reimport the pool with the remaining device.
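While the disk is still online, one commonly used hint of cabling trouble is SMART attribute 199 (`UDMA_CRC_Error_Count` on many SATA drives), which counts interface CRC errors; a raw value that keeps climbing usually points at the link rather than the media. The helper below is a hypothetical sketch that reads `smartctl -A` output on stdin and prints that raw value. Vendors may name or number the attribute differently, and device letters such as `sdd` can change between boots.

```shell
# Hypothetical helper: print the raw value of SMART attribute 199
# (UDMA_CRC_Error_Count on many SATA drives) from `smartctl -A` output.
# Prints nothing if the drive does not report that attribute.
udma_crc_errors() {
  # Attribute rows start with the ID; field 10 is RAW_VALUE.
  awk '$1 == "199" { print $10 }'
}
```

For example, `smartctl -A /dev/sdd | udma_crc_errors`, repeated after some load, shows whether the counter is still moving.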
September 19, 2025 (Author)
Thanks for the details. Give me some time to do more testing: I may take the drive out and run it on a bench system as a quick check of whether the drive is causing the issues. I have moved it around in the system and the port keeps working for other drives, so I'm not sure whether it's the port or something else. I'll report back; for now I have rebooted and the drive is back online.
September 21, 2025 (Author)
It seems like both are online. Does this look right?
glados-diagnostics-20250921-1516.zip
It's just interesting that there aren't any writes to the other drive yet.
September 21, 2025
The device is currently online, but it's not part of the pool, and there are still errors logged that look like a bad SATA cable:
Sep 21 15:15:33 GLaDOS kernel: ata2: log page 10h reported inactive tag 21
Sep 21 15:15:33 GLaDOS kernel: ata2.00: exception Emask 0x1 SAct 0x1c00000 SErr 0x400001 action 0x6
Sep 21 15:15:33 GLaDOS kernel: ata2.00: irq_stat 0x40000008
Sep 21 15:15:33 GLaDOS kernel: ata2: SError: { RecovData Handshk }
Sep 21 15:15:33 GLaDOS kernel: ata2.00: failed command: READ FPDMA QUEUED
Sep 21 15:15:33 GLaDOS kernel: ata2.00: cmd 60/30:b0:48:08:00/00:00:00:00:00/40 tag 22 ncq dma 24576 in
Sep 21 15:15:33 GLaDOS kernel: res 41/84:01:06:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
Sep 21 15:15:33 GLaDOS kernel: ata2.00: status: { DRDY ERR }
Sep 21 15:15:33 GLaDOS kernel: ata2.00: error: { ICRC ABRT }
Sep 21 15:15:33 GLaDOS kernel: ata2.00: failed command: READ FPDMA QUEUED
Sep 21 15:15:33 GLaDOS kernel: ata2.00: cmd 60/78:b8:88:08:00/00:00:00:00:00/40 tag 23 ncq dma 61440 in
Sep 21 15:15:33 GLaDOS kernel: res 41/84:01:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
Sep 21 15:15:33 GLaDOS kernel: ata2.00: status: { DRDY ERR }
Sep 21 15:15:33 GLaDOS kernel: ata2.00: error: { ICRC ABRT }
Sep 21 15:15:33 GLaDOS kernel: ata2.00: failed command: READ FPDMA QUEUED
Sep 21 15:15:33 GLaDOS kernel: ata2.00: cmd 60/f8:c0:08:09:00/00:00:00:00:00/40 tag 24 ncq dma 126976 in
Sep 21 15:15:33 GLaDOS kernel: res 41/84:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
Sep 21 15:15:33 GLaDOS kernel: ata2.00: status: { DRDY ERR }
Sep 21 15:15:33 GLaDOS kernel: ata2.00: error: { ICRC ABRT }
Replace the SATA cable for cache2 and post new diags after the array starts, but the pool will need to be fixed again.
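The `ICRC` flag in these errors is the ATA interface CRC error bit, which is why a cable is the usual first suspect. To check whether a new cable actually stopped them, a hypothetical helper (`count_icrc_errors`, not an existing tool) can count those lines per ATA port, assuming the log format quoted above.

```shell
# Hypothetical helper: count "error: { ICRC ... }" lines per ATA port.
# Reads syslog text on stdin and prints "<port> <count>", e.g. "ata2 3".
count_icrc_errors() {
  sed -n 's/.*kernel: \(ata[0-9]*\)\.[0-9]*: error: { ICRC.*/\1/p' |
    sort | uniq -c |
    # uniq -c prints "<count> <port>"; swap to "<port> <count>".
    awk '{ print $2, $1 }'
}
```

An empty result after a cable swap and some disk activity would be a good sign; a growing count on the same port suggests the link is still bad.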
November 2, 2025 (Author)
@JorgeB I finally got around to replacing the cable, as I've only just had the time. The server is showing the drive and also a missing drive in the cache. I don't see similar errors, but I could be looking at the wrong spot in the logs. Can you confirm the logs look correct, and what's next to clear this error and re-create the cache?
glados-diagnostics-20251102-1717.zip
November 2, 2025
With the array running, type:
btrfs balance start -f -dconvert=single -mconvert=dup /mnt/cache
then:
btrfs device remove missing /mnt/cache
Then post new diags.
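The two commands can be wrapped so they always run in order against the same mount point. This is a hypothetical sketch (`fix_degraded_pool`, `run_cmd` and `DRY_RUN` are made-up names), not an Unraid command. Per the btrfs-balance documentation, `-f` forces a conversion that reduces metadata redundancy. Running it with `DRY_RUN=1` first prints exactly what would be executed.

```shell
# Hypothetical runner: echo the command when DRY_RUN=1, else execute it.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: convert data to "single" and metadata to "dup"
# on the given mount point (default /mnt/cache), then drop the
# missing member, mirroring the two commands in the post above.
fix_degraded_pool() {
  mnt=${1:-/mnt/cache}
  # In this thread the balance once ended with "No space left on device"
  # and the removal still succeeded afterwards, so this only warns.
  run_cmd btrfs balance start -f -dconvert=single -mconvert=dup "$mnt" ||
    echo "warning: balance reported an error; see dmesg | tail" >&2
  run_cmd btrfs device remove missing "$mnt"
}
```

For example, `DRY_RUN=1 fix_degraded_pool /mnt/cache` prints the two btrfs command lines without touching the pool.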
November 2, 2025 (Author)
@JorgeB I ran both, but the first one gave an error. I continued anyway, as there is plenty of space on the drive currently:
btrfs balance start -f -dconvert=single -mconvert=dup /mnt/cache
ERROR: error during balancing '/mnt/cache': No space left on device
There may be more info in syslog - try dmesg | tail
btrfs device remove missing /mnt/cache
See the attached updated diags as well:
glados-diagnostics-20251103-0959.zip
November 3, 2025 (marked as Solution)
Now reimport the pool as a single device:
- On main, click on the first device for that pool and then "remove pool".
- Back on main, create a new pool with the same name and 1 slot.
- Assign the pool device and leave the filesystem set to auto.
- Start the array to import the pool.
You can then add a second drive to create a mirror if that is the intention.
November 3, 2025 (Author)
Looks like we are finally back in business! It seems to be running a balance, but most importantly it's writing to the drive again. Goes to show the cables I had in there weren't as long-lasting as I had hoped; I can't say I've had a dead SATA cable on me before :/ Thanks again @JorgeB. I'll let it finish, and then I can finally get back to updating from 7 to the latest version. I wanted this fixed before heading down the update route.
(Edited November 3, 2025 by brent3000)