pool scrub instantly aborts for 1 pool but works for other 3

Cull2ArcaHeresy · August 4, 2022

The cache pool finished its scrub, correcting 2 errors. Two other pools are still running with ~60 hours left and have found no errors yet, but one pool instantly aborts its scrub. I tried it from the web interface with repair checked: nothing shows in the log except the command being run. The same thing happens when btrfs scrub status /mnt/archive_one is run from the command line. But if I run it from the web interface with repair unchecked, I get the following in the log, and it still goes straight to aborted.

Quote

Aug 3 23:44:25 Raza ool www[1901]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/archive_one' '-r'
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 1
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 2
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 1 with status: -30
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 2 with status: -30
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 4
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 5
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 4 with status: -30
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 5 with status: -30
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 8
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 8 with status: -30
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 7
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 7 with status: -30
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 3
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 3 with status: -30
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 6
Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 6 with status: -30
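Editor's note: the `status: -30` in the lines above looks like a negated Linux errno value, and on Linux errno 30 is EROFS ("Read-only file system"). That would be consistent with the pool having gone read-only, although the excerpt alone does not confirm it. A minimal sketch for decoding such lines follows; the function name `decode_scrub_status` and the regular expression are illustrative assumptions, not part of any btrfs tooling.

```python
import errno
import os
import re

# Matches kernel lines such as:
#   "... scrub: not finished on devid 1 with status: -30"
SCRUB_STATUS = re.compile(r"scrub: not finished on devid (\d+) with status: (-?\d+)")


def decode_scrub_status(line):
    """Return (devid, errno_name, message) for a 'not finished' line, else None."""
    match = SCRUB_STATUS.search(line)
    if match is None:
        return None
    devid = int(match.group(1))
    # Assumption: the kernel reports the status as a negated errno value.
    code = abs(int(match.group(2)))
    name = errno.errorcode.get(code, "UNKNOWN")
    return devid, name, os.strerror(code)


sample = ("Aug 3 23:44:25 Raza kernel: BTRFS info (device sdq1): "
          "scrub: not finished on devid 1 with status: -30")
print(decode_scrub_status(sample))
```

The errno names come from the Python interpreter's platform tables, so the sketch should be run on a Linux host to match the kernel's numbering.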

JorgeB · August 4, 2022

Please post the diagnostics.

Cull2ArcaHeresy · August 4, 2022

knew something was missing

raza-diagnostics-20220804-0525.zip

JorgeB · August 4, 2022

Rather strange: the pool was working correctly without any errors logged before this boot, then out of the blue it started detecting checksum and other errors on two different devices. I suspect it might be related to some of the known issues that still exist with raid5/6; the most serious of these are fixed starting with kernel 5.20.

It could be a pain, but I would suggest copying all the data to a different place and then re-formatting.

Cull2ArcaHeresy · August 4, 2022

7 minutes ago, JorgeB said:

without any errors logged before this boot...

With this last boot (well, multiple boots while troubleshooting) Pulseway stopped working, which led to multiple reboots trying to fix/update it. Currently I'm just saying screw it, I will fix Pulseway later. I would assume there is no relation, just a coincidence in timing. I assume these errors would not be a "drive tray needs to be reseated into DAS" type of issue? How can i move the data off without repairing it first, or is it like it just reads from the "good" copy and ignores the errors if any (then deletes file from move) and so no prob to move? I see Unraid 6.10 brings the kernel up to 5.15, so I'm guessing 5.20 is a ways off.

I know the server is a mess; the 3 non-cache pools are just 24 TB overflows (for specific uses) to keep the main array cleaner (and over drive limit). I know I'm at the size where I need a second Unraid or other server to handle all the categorical archive stuff, but that is not feasible right now, so it's a down-the-line thing (still looking at you, 45 Drives).

JorgeB · August 4, 2022

1 hour ago, Cull2ArcaHeresy said:

I assume these errors would not be a "drive tray needs to be reseated into DAS" type of issue?

Doesn't look like it.

1 hour ago, Cull2ArcaHeresy said:

How can i move the data off without repairing it first, or is it like it just reads from the "good" copy and ignores the errors if any (then deletes file from move) and so no prob to move?

Use the disk paths and it will give you an i/o error for any corrupt file, then restore those from backups if available.
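Editor's note: the advice above can be sketched as a copy loop that records every file whose read fails with an I/O error, so those files can be restored from backup afterwards. This is only a rough illustration under stated assumptions: the function name `copy_collect_io_errors` and the destination path in the usage comment are hypothetical, and the source is assumed to be the pool mount point from the first post.

```python
import errno
import os
import shutil


def copy_collect_io_errors(src_root, dst_root, copy=shutil.copy2):
    """Copy a tree file by file; return source paths that failed with EIO."""
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                copy(src, os.path.join(target_dir, name))
            except OSError as exc:
                # Only I/O errors are treated as "corrupt file"; anything
                # else (permissions, full disk) is re-raised.
                if exc.errno != errno.EIO:
                    raise
                # A partially written destination file may be left behind.
                failed.append(src)
    return failed


# Hypothetical usage (destination path is an assumption):
# bad_files = copy_collect_io_errors("/mnt/archive_one", "/mnt/disk1/rescue")
# print("\n".join(bad_files))
```

Copying rather than moving keeps the originals in place until the list of failed files has been reviewed.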
