pool scrub instantly aborts for 1 pool but works for other 3



The cache pool finished its scrub, correcting 2 errors; 2 other pools are still running with ~60 hours left and have found no errors yet. But 1 pool instantly aborts its scrub. I tried it from the web interface with repair checked: nothing appears in the log apart from the command being run, and I get the same result when btrfs scrub status /mnt/archive_one is run from the command line. But if I run it from the web interface with repair unchecked, I get the following in the log, and it still goes straight to aborted.

Quote

Aug  3 23:44:25 Raza ool www[1901]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/archive_one' '-r'
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 1
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 2
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 1 with status: -30
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 2 with status: -30
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 4
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 5
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 4 with status: -30
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 5 with status: -30
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 8
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 8 with status: -30
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 7
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 7 with status: -30
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 3
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 3 with status: -30
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 6
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 6 with status: -30
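For readers decoding the log above: the kernel reports failures as negative errno values, so the status can be looked up by number. The following is a minimal sketch, not anything posted in the thread; the function name `decode_scrub_failures` and the regular expression are illustrative assumptions, and the sample text is two lines copied from the quoted log.

```python
import errno
import os
import re

# Matches kernel lines such as:
#   "... scrub: not finished on devid 1 with status: -30"
SCRUB_FAIL = re.compile(r"scrub: not finished on devid (\d+) with status: (-?\d+)")


def decode_scrub_failures(log_text):
    """Return (devid, status, errno_name, message) for each failed-scrub line."""
    results = []
    for match in SCRUB_FAIL.finditer(log_text):
        devid = int(match.group(1))
        status = int(match.group(2))
        code = abs(status)  # the kernel logs errors as negative errno values
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((devid, status, name, os.strerror(code)))
    return results


# Two lines taken from the log quoted above.
sample = """\
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: started on devid 1
Aug  3 23:44:25 Raza kernel: BTRFS info (device sdq1): scrub: not finished on devid 1 with status: -30
"""

for devid, status, name, message in decode_scrub_failures(sample):
    print(f"devid {devid}: status {status} = {name} ({message})")
```

On Linux, errno 30 is EROFS ("Read-only file system"). That would be consistent with the pool having gone read-only, but the thread does not confirm this, so treat it as a hint for what to check next rather than a diagnosis.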


 


Rather strange: the pool was working correctly without any errors logged before this boot, then out of the blue it started detecting checksum and other errors on two different devices. I suspect it might be related to some of the known issues that still exist with raid5/6; the most serious of these are fixed starting with kernel 5.20.

 

It could be a pain, but I would suggest copying all the data to a different place and then re-formatting.

7 minutes ago, JorgeB said:

without any errors logged before this boot...

With this last boot (well, multiple boots while troubleshooting), Pulseway stopped working, so there were multiple reboots trying to fix/update it. Currently I'm just saying screw it, I will fix Pulseway later. I would assume there is no relation, just a coincidence in timing. I assume these errors would not be a "drive tray needs to be reseated into DAS" type of issue? How can i move the data off without repairing it first, or is it like it just reads from the "good" copy and ignores the errors if any (then deletes file from move) and so no prob to move? I see Unraid 6.10 brings the kernel up to 5.15, so I'm guessing 5.20 is a ways off.

 

I know the server is a mess; the 3 non-cache pools are just 24 TB overflows (for specific uses) to keep the main array cleaner (and over the drive limit). I know I'm at the size where I need a second Unraid or other server to handle all the categorical archive stuff, but that is not feasible right now, so it's a down-the-line thing (still looking at you, 45 Drives).

1 hour ago, Cull2ArcaHeresy said:

I assume these errors would not be a "drive tray needs to be reseated into DAS" type of issue?

Doesn't look like it.

 

1 hour ago, Cull2ArcaHeresy said:

How can i move the data off without repairing it first, or is it like it just reads from the "good" copy and ignores the errors if any (then deletes file from move) and so no prob to move?

Use the disk paths and it will give you an I/O error for any corrupt file; then restore those from backups if available.
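One way to act on this advice is to copy file by file and record every file that fails to read, so the list of files to restore from backup falls out of the copy itself. This is only a rough sketch under stated assumptions: the helper name `copy_tree_logging_errors` and the destination path in the usage comment are hypothetical, and it copies rather than moves, so nothing is deleted from the source pool.

```python
import errno
import os
import shutil


def copy_tree_logging_errors(src_root, dst_root):
    """Copy every file under src_root into dst_root.

    Returns a list of (source_path, errno_name) for files that could not be
    copied, instead of stopping at the first failure.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)
            except OSError as exc:
                # A file btrfs cannot read back intact is expected to show up
                # here as an I/O error (EIO) during the read.
                failed.append((src, errno.errorcode.get(exc.errno, str(exc.errno))))
                # Drop any partial copy so it is not mistaken for good data.
                if os.path.exists(dst):
                    os.remove(dst)
    return failed


# Example usage (destination path is hypothetical):
# for path, err in copy_tree_logging_errors("/mnt/archive_one", "/mnt/disk1/rescue"):
#     print(f"{err}: {path}")
```

Files that end up in the returned list are the ones to restore from backup; everything else was read without the filesystem reporting an error.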

 
