Unable to write to disk10

September 5, 2024

I keep having shares disappear within about an hour of starting the array, and the same thing happens after a reboot. After a period of time, disk10 shows the following instead of its shares. It's the only disk that does this. It was replaced about a month ago, but the problem only started in the last few weeks:

[Screenshot: image.png]

My troubleshooting:

I've disabled all of my docker containers except 2 that I've literally been using for years.

I've removed privileged access from the Docker containers where it was enabled.

I've run a check disk via Maintenance Mode.

I've run multiple parity checks. The first fixed ~3300 errors, and the subsequent ones have not found any.

A reboot solves it temporarily.

Stopping and starting the array solves it temporarily.

I'm getting a replacement drive just in case, but I thought it would be silly to not explore all problem points while I wait for it to arrive.

kong-diagnostics-20240905-1215.zip

Solved by JorgeB

September 5, 2024

Community Expert
Solution

Check the filesystem on disk10, and run the check without the -n option so that it can actually make repairs.
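For anyone doing this from a terminal rather than the GUI, the advice comes down to two xfs_repair invocations: a read-only check with -n, then the same command without it. Below is a minimal Python sketch of how the two command lines differ. The device path `/dev/md10p1` and the helper name `xfs_repair_command` are assumptions for illustration only; the actual device name for disk10 depends on the Unraid release, and the array should be started in Maintenance Mode first.

```python
def xfs_repair_command(device, dry_run=True):
    """Build the argument list for an xfs_repair run (hypothetical helper).

    With dry_run=True the -n flag is added, so xfs_repair only reports
    problems. With dry_run=False it is allowed to modify the filesystem.
    """
    command = ["xfs_repair"]
    if dry_run:
        command.append("-n")  # no-modify mode: check only, change nothing
    command.append(device)
    return command


# Assumed device path for disk10; confirm the real one before running anything.
DEVICE = "/dev/md10p1"

check_only = xfs_repair_command(DEVICE, dry_run=True)
repair = xfs_repair_command(DEVICE, dry_run=False)

print(" ".join(check_only))
print(" ".join(repair))
```

The sketch only prints the commands; it does not run them, so it is safe to try anywhere.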

September 5, 2024

Author

Running now.

September 5, 2024

Author

This is the result, almost immediately:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
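The error message itself spells out a decision sequence: try a mount to replay the log, then re-run the repair, and only fall back to -L if the mount fails. A rough Python sketch of that logic follows, assuming the message text matches the output above. The names `needs_log_replay` and `next_step` are hypothetical and not part of any xfs tool.

```python
# Text taken from the xfs_repair error shown above.
LOG_REPLAY_MARKER = "valuable metadata changes in a log which needs to"


def needs_log_replay(output):
    """Return True if xfs_repair refused to run because of a dirty log."""
    return LOG_REPLAY_MARKER in output


def next_step(output, mount_succeeded=None):
    """Suggest the next action, following the wording of the error message.

    mount_succeeded is None when no mount has been attempted yet.
    """
    if not needs_log_replay(output):
        return "no dirty-log error reported"
    if mount_succeeded is None:
        return "mount the filesystem to replay the log"
    if mount_succeeded:
        return "unmount and re-run xfs_repair"
    return "run xfs_repair -L as a last resort (recent metadata changes may be lost)"


sample = (
    "ERROR: The filesystem has valuable metadata changes in a log which needs to\n"
    "be replayed.  Mount the filesystem to replay the log, and unmount it before\n"
    "re-running xfs_repair."
)

print(next_step(sample))
print(next_step(sample, mount_succeeded=False))
```

In this thread the disk was already failing to stay mounted, which is why -L ended up being the practical option.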

SMART error log shows:

No Errors Logged

SMART short test returned:

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       213         -
# 2  Short offline       Completed without error       00%       158         -

Edited September 5, 2024 by Nomar1245

September 5, 2024

Author

Running now with -L instead:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
agi_freecount 285, counted 286 in ag 3
agi_freecount 285, counted 287 in ag 3 finobt
sb_ifree 2988, counted 2989
sb_fdblocks 2269434669, counted 2296675626
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 5
        - agno = 2
        - agno = 11
        - agno = 4
        - agno = 6
        - agno = 8
        - agno = 7
        - agno = 10
        - agno = 9
        - agno = 3
        - agno = 1
        - agno = 12
        - agno = 13
        - agno = 14
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:2155061) is ahead of log (1:2).
Format log to cycle 4.
done
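The lines of the form "name N, counted M" in the output above are where xfs_repair found that a stored counter disagreed with what it actually counted. The largest gap is in sb_fdblocks, the superblock's free data block count. Below is a small Python sketch that pulls those mismatches out of the log and sizes the free-space discrepancy. The 4096-byte block size is an assumption (the XFS default) and is not shown anywhere in the output; the function name is hypothetical.

```python
import re

# Matches lines such as "sb_fdblocks 2269434669, counted 2296675626".
MISMATCH = re.compile(r"^(\w+) (\d+), counted (\d+)")

ASSUMED_BLOCK_SIZE = 4096  # bytes; assumption, not taken from the log


def counter_mismatches(output):
    """Return (name, recorded, counted) tuples for each mismatch line."""
    results = []
    for line in output.splitlines():
        match = MISMATCH.match(line.strip())
        if match:
            name, recorded, counted = match.groups()
            results.append((name, int(recorded), int(counted)))
    return results


log = """\
agi_freecount 285, counted 286 in ag 3
agi_freecount 285, counted 287 in ag 3 finobt
sb_ifree 2988, counted 2989
sb_fdblocks 2269434669, counted 2296675626
        - found root inode chunk
"""

mismatches = counter_mismatches(log)
fdblocks = next(m for m in mismatches if m[0] == "sb_fdblocks")
block_delta = fdblocks[2] - fdblocks[1]
gib_delta = block_delta * ASSUMED_BLOCK_SIZE / 2**30

for name, recorded, counted in mismatches:
    print(f"{name}: recorded {recorded}, counted {counted}")
print(f"free-block difference: {block_delta} blocks, about {gib_delta:.1f} GiB")
```

Under that block-size assumption, the stored free-space figure was off by roughly a hundred GiB before the repair corrected it.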

It's been ~30 minutes and everything looks to be good. I'll follow up later this evening to confirm all is well. Thanks.

Edited September 5, 2024 by Nomar1245

September 5, 2024

Community Expert

The disk should be mounting now?

BTW: The short SMART test is not a good indicator of a disk's health (although if it fails, the disk definitely needs replacing).

September 6, 2024

Author

Everything has been fine for about 3 hours now, which is the longest it has been working in about 2 weeks. I think this has been solved. Thank you.
