Nomar1245 Posted September 5

I keep having shares disappear within about an hour of starting the array. The same thing happens after a reboot. After a period of time my disk10 shows the following instead of shares. It's the only disk that does this; it was replaced about a month ago, but this has only started happening in the last few weeks.

My troubleshooting so far:
- I've disabled all of my Docker containers except 2 that I've literally been using for years.
- I've removed privileged access from the containers where it was enabled.
- I've run a filesystem check via Maintenance Mode.
- I've run multiple parity checks. The first fixed ~3300 errors, and the subsequent ones have not found any.
- A reboot solves it temporarily.
- Stopping and starting the array solves it temporarily.

I'm getting a replacement drive just in case, but I thought it would be silly not to explore all problem points while I wait for it to arrive.

kong-diagnostics-20240905-1215.zip
JorgeB Posted September 5 (Solution)

Check filesystem on disk10, then run it without -n.
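In Unraid this check is normally run from the disk's page in the GUI with the array in maintenance mode. A rough command-line sketch of the same sequence is below; the /dev/md10 device path and the DRY_RUN wrapper are assumptions for illustration, not details from the thread (on newer Unraid releases the partition device may be /dev/md10p1):

```shell
# Sketch of the suggested check, assuming disk10 maps to /dev/md10.
# DRY_RUN=1 (the default here) only prints each command instead of running it,
# since the real commands need root and the array in maintenance mode.
DRY_RUN=${DRY_RUN:-1}
DEV=/dev/md10

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"          # show what would be executed
  else
    "$@"                 # really execute
  fi
}

run xfs_repair -n "$DEV"   # read-only check: report problems, change nothing
run xfs_repair "$DEV"      # actual repair, i.e. "run it without -n"
```

The -n (no-modify) pass is safe to run first because it only reports what it would fix.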
Nomar1245 Posted September 5 (edited)

This is the result, almost immediately:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

The SMART error log shows: No Errors Logged

The SMART short test returned:
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        213              -
# 2  Short offline     Completed without error  00%        158              -

Edited September 5 by Nomar1245
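The error above is xfs_repair asking for the XFS metadata log to be replayed first; the usual sequence is to try a mount/unmount before reaching for -L. A sketch of that order of operations, again assuming /dev/md10 and a hypothetical temporary mount point (neither is stated in the thread):

```shell
# Sketch only: the commands are collected in a variable and printed rather
# than executed, because they need root and the real array device.
DEV=/dev/md10
MNT=/mnt/xfs-check   # hypothetical temporary mount point

CMDS="mkdir -p $MNT
mount $DEV $MNT      # mounting replays the metadata log
umount $MNT
xfs_repair -n $DEV   # re-check; reach for -L only if the mount fails"
echo "$CMDS"
```

If the mount itself fails, `xfs_repair -L` is the last resort, as the error text warns: destroying the log can lose the most recent metadata changes.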
Nomar1245 Posted September 5 (edited)

Running now with -L instead:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
agi_freecount 285, counted 286 in ag 3
agi_freecount 285, counted 287 in ag 3 finobt
sb_ifree 2988, counted 2989
sb_fdblocks 2269434669, counted 2296675626
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 5
        - agno = 2
        - agno = 11
        - agno = 4
        - agno = 6
        - agno = 8
        - agno = 7
        - agno = 10
        - agno = 9
        - agno = 3
        - agno = 1
        - agno = 12
        - agno = 13
        - agno = 14
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:2155061) is ahead of log (1:2).
Format log to cycle 4.
done

It's been ~30 minutes and everything looks to be good. I'll follow up later this evening to confirm all is well. Thanks.

Edited September 5 by Nomar1245
itimpi Posted September 5

Is the disk mounting now?

BTW: the short SMART test is not a good indicator of a disk's health (although if it fails, the disk definitely needs replacing).
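For a more meaningful health check than the short self-test, smartctl can queue an extended (long) self-test. A sketch, where /dev/sdX is a placeholder for disk10's real device:

```shell
# Sketch only: commands are printed, not executed, since /dev/sdX is a
# placeholder and smartctl needs root access to the real drive.
DEV=/dev/sdX

CMDS="smartctl -t long $DEV   # queue the extended self-test (runs in the background)
smartctl -a $DEV              # later: view attributes and the self-test log"
echo "$CMDS"
```

The extended test reads the whole surface, so it can take several hours on a large drive; the result shows up in the same self-test log quoted earlier in the thread.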
Nomar1245 Posted September 6

Everything has been fine for about 3 hours now, which is the longest it has worked in about 2 weeks. I think this has been solved. Thank you.