Help! I don't know what to do. Never had this problem before. Shares gone!

February 1, 2024

I posted a few days ago (link below). I ran in emulated mode for a few days. The new drive arrived, so I shut down the system, inserted the new drives, and selected one of them as DISK1. I started the array and the rebuild began as I expected. I even started copying some of my backed-up files back onto the array while the parity check was running. I went to bed. When I checked this morning, all my shares were missing. When I browse DISK1 and DISK2 now, each looks like the root of a *nix system! (see screenshot)

I'm beside myself. I've replaced a dozen drives the same way and I just don't understand what went wrong. I don't know what to do anymore. I've stopped the array rebuild and won't touch the system until someone gives me more guidance.

I have 3 non-array devices. Two are internal drives in the system: a brand-new one, ZR5F463E, that I haven't formatted yet, and ZL2LCCS2, the original drive with CRC errors, which I haven't mounted but left in the slot. Z84109XN is one of the backup drives I had mounted to copy some data back to the array.

I don't know what's happening. I'm quickly losing faith in my build.

Here's my previous post:

image.png.0df7981ea96c45fa40716f8593dae6be.png

hyde-diagnostics-20240201-0813.zip

Edited February 1, 2024 by Griminal

February 1, 2024

Community Expert

Check filesystem on disk1, run it without -n.

February 1, 2024

Author

9 minutes ago, JorgeB said:

Check filesystem on disk1, run it without -n.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

February 1, 2024

Community Expert
Solution

Use -L

February 1, 2024

Author

Phase 1 - find and verify superblock...

Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_fdblocks 1771110769, counted 1775009412
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 4
        - agno = 7
        - agno = 2
        - agno = 6
        - agno = 8
        - agno = 9
        - agno = 5
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 1
        - agno = 15
        - agno = 14
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (694513559:307199) is ahead of log (1:2).
Format log to cycle 694513562.
done
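
The `sb_fdblocks` line in the output above shows that the free-block counter stored in the superblock (1771110769) disagreed with what xfs_repair counted (1775009412). To give a rough sense of how large that discrepancy is, here is a small Python sketch. The 4096-byte block size is an assumption (a common XFS default); the actual block size of this filesystem is not shown in the thread, and the function name is only illustrative.

```python
# Values copied from the xfs_repair output above.
SB_FDBLOCKS = 1771110769   # free-block count stored in the superblock
COUNTED = 1775009412       # free-block count xfs_repair computed

# ASSUMPTION: 4096-byte filesystem blocks; the thread does not state the real value.
ASSUMED_BLOCK_SIZE = 4096


def free_block_discrepancy(stored, counted, block_size):
    """Return the counter difference in blocks and in bytes."""
    blocks = counted - stored
    return blocks, blocks * block_size


blocks, size_bytes = free_block_discrepancy(SB_FDBLOCKS, COUNTED, ASSUMED_BLOCK_SIZE)
print(f"{blocks} blocks, about {size_bytes / 2**30:.1f} GiB under the assumed block size")
```

The figure is only meant for scale; it says nothing about which files, if any, were affected.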

February 1, 2024

Community Expert

Start the array in normal mode (not maintenance mode) and post new diagnostics.

February 1, 2024

Author

Done. I paused the re-build. Shares, docker, and VMs are back. I'm scared to touch anything.... Diagnostics posted.

hyde-diagnostics-20240201-1104.zip

February 1, 2024

Community Expert

Looks like you were having connection problems with disk2, so that has probably caused problems emulating and trying to rebuild disk1.

February 1, 2024

Community Expert

You have filled up log space, so we can't tell anything about what is happening now or in the future. You should reboot to clear that out.

February 1, 2024

Author

What would have filled up the logs in a 12-hour period? Maybe my LSI card is having issues? Maybe the breakout cable is going bad? What do you recommend I do going forward after I reboot?

Edited February 1, 2024 by Griminal

February 1, 2024

Community Expert

8 minutes ago, Griminal said:

What would have filled up the logs in a 12-hour period?

15 minutes ago, trurl said:

connection problems with disk2

February 2, 2024

Author

So I moved Disk2 to a motherboard SATA port, away from the LSI controller. Parity is at ~20% at this time. I'm keeping a browser window open to capture the logs. This is what I'm seeing so far. It's been rebuilding for 5 hours.

Feb 1 18:26:50 hyde kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Feb 1 18:26:50 hyde kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Feb 1 18:26:50 hyde kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: sd 9:0:4:0: Power-on or device reset occurred
Feb 1 18:26:56 hyde kernel: sd 9:0:4:0: Power-on or device reset occurred
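
When the syslog fills with repeated controller messages like those above, a quick tally can show which messages dominate and which device addresses are resetting. The following is a minimal Python sketch, not an Unraid tool: the function name `summarize_kernel_log` and both regular expressions are assumptions based only on the line format shown in this post, and other kernel versions may format these lines differently.

```python
import re
from collections import Counter

# Patterns assumed from the excerpt above (hypothetical, not taken from any driver documentation).
LOG_INFO_RE = re.compile(r"kernel: (\S+): log_info\((0x[0-9a-fA-F]+)\)")
RESET_RE = re.compile(r"kernel: sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred")


def summarize_kernel_log(lines):
    """Count log_info codes per controller and reset events per SCSI address."""
    log_info = Counter()
    resets = Counter()
    for line in lines:
        m = LOG_INFO_RE.search(line)
        if m:
            log_info[(m.group(1), m.group(2))] += 1
            continue
        m = RESET_RE.search(line)
        if m:
            resets[m.group(1)] += 1
    return log_info, resets


# A few lines copied from the excerpt above.
sample = [
    "Feb 1 18:26:50 hyde kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)",
    "Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)",
    "Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)",
    "Feb 1 18:26:55 hyde kernel: sd 9:0:4:0: Power-on or device reset occurred",
    "Feb 1 18:26:56 hyde kernel: sd 9:0:4:0: Power-on or device reset occurred",
]

log_info, resets = summarize_kernel_log(sample)
for (controller, code), count in log_info.most_common():
    print(f"{controller} log_info {code}: {count}")
for address, count in resets.items():
    print(f"sd {address}: {count} reset(s)")
```

To run it against a saved log instead of the sample, pass the open file object as `lines`; the SCSI address with the most resets is the one worth checking for cabling or power problems first.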

February 2, 2024

Community Expert

9 hours ago, Griminal said:

Feb 1 18:26:55 hyde kernel: sd 9:0:4:0: Power-on or device reset occurred
Feb 1 18:26:56 hyde kernel: sd 9:0:4:0: Power-on or device reset occurred

These usually mean a power/connection problem with that device.

February 16, 2024

Author

I ended up downgrading to version 6.12.4 after the filesystem check. No more problems so far.

February 16, 2024

Community Expert

Do you have a lost+found share from the earlier repair?

Solved by JorgeB