Griminal (Posted February 1, edited):
I posted a few days ago (link below) and then ran in emulated mode for a few days. When the new drive arrived, I shut down the system, inserted the new drives, and selected one of them as DISK1. I started the array and the rebuild began as I expected. I even started copying some of my backed-up files back onto the array while the parity check was running, then went to bed.
When I checked this morning, all my shares were missing. When I browse DISK1 and DISK2 now, they look like the root of a *nix system! (see screenshot)
I'm beside myself. I have replaced a dozen drives the same way and I just don't understand. I don't know what to do anymore. I've stopped the array rebuild and won't touch the system until someone gives me more guidance.
I have 3 non-array members. I have two internal drives in the system; one is a brand new drive, ZR5F463E, that I haven't formatted yet. ZL2LCCS2 is the original drive with CRC errors, which I don't have mounted but left in the slot. Z84109XN is one of the backup drives I had mounted to copy some data back to the array.
I don't know what's happening. I'm quickly losing faith in my build.
Here's my previous post:
hyde-diagnostics-20240201-0813.zip
Edited February 1 by Griminal
JorgeB (Posted February 1): Check filesystem on disk1, run it without -n.
Griminal (Posted February 1):
9 minutes ago, JorgeB said: Check filesystem on disk1, run it without -n.
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
Griminal (Posted February 1):
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_fdblocks 1771110769, counted 1775009412
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 4
        - agno = 7
        - agno = 2
        - agno = 6
        - agno = 8
        - agno = 9
        - agno = 5
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 1
        - agno = 15
        - agno = 14
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (694513559:307199) is ahead of log (1:2).
Format log to cycle 694513562.
done
trurl (Posted February 1): Start the array in normal mode (not maintenance mode) and post new diagnostics.
Griminal (Posted February 1): Done. I paused the rebuild. Shares, docker, and VMs are back. I'm scared to touch anything. Diagnostics posted.
hyde-diagnostics-20240201-1104.zip
trurl (Posted February 1): Looks like you were having connection problems with disk2, and that has probably caused problems with emulating and rebuilding disk1.
trurl (Posted February 1): You have filled up log space, so we can't tell anything about what is happening now or what happens from here on. You should reboot to clear that out.
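Editor's sketch: one way to notice that log space is filling up before it hides later errors is to check how full the filesystem holding the logs is. The snippet below is a minimal Python sketch, not something posted in the thread. The path `/var/log` and the helper name `usage_percent` are assumptions for illustration; point it at wherever the syslog lives on your system.

```python
import os
import shutil


def usage_percent(path: str) -> float:
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    log_dir = "/var/log"  # assumption: location of the syslog on this system
    if os.path.isdir(log_dir):
        print(f"{log_dir} is {usage_percent(log_dir):.1f}% full")
```

A value close to 100 suggests new messages are no longer being recorded, which matches the situation described above.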
Griminal (Posted February 1, edited): What would have filled up the logs in a 12-hour period? Maybe my LSI card is having issues? Maybe the breakout cable is going bad? What do you recommend I do after I reboot?
Edited February 1 by Griminal
trurl (Posted February 1):
8 minutes ago, Griminal said: What would have filled up the logs in a 12 hour period?
15 minutes ago, trurl said: connection problems with disk2
Griminal (Posted February 2):
So I moved Disk2 to a mobo SATA port, away from the LSI controller. Parity is at ~20% at this time. I'm keeping an active browser window up to capture the logs. This is what I'm seeing so far. It's been rebuilding for 5 hours.
Feb 1 18:26:50 hyde kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Feb 1 18:26:50 hyde kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Feb 1 18:26:50 hyde kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Feb 1 18:26:55 hyde kernel: sd 9:0:4:0: Power-on or device reset occurred
Feb 1 18:26:56 hyde kernel: sd 9:0:4:0: Power-on or device reset occurred
JorgeB (Posted February 2):
9 hours ago, Griminal said:
Feb 1 18:26:55 hyde kernel: sd 9:0:4:0: Power-on or device reset occurred
Feb 1 18:26:56 hyde kernel: sd 9:0:4:0: Power-on or device reset occurred
These usually mean a power/connection problem with that device.
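Editor's sketch: when the syslog is flooded with repeats like the ones quoted above, it can help to tally which device is resetting and which `log_info` values recur. The Python below is a minimal sketch, not a tool from the thread. The function names are hypothetical. The bit layout used in `decode_log_info` is inferred from the quoted lines themselves (for example, `0x31110d01` is printed with `code(0x11)` and `sub_code(0x0d01)`), and the originator table is an assumption about how the driver names those values.

```python
import re
from collections import Counter

# Patterns for the two kinds of kernel messages quoted in the thread.
LOG_INFO_RE = re.compile(r"log_info\((0x[0-9a-fA-F]{8})\)")
RESET_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred")

# Assumption: originator names as printed by the driver (PL appears in the thread).
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_log_info(value: int) -> dict:
    """Split a 32-bit log_info value into the fields the driver prints."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


def summarize(lines):
    """Count log_info values and per-device reset messages in syslog lines."""
    log_infos = Counter()
    resets = Counter()
    for line in lines:
        match = LOG_INFO_RE.search(line)
        if match:
            log_infos[int(match.group(1), 16)] += 1
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
    return log_infos, resets
```

Fed the lines from the post above, `summarize` would report every reset against the single SCSI address `9:0:4:0`, which is consistent with the advice to look at power and cabling for that one device.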
Griminal (Posted February 16): I ended up downgrading to version 6.12.4 after the filesystem check. No more problems so far.
trurl (Posted February 16): Do you have a lost+found share from the earlier repair?