Evedoescomputerstuff · Posted October 7, 2020

Hey, I had an unclean shutdown due to a power outage, and an 8TB XFS disk I use with Unassigned Devices and mergerfs now refuses to mount. Yesterday I ran xfs_repair on it, and after scanning it eventually gave me:

```
Sorry, could not find valid secondary superblock
Exiting now.
```

Mounting throws these errors into the log:

```
Oct 7 08:58:42 Tower kernel: XFS (sdp1): log mount/recovery failed: error -117
Oct 7 08:58:42 Tower kernel: XFS (sdp1): log mount failed
Oct 7 08:58:42 Tower unassigned.devices: Mount of '/dev/sdp1' failed. Error message: mount: /mnt/disks/media36: mount(2) system call failed: Structure needs cleaning.
Oct 7 08:58:42 Tower unassigned.devices: Partition 'ST8000DM004-2CX188_ZCT1C93J' could not be mounted.
```

Running the file system check from the webGUI returns something like this:

```
FS: xfs

/sbin/xfs_repair -n /dev/sdp1 2>&1

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_ifree 3094, counted 3087
sb_fdblocks 266597848, counted 271434282
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
imap claims a free inode 119939202 is in use, would correct imap and clear inode
        - agno = 1
        - agno = 2
Metadata CRC error detected at 0x45c2d9, xfs_dir3_data block 0x1802bbfe0/0x1000
bad directory block magic # 0x241a9c92 in block 10 for directory inode 4294967424
corrupt block 10 in directory inode 4294967424
        would junk block
Metadata corruption detected at 0x435c33, xfs_inode block 0x1802bbfe8/0x4000
Metadata corruption detected at 0x435c33, xfs_inode block 0x1802bc008/0x4000
Metadata corruption detected at 0x435c33, xfs_inode block 0x1802bc028/0x4000
Metadata corruption detected at 0x435c33, xfs_inode block 0x1802bc048/0x4000
        - agno = 3
bad CRC for inode 6445318144
bad magic number 0x241a on inode 6445318144
bad version number 0x6d on inode 6445318144
inode identifier 1575615604729810958 mismatch on inode 6445318144
bad CRC for inode 6445318145
bad magic number 0x241a on inode 6445318145
bad version number 0x6d on inode 6445318145
inode identifier 1575615604729810958 mismatch on inode 6445318145
bad CRC for inode 6445318146
bad magic number 0x241a on inode 6445318146
bad version number 0x6d on inode 6445318146
inode identifier 1575615604729810958 mismatch on inode 6445318146
bad CRC for inode 6445318147
bad magic number 0x241a on inode 6445318147
bad version number 0x6d on inode 6445318147
inode identifier 1575615604729810958 mismatch on inode 6445318147
bad CRC for inode 6445318148
bad magic number 0x241a on inode 6445318148
bad version number 0x6d on inode 6445318148
inode identifier 1575615604729810958 mismatch on inode 6445318148
bad CRC for inode 6445318149
bad magic number 0x241a on inode 6445318149
bad version number 0x6d on inode 6445318149
inode identifier 1575615604729810958 mismatch on inode 6445318149
bad CRC for inode 6445318150
bad magic number 0x241a on inode 6445318150
bad version number 0x6d on inode 6445318150
```

Eventually this leads to:

```
xfs_repair: read failed: Input/output error
bad magic # 0 in inode 8594108579 (data fork) bmbt block 887796852
bad data fork in inode 8594108579
would have cleared inode 8594108579
bad nblocks 4162550 for inode 8595177059, would reset to 4162374
bad nblocks 4022773 for inode 8595177060, would reset to 4022594
        - agno = 5
        - agno = 6
```

What are my options? I'd love to figure out a way to repair it without losing the data on it and without needing to pull it. Barring that, I guess it's either pull the disk, buy another 8TB and try to copy the data, or just wipe it and start fresh? I'd lose a bunch of torrents, but I'm pretty sure it's all long-term seeding stuff I have cloud backups of, obviously.
JorgeB · Posted October 7, 2020

4 minutes ago, Evedoescomputerstuff said:
> xfs_repair: read failed: Input/output error

This suggests a disk problem; please post a SMART report.
Evedoescomputerstuff (Author) · Posted October 7, 2020

22 minutes ago, JorgeB said:
> This suggests a disk problem; please post a SMART report.

```
smartctl -a /dev/sdp
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda Compute
Device Model:     ST8000DM004-2CX188
Serial Number:    ZCT1C93J
LU WWN Device Id: 5 000c50 0c33b4041
Firmware Version: 0001
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Oct 7 09:30:08 2020 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 0) seconds.
Offline data collection capabilities: (0x73) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: (1004) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x30a5) SCT Status supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   074   053   006    Pre-fail  Always       -       118910377
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       436
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   083   061   045    Pre-fail  Always       -       197121477
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6622 (10 66 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       105
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   023   023   000    Old_age   Always       -       77
188 Command_Timeout         0x0032   100   018   000    Old_age   Always       -       1643755
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   061   039   040    Old_age   Always   In_the_past 39 (Min/Max 31/41 #1)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       169
193 Load_Cycle_Count        0x0032   097   097   000    Old_age   Always       -       6768
194 Temperature_Celsius     0x0022   039   061   000    Old_age   Always       -       39 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a   081   064   000    Old_age   Always       -       118910377
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       24
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       24
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       4455 (233 194 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       130086748120
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       192229983405

SMART Error Log Version: 1
ATA Error Count: 72 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 72 occurred at disk power-on lifetime: 6621 hours (275 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      17:30:47.161  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      17:30:47.120  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      17:30:47.120  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      17:30:47.120  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      17:30:47.119  READ FPDMA QUEUED

Error 71 occurred at disk power-on lifetime: 6621 hours (275 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      17:30:11.115  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      17:30:11.102  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      17:30:11.090  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      17:30:11.074  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      17:30:11.055  READ FPDMA QUEUED

Error 70 occurred at disk power-on lifetime: 6620 hours (275 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      16:50:14.086  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      16:50:14.046  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      16:50:14.045  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      16:50:14.045  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      16:50:14.045  READ FPDMA QUEUED

Error 69 occurred at disk power-on lifetime: 6620 hours (275 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      16:49:37.963  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      16:49:37.951  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      16:49:37.939  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      16:49:37.922  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      16:49:37.904  READ FPDMA QUEUED

Error 68 occurred at disk power-on lifetime: 6610 hours (275 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00      05:57:40.379  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      05:57:40.372  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      05:57:40.366  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      05:57:22.568  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      05:57:22.561  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%              778  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```
So there are some errors, lol. I mean, I don't want to retire it, since I frankly don't give a fuck about the data on the drives as long as there's nothing under 24 hours old that gets lost (all my shit backs up to GSuite), but that is kind of worrisome.
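For anyone reading a report like this later: the rows usually singled out on a dying disk are Reallocated_Sector_Ct (5), Reported_Uncorrect (187), Current_Pending_Sector (197) and Offline_Uncorrectable (198). Below is a minimal shell sketch that pulls those rows out of `smartctl -A` output and prints any with a non-zero raw value. The function name `smart_flags` is hypothetical, and the choice of attributes is a common rule of thumb, not a vendor failure criterion.

```shell
#!/bin/sh
# smart_flags (hypothetical helper): read smartctl attribute output on stdin
# and print "ID NAME RAW" for attributes 5, 187, 197 and 198 whenever their
# raw value is non-zero. The attribute list is a rule-of-thumb assumption.
smart_flags() {
  awk '
    # Attribute rows have a numeric ID in field 1 and a hex FLAG in field 3,
    # which skips the selective self-test table (whose rows also start with
    # small numbers). Field 10 is the first token of RAW_VALUE.
    $1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-fA-F]+$/ {
      if ($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198) {
        if ($10 + 0 > 0) print $1, $2, $10
      }
    }
  '
}

# Usage against the disk from this thread:
#   smartctl -A /dev/sdp | smart_flags
```

Fed the report above, it would list 187 (77), 197 (24) and 198 (24), while attribute 5 stays at 0. Pending and offline-uncorrectable sectors that have not been reallocated are consistent with the read errors xfs_repair hit.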
JorgeB · Posted October 7, 2020

Yep, the disk is failing, and xfs_repair will abort on a read error. Your best bet is to clone it with ddrescue, then run xfs_repair on the clone.
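The "clone with ddrescue, then xfs_repair" approach can be sketched as the following shell function, under stated assumptions. `clone_and_repair` and `run` are hypothetical helpers. `/dev/sdX` (failing source) and `/dev/sdY` (destination, at least as large and about to be overwritten) are placeholders. The flags are standard GNU ddrescue options: `-f` allows writing to a device, `-n` skips the slow scraping phase on the first pass, and `-r3` allows three retry passes over the bad areas. The map file lets ddrescue resume where it left off, so it is best kept on a disk other than the two involved.

```shell
#!/bin/sh
# Hypothetical sketch: clone a failing disk with GNU ddrescue, then run
# xfs_repair on the clone instead of the original.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# clone_and_repair SRC DST MAPFILE
#   SRC: failing disk (placeholder, e.g. /dev/sdX)
#   DST: destination disk that will be OVERWRITTEN (placeholder, e.g. /dev/sdY)
#   MAPFILE: ddrescue map file, which lets an interrupted copy resume
clone_and_repair() {
  src=$1
  dst=$2
  map=$3
  # Pass 1: copy everything that reads cleanly, skipping the scraping phase.
  run ddrescue -f -n "$src" "$dst" "$map" || return 1
  # Pass 2: revisit the remaining bad areas with up to 3 retry passes.
  run ddrescue -f -r3 "$src" "$dst" "$map" || return 1
  # A whole-disk clone carries the partition table, so repair partition 1
  # (assumes sdX-style naming, where the first partition is "<disk>1").
  run xfs_repair "${dst}1"
}
```

Run it with `DRY_RUN=1` first and check the printed commands against `lsblk` before letting it touch real disks. If the destination's partition does not appear after cloning, the kernel may need to re-read its partition table (for example with `partprobe`) before `xfs_repair` can see it.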
Evedoescomputerstuff (Author) · Posted October 7, 2020 (edited)

3 minutes ago, JorgeB said:
> Yep, the disk is failing, and xfs_repair will abort on a read error. Your best bet is to clone it with ddrescue, then run xfs_repair on the clone.

I guess there are some 12TB externals on sale; I'll grab one. I wonder if it's even worth my time to repair it? I'll probably try to mount it on another machine (there's some software that claims it'll let me browse an unmountable XFS disk) just to make sure there isn't anything terribly important on there, but I doubt there is.

Edited October 7, 2020 by Evedoescomputerstuff
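On the idea of browsing the disk without repairing it: instead of third-party software (nothing specific was named in the thread), XFS itself supports a read-only mount that skips log recovery (`-o ro,norecovery`). This may get past the "log mount/recovery failed" error in the first post, though other metadata damage can still get in the way. The sketch below is hedged. `browse_ro` is a hypothetical helper, and the device and mount point are placeholders. Because the log is not replayed, recently changed files may be missing or inconsistent, and on a disk that is already throwing read errors it is safer to do this on the ddrescue clone.

```shell
#!/bin/sh
# browse_ro (hypothetical helper): mount an XFS partition read-only WITHOUT
# replaying its log, just to look around and copy files off.
# XFS only accepts norecovery together with ro.
# Set DRY_RUN=1 to print the commands instead of executing them.
browse_ro() {
  dev=$1   # placeholder, e.g. /dev/sdY1 (preferably the clone)
  mnt=$2   # placeholder mount point, e.g. /mnt/rescue
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "mkdir -p $mnt"
    echo "mount -t xfs -o ro,norecovery $dev $mnt"
  else
    mkdir -p "$mnt" && mount -t xfs -o ro,norecovery "$dev" "$mnt"
  fi
}
```

When finished, `umount` the mount point before running xfs_repair on the same device.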