mattcrem posted May 15, 2021 (edited):
So I woke up today to two failed drives, as per the screenshot below. The situation is as follows:
Disk 3: marked "unmountable: not mounted". I have connected it via a USB HDD dock to my Windows laptop. Both FTK Imager and DiskInternals Linux Reader can access the drive and show me its folders and files. I'm not sure whether the data is actually readable, but the folder structure seems intact.
Disk 4: shown as failed. Both FTK Imager and Linux Reader show me the file structure of this drive.
Since there are two failed drives, I can't mount the array, not even in maintenance mode. So what are my options to try to retrieve the data and rebuild the array?
(Edited May 15, 2021 by mattcrem: updated images.)
Frank1940 posted May 15, 2021:
The first thing the Gurus (I am not even close to one with this type of problem!) are going to need is a diagnostics file: Tools >>> Diagnostics. Post it in a new post. (No one will notice it if you add it to your first post by editing it!)
mattcrem (author) posted May 15, 2021, replying to Frank1940's request for a diagnostics file:
You're right, thanks for reminding me. Attaching diagnostics: cremonanas-diagnostics-20210515-1749.zip
JorgeB posted May 16, 2021:
Run a filesystem check on disk1. Disk4 appears to be really failing; you can run an extended SMART test to confirm, and if it fails, replace it.
sota posted May 16, 2021:
Here's my suggestion: since you can see the contents of the disks on another system, copy a couple of items out to spot-check their viability. If they're good, copy everything or clone the drive(s) to new one(s). Then you'll at least have a copy. Once that's done, run extended SMART tests on both to see what their actual condition is.
@JorgeB I looked at the diags and can see where disk1 is poo, but disk4 appears to mount fine... unless I'm reading the log wrong.
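sota's "clone the drive(s)" step is where bad sectors bite: a plain copy tends to stop at the first unreadable block. The sketch below is a minimal, hypothetical Python illustration of the usual rescue-copy idea, assuming a Linux host where the failing disk appears as a readable device path such as /dev/sdX (the function name clone_skipping_errors is made up here): read in fixed-size chunks, zero-fill anything that raises an I/O error, and keep a list of the ranges that could not be read. A dedicated tool such as GNU ddrescue does this far better (retries, smaller reads around bad areas, a resumable mapfile), so treat this only as an explanation of the technique.

```python
import os
from typing import List, Tuple


def clone_skipping_errors(src_path: str, dst_path: str, chunk: int = 64 * 1024) -> List[Tuple[int, int]]:
    """Copy src_path to dst_path in fixed-size chunks (hypothetical helper).

    Chunks that raise OSError (for example EIO from a bad sector) are written as
    zeros so everything after them keeps its original offset. Returns the
    (offset, length) byte ranges that could not be read.
    """
    bad_ranges: List[Tuple[int, int]] = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of both regular files and block devices
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                length = min(chunk, size - offset)
                try:
                    data = os.pread(src, length, offset)
                except OSError:
                    # Unreadable chunk: remember where it was and carry on
                    data = b""
                    bad_ranges.append((offset, length))
                # Pad failed (or unexpectedly short) reads with zeros
                dst.write(data.ljust(length, b"\0"))
                offset += length
    finally:
        os.close(src)
    return bad_ranges
```

Called as `clone_skipping_errors("/dev/sdX", "/mnt/spare/disk1.img")` (both paths hypothetical), it would leave an image on a healthy disk plus a list of byte ranges; files overlapping those ranges are the ones worth spot-checking first.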
JorgeB posted May 16, 2021, replying to sota ("disk4 appears to mount fine... unless I'm reading the log wrong"):
I don't understand what you mean: disk1 is unmountable but enabled, while disk4 is disabled but mounting.
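JorgeB's distinction matters because only a disabled disk has to be stood in for by parity; an enabled but unmountable disk is still being read, and its problem is the filesystem on it. The following is a deliberately simplified Python model of that idea (the ArrayDisk class and helper names are hypothetical, not Unraid's code), assuming the rule that one parity disk can cover at most one disabled disk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArrayDisk:
    name: str
    enabled: bool    # False = disabled: its contents must come from parity
    mountable: bool  # False = the filesystem on it could not be mounted


def needs_parity(disks: List[ArrayDisk]) -> List[str]:
    """Names of the disks parity has to stand in for (only the disabled ones)."""
    return [d.name for d in disks if not d.enabled]


def parity_can_cover(disks: List[ArrayDisk], parity_disks: int) -> bool:
    """Simplified rule: each parity disk can cover at most one disabled disk."""
    return len(needs_parity(disks)) <= parity_disks


if __name__ == "__main__":
    # The two states JorgeB describes in the thread
    disks = [
        ArrayDisk("disk1", enabled=True, mountable=False),
        ArrayDisk("disk4", enabled=False, mountable=True),
    ]
    print(needs_parity(disks))
    print(parity_can_cover(disks, parity_disks=1))
```

In this model disk1's unmountable state does not count against single parity; it is something a filesystem check deals with, not a rebuild.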
mattcrem (author) posted May 17, 2021, replying to sota's suggestion to copy files out, spot-check them, clone the drives, and then run extended SMART tests:
Disk 1 happens to hold only media files. I connected the disk directly to a Windows PC and exported some of them with FTK Imager, but neither Media Player Classic nor VLC can play these files back. I have now remounted everything in my Unraid server and am using TeraCopy to move everything to a local hard disk; I'm waiting to see whether the files open when copied this way. Disk 4 was just added to the array, and I never actually stored anything on it, if that helps.
Replying to JorgeB's advice to run a filesystem check on disk1 and an extended SMART test on disk4:
I have not done this yet; however, checking dmesg, I see a long list of bad sectors on disk1.
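To turn a "long list of bad sectors" in dmesg into something countable, a small parser can group the reported sectors per device. This Python sketch assumes the Linux 5.x block-layer message shape `blk_update_request: I/O error, dev sdd, sector N ...`; other drivers (Unraid's own md driver included) may log errors differently, so the regex (IO_ERROR_RE, a name chosen here for illustration) may need adjusting. Kernel sector numbers are in 512-byte units.

```python
import re
from typing import Dict, Iterable, List

# Assumed message shape (Linux 5.x block layer), e.g.
#   blk_update_request: I/O error, dev sdd, sector 123456 op 0x0:(READ) flags 0x0 ...
IO_ERROR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def bad_sectors_by_device(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Collect the sector numbers from kernel I/O error lines, grouped by device."""
    found: Dict[str, List[int]] = {}
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            device, sector = match.group(1), int(match.group(2))
            found.setdefault(device, []).append(sector)
    return found
```

It could be fed the output of `dmesg` (for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`, which may need root). A count that keeps growing across reads would fit a failing surface.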
JorgeB posted May 17, 2021, replying to mattcrem ("checking dmesg I see a long list of bad sectors in disk1"):
There's no SMART report for disk1 in the diags; see if you can get one manually.
mattcrem (author) posted May 18, 2021, replying to JorgeB ("There's no SMART report for disk1 on the diags, see if you can get one manually."):

smartctl -a -d ata /dev/sdd
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.10.28-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1CH164
Serial Number:    Z340CGBC
LU WWN Device Id: 5 000c50 064b34a34
Firmware Version: CC27
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue May 18 13:57:19 2021 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Read SMART Data failed: Input/output error

=== START OF READ SMART DATA SECTION ===
Error SMART Status command failed: Input/output error
SMART Status command failed: Input/output error
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.

SMART Error Log Version: 1
ATA Error Count: 554 (device log contains only the most recent five errors)
  CR = Command Register [HEX]
  FR = Features Register [HEX]
  SC = Sector Count Register [HEX]
  SN = Sector Number Register [HEX]
  CL = Cylinder Low Register [HEX]
  CH = Cylinder High Register [HEX]
  DH = Device/Head Register [HEX]
  DC = Device Command Register [HEX]
  ER = Error register [HEX]
  ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 554 occurred at disk power-on lifetime: 35743 hours (1489 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00      03:37:23.810  READ FPDMA QUEUED
  61 00 00 ff ff ff 4f 00      03:37:23.809  WRITE FPDMA QUEUED
  60 00 f8 48 02 00 40 00      03:37:23.407  READ FPDMA QUEUED
  60 00 f8 48 01 00 40 00      03:37:23.406  READ FPDMA QUEUED
  60 00 78 c8 00 00 40 00      03:37:23.406  READ FPDMA QUEUED

Error 553 occurred at disk power-on lifetime: 35742 hours (1489 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 78 ff ff ff 4f 00      03:15:15.182  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      03:15:15.182  READ FPDMA QUEUED
  61 00 38 ff ff ff 4f 00      03:15:15.181  WRITE FPDMA QUEUED
  61 00 40 ff ff ff 4f 00      03:15:15.094  WRITE FPDMA QUEUED
  61 00 30 ff ff ff 4f 00      03:15:15.082  WRITE FPDMA QUEUED

Error 552 occurred at disk power-on lifetime: 35742 hours (1489 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 a8 ff ff ff 4f 00      03:15:11.427  WRITE FPDMA QUEUED
  60 00 38 ff ff ff 4f 00      03:15:11.422  READ FPDMA QUEUED
  60 00 d0 ff ff ff 4f 00      03:15:11.422  READ FPDMA QUEUED
  61 00 40 ff ff ff 4f 00      03:15:11.420  WRITE FPDMA QUEUED
  61 00 38 ff ff ff 4f 00      03:15:11.418  WRITE FPDMA QUEUED

Error 551 occurred at disk power-on lifetime: 35742 hours (1489 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 20 ff ff ff 4f 00      03:15:07.023  WRITE FPDMA QUEUED
  60 00 60 ff ff ff 4f 00      03:15:07.005  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      03:15:07.005  READ FPDMA QUEUED
  60 00 98 ff ff ff 4f 00      03:15:07.004  READ FPDMA QUEUED
  61 00 60 ff ff ff 4f 00      03:15:07.003  WRITE FPDMA QUEUED

Error 550 occurred at disk power-on lifetime: 35742 hours (1489 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 88 ff ff ff 4f 00      03:15:03.238  READ FPDMA QUEUED
  61 00 88 ff ff ff 4f 00      03:15:03.236  WRITE FPDMA QUEUED
  60 00 a8 ff ff ff 4f 00      03:15:03.167  READ FPDMA QUEUED
  60 00 58 ff ff ff 4f 00      03:15:03.167  READ FPDMA QUEUED
  60 00 48 ff ff ff 4f 00      03:15:03.166  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     31226         -
# 2  Short offline       Completed without error       00%     17784         -
# 3  Short offline       Completed without error       00%     17643         -
# 4  Short offline       Completed without error       00%     17502         -
# 5  Short offline       Completed without error       00%     17361         -
# 6  Short offline       Completed without error       00%     17256         -
# 7  Short offline       Completed without error       00%     17115         -
# 8  Short offline       Completed without error       00%     16975         -
# 9  Short offline       Completed without error       00%     16892         -
#10  Short offline       Completed without error       00%     16751         -
#11  Short offline       Completed without error       00%     16608         -
#12  Short offline       Completed without error       00%     16467         -
#13  Short offline       Completed without error       00%     16326         -
#14  Short offline       Completed without error       00%     16185         -
#15  Short offline       Completed without error       00%     16050         -
#16  Short offline       Completed without error       00%     16020         -
#17  Short offline       Completed without error       00%     15934         -
#18  Short offline       Completed without error       00%     15801         -
#19  Short offline       Completed without error       00%     15660         -
#20  Short offline       Completed without error       00%     15518         -
#21  Short offline       Completed without error       00%     15377         -

Selective Self-tests/Logging not supported
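The "ER ST SC SN CL CH DH" row in each error entry can be decoded by hand. The sketch below (hypothetical helper names, Python) rebuilds the 28-bit LBA from the low nibble of DH followed by CH, CL and SN, and splits power-on hours the way smartctl prints them. One worked observation, hedged: the drive reports 2,000,398,934,016 bytes, i.e. 3,907,029,168 sectors of 512 bytes, far more than the 268,435,456 sectors a 28-bit LBA can address, so the value 0x0fffffff that appears in every entry is the largest possible 28-bit address and most likely does not pinpoint the failing sector.

```python
from typing import Dict, Tuple

MAX_LBA28 = 0x0FFFFFFF  # largest address a 28-bit LBA field can hold


def decode_error_registers(row: str) -> Dict[str, object]:
    """Decode an 'ER ST SC SN CL CH DH' row from a smartctl ATA error-log entry."""
    er, st, sc, sn, cl, ch, dh = (int(field, 16) for field in row.split()[:7])
    # 28-bit LBA: low nibble of DH, then CH, CL and SN, most to least significant
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "lba": lba,
        "lba_saturated": lba == MAX_LBA28,
        "status_err": bool(st & 0x01),  # ST bit 0 (ERR): the command ended in error
        "error_bit6": bool(er & 0x40),  # ER bit 6: shown as UNC or WP in the log above
        "sector_count": sc,
    }


def lifetime_days_hours(hours: int) -> Tuple[int, int]:
    """Split power-on hours into (days, hours), e.g. 35743 -> 1489 days + 7 hours."""
    return divmod(hours, 24)
```

Applied to the row `40 51 00 ff ff ff 0f` it yields LBA 268435455 with the saturation flag set, which matches the `0x0fffffff = 268435455` smartctl printed.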
JorgeB posted May 18, 2021:
It's not showing any attributes; run an extended SMART test.
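Before running a new extended test, the self-test log already posted shows when the last one ran. The Python sketch below (the regex and function name are illustrative assumptions) pulls the most recent "Extended" row out of `smartctl -l selftest`-style text. On the log above that is the test at 31226 hours, which is 35743 - 31226 = 4517 hours, about 188 days, before the most recent logged error. A clean result that old says little about the disk's current condition.

```python
import re
from typing import List, Optional, Tuple

# One self-test log row, tolerant of single or multiple spaces between columns, e.g.
#   # 1  Extended offline    Completed without error       00%     31226         -
ROW_RE = re.compile(
    r"^#\s*(\d+)\s+(Short|Extended|Conveyance|Selective)\s+\w+\s+(.*?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def last_extended_test(log_text: str) -> Optional[Tuple[str, int]]:
    """Return (status, lifetime_hours) of the most recent extended test, or None."""
    tests: List[Tuple[int, str]] = []
    for line in log_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match and match.group(2) == "Extended":
            tests.append((int(match.group(5)), match.group(3)))
    if not tests:
        return None
    hours, status = max(tests)  # highest power-on hour = most recent test
    return status, hours
```

Comparing that hour count with the power-on lifetime in the newest error entry gives a rough age for the last full surface scan.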
mattcrem (author) posted May 18, 2021:
So I ran the following:

~# smartctl -d ata -tlong /dev/sdd
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.10.28-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Read SMART Data failed: Input/output error

It just shows the I/O error less than a minute after I enter the command.
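Normally `smartctl -t long` returns right away because the test runs inside the drive and its result shows up later in the self-test log, so the quick return is not the problem here; the I/O error is. smartctl also reports what went wrong through its exit status, a bitmask documented in the smartctl(8) man page. The Python sketch below decodes it (descriptions paraphrased from the man page; the helper names are hypothetical). A "Read SMART Data failed" error would most likely set bit 2.

```python
import subprocess
from typing import List

# Exit-status bits of smartctl, paraphrased from the smartctl(8) man page
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not return IDENTIFY data",
    2: "a SMART or other ATA command failed, or a SMART data checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_smartctl_status(code: int) -> List[str]:
    """List the meanings of every bit set in a smartctl exit status."""
    return [text for bit, text in EXIT_BITS.items() if code & (1 << bit)]


def run_smartctl(args: List[str]) -> List[str]:
    """Run smartctl (must be installed; usually needs root) and decode its exit status."""
    result = subprocess.run(["smartctl", *args], capture_output=True, text=True)
    return decode_smartctl_status(result.returncode)
```

For example, `run_smartctl(["-d", "ata", "-t", "long", "/dev/sdd"])` would repeat the command from the post and return the decoded bits instead of only the printed message.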
JorgeB posted May 18, 2021:
Yeah, it's best to replace a disk that doesn't even report SMART data correctly.
mattcrem (author) posted May 18, 2021:
The problem is that I have two failed drives. I have replaced one of them with a fresh new drive (as I'm out of SATA ports), and Unraid tells me that there are too many bad drives, so the array won't start at all. Is there any way around this?
JorgeB posted May 18, 2021:
You only have one disabled drive; the other one is just unmountable. You can replace disk4 and see if disk1 can be used for the rebuild without errors, then replace disk1. If there are read errors on disk1, the rebuilt disk will likely have some corruption, resulting in some data loss, but there's not much you can do about that if disk1 is also failing and you only have single parity.
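Why read errors on disk1 would corrupt the rebuilt disk4 follows from how single parity works: parity is the byte-wise XOR of all data disks, so a missing disk is recovered by XORing parity with every surviving disk, and any wrong byte read from a survivor lands in the rebuilt disk at the same offset. The Python sketch below shows this with tiny hypothetical 4-byte "disks"; it models a read that returns wrong data, whereas with an outright read error that sector simply cannot be reconstructed.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks, as used for single parity."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must be the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(parity: bytes, surviving: List[bytes]) -> bytes:
    """Single parity: the missing disk equals parity XOR every surviving data disk."""
    return xor_blocks([parity, *surviving])


if __name__ == "__main__":
    # Three hypothetical 4-byte "data disks"
    disk1 = bytes([0x10, 0x20, 0x30, 0x40])
    disk2 = bytes([0x01, 0x02, 0x03, 0x04])
    disk4 = bytes([0x00, 0x00, 0x00, 0x00])
    parity = xor_blocks([disk1, disk2, disk4])

    # Clean rebuild: disk1 and disk2 read back correctly
    print(rebuild_missing(parity, [disk1, disk2]) == disk4)  # True

    # disk1's first byte reads back wrong, so the rebuilt disk4's first byte is wrong too
    bad_disk1 = bytes([disk1[0] ^ 0xFF]) + disk1[1:]
    print(rebuild_missing(parity, [bad_disk1, disk2]).hex())  # ff000000
```

Only the positions that disk1 misreads are affected, which matches JorgeB's expectation of "some corruption" rather than a total loss.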