[6.9.2] 2 Failed Drives. What Do I Do Now?



So I woke up today to two failed drives, as shown in the screenshots below. The situation is as follows:

 

Disk 3 - marked "Unmountable: not mounted". I have connected it via a USB HDD dock to my Windows laptop. Both FTK Imager and DiskInternals Linux Reader can access the drive and show me its folders and files. I'm not sure whether the data itself is actually readable, but the folder structure seems intact.

 

Disk 4 - shown as failed. Both FTK Imager and Linux Reader show me the file structure of this drive.

 

Since there are two failed drives, I can't start the array, not even in maintenance mode. So what are my options for retrieving the data and rebuilding the array?

 

Disk 3.png

Disk 4.png

failed drives.png

Edited by mattcrem
updated Images
48 minutes ago, Frank1940 said:

The first thing the Gurus (I am not even close to being one for this type of problem!) are going to need is a diagnostics file: Tools >> Diagnostics. Post it in a new post. (No one will notice if you just add it to your first post by editing it!)

You're right, thanks for the reminder. Attaching diagnostics:

 

 

cremonanas-diagnostics-20210515-1749.zip
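To see at a glance which per-disk SMART reports a diagnostics zip actually contains (useful when one disk's report is missing), a small script can list them. This is a sketch under one assumption: that the reports sit in a folder named `smart` somewhere inside the archive, which is the usual layout but worth checking against your own zip. `smart_reports_in_diagnostics` is a hypothetical helper name.

```python
import io
import zipfile


def smart_reports_in_diagnostics(zip_bytes: bytes) -> list[str]:
    """List files stored under a "smart" folder inside a diagnostics zip.

    Assumption: per-disk SMART reports live in a folder named "smart";
    adjust the filter if your archive is laid out differently.
    """
    reports = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            *folders, filename = name.split("/")
            # filename is "" for folder entries such as "diag/smart/"
            if "smart" in folders and filename:
                reports.append(name)
    return sorted(reports)
```

For example, open `cremonanas-diagnostics-20210515-1749.zip` in binary mode and pass its bytes in; a disk with no file in the result has no SMART report in that archive.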


Here's my suggestion:

Since you can see the contents of the disks on another system, copy a couple items out to spot check their viability.

If good, copy everything/clone the drive(s) to new one(s).  Now you'll at least have a copy.
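One way to confirm that a copy is faithful (a sketch, not something the thread prescribes) is to hash source and copy and compare. `sha256_of` and `copy_matches` are hypothetical helper names. Matching hashes only prove the copy equals what the source returned; they cannot tell you whether the source data was already damaged, and an `OSError` while hashing usually points at an unreadable region.

```python
import hashlib
from pathlib import Path

CHUNK = 1024 * 1024  # read 1 MiB at a time so large media files don't fill RAM


def sha256_of(path: Path) -> str:
    """Hash a file in chunks; an OSError here usually means an unreadable area."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def copy_matches(src: Path, dst: Path) -> bool:
    """True if the copy is byte-for-byte identical to the source (per SHA-256)."""
    return sha256_of(src) == sha256_of(dst)
```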

Once that's done, run extended SMART tests on both to see what their actual condition is.
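An extended test is started with `smartctl -t long` and its result read back later with `smartctl -l selftest`. The sketch below builds those two commands and parses the self-test log into records; the helper names and the regular expression are assumptions written against the log format shown later in this thread, not part of smartmontools.

```python
import re


def long_test_cmd(device: str) -> list[str]:
    # Starts an extended (long) self-test; the drive runs it in the background
    # and smartctl returns immediately.
    return ["smartctl", "-t", "long", device]


def selftest_log_cmd(device: str) -> list[str]:
    # Prints the self-test log, where the result appears once the test finishes.
    return ["smartctl", "-l", "selftest", device]


# One row of the log, e.g.
# "# 1  Extended offline    Completed without error       00%     31226         -"
ROW = re.compile(
    r"^#\s*(\d+)\s+(\S+ (?:offline|captive))\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)$"
)


def parse_selftest_log(text: str) -> list[dict]:
    entries = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # header or unrelated line
        num, test, status, remaining, hours, lba = m.groups()
        entries.append({
            "num": int(num),
            "test": test,
            "status": status,
            "remaining_pct": int(remaining),
            "hours": int(hours),
            "first_error_lba": None if lba == "-" else lba,
        })
    return entries
```

Applied to the disk1 log posted further down, this shows the most recent extended test passed at 31226 power-on hours, roughly 4,500 hours before the errors logged at 35742 to 35743 hours.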

 

@JorgeB I looked at the diags and can see where disk1 is poo, but disk4 appears to mount fine... unless I'm reading the log wrong.

On 5/16/2021 at 12:49 PM, sota said:


@JorgeB I looked at the diags and can see where disk1 is poo, but disk4 appears to mount fine... unless I'm reading the log wrong.

Disk 1 happens to hold only media files. I connected the disk directly to a Windows PC and exported some of them with FTK Imager; however, neither Media Player Classic nor VLC can play these files back.
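When players refuse a file, a full decode pass can show whether it is damaged throughout or only in places. This sketch assumes `ffmpeg` is installed and on the PATH (it is not mentioned in the thread): `-v error` limits logging to errors and `-f null -` decodes the whole file without writing any output. `decode_check_cmd` and `decode_errors` are hypothetical helper names.

```python
import subprocess


def decode_check_cmd(path: str) -> list[str]:
    # -v error: print only errors; -f null -: decode everything, write nothing.
    return ["ffmpeg", "-v", "error", "-i", path, "-f", "null", "-"]


def decode_errors(path: str) -> str:
    """Return ffmpeg's error output for a file ("" if no errors were reported)."""
    result = subprocess.run(
        decode_check_cmd(path), capture_output=True, text=True, errors="replace"
    )
    return result.stderr.strip()
```

A file that fails to open at all and one that reports a handful of decode errors near the end point to quite different amounts of damage.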

 

I have now remounted everything in my Unraid server and am using TeraCopy to move everything to a local hard disk. I'm waiting to see whether I can open the files that way.

 

Disk 4 was only recently added to the array, and I never actually stored anything on it, if that helps.

 

On 5/16/2021 at 9:35 AM, JorgeB said:

Run a filesystem check on disk1. Disk4 appears to be really failing; you can run an extended SMART test to confirm, and if it fails, replace it.
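A filesystem check is normally run first in read-only mode, so it only reports problems. The sketch below builds such a command; it rests on assumptions the thread does not state: the array is started in Maintenance mode, Unraid 6.9 exposes array disk N as `/dev/mdN`, and the disk is XFS (the other entries cover other filesystems). `fs_check_cmd` is a hypothetical helper name.

```python
# Assumptions (not stated in the thread): the array is started in Maintenance
# mode, Unraid 6.9 exposes array disk N as /dev/mdN, and the disk is XFS.
READ_ONLY_CHECKS = {
    "xfs": ["xfs_repair", "-n"],  # -n = no-modify mode, report only
    "btrfs": ["btrfs", "check", "--readonly"],
    "reiserfs": ["reiserfsck", "--check"],
}


def fs_check_cmd(disk_number: int, fs_type: str = "xfs") -> list[str]:
    """Build a read-only filesystem check command for Unraid array disk N."""
    if fs_type not in READ_ONLY_CHECKS:
        raise ValueError(f"no read-only check known for {fs_type!r}")
    # Targeting /dev/mdN rather than /dev/sdX1 keeps parity in step if a real
    # repair is run later; a read-only check writes nothing either way.
    return READ_ONLY_CHECKS[fs_type] + [f"/dev/md{disk_number}"]
```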

I have not done this yet; however, checking dmesg, I see a long list of bad sectors on disk1.
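To turn a long dmesg listing into something countable, the failing sectors can be grouped per device. This sketch assumes kernel lines of the form `blk_update_request: I/O error, dev sdd, sector 123456 op 0x0:(READ) ...`, which is typical for a 5.10 kernel but should be checked against your own log; sector numbers there are in 512-byte units. `failing_sectors` is a hypothetical helper name.

```python
import re

# Assumed line shape (typical of the 5.10 block layer; adjust if yours differs):
# "blk_update_request: I/O error, dev sdd, sector 123456 op 0x0:(READ) ..."
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def failing_sectors(dmesg_text: str) -> dict[str, list[int]]:
    """Map each device to the sorted, de-duplicated sectors it failed on."""
    found: dict[str, set[int]] = {}
    for dev, sector in IO_ERROR.findall(dmesg_text):
        found.setdefault(dev, set()).add(int(sector))
    return {dev: sorted(sectors) for dev, sectors in found.items()}
```

Feed it the output of `dmesg` (for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`); a steadily growing list on one device is a strong hint the drive itself is failing rather than the cabling.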

23 hours ago, JorgeB said:

There's no SMART report for disk1 on the diags, see if you can get one manually.

 

smartctl -a -d ata /dev/sdd
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.10.28-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1CH164
Serial Number:    Z340CGBC
LU WWN Device Id: 5 000c50 064b34a34
Firmware Version: CC27
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue May 18 13:57:19 2021 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Read SMART Data failed: Input/output error

=== START OF READ SMART DATA SECTION ===
Error SMART Status command failed: Input/output error
SMART Status command failed: Input/output error
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.

SMART Error Log Version: 1
ATA Error Count: 554 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 554 occurred at disk power-on lifetime: 35743 hours (1489 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00      03:37:23.810  READ FPDMA QUEUED
  61 00 00 ff ff ff 4f 00      03:37:23.809  WRITE FPDMA QUEUED
  60 00 f8 48 02 00 40 00      03:37:23.407  READ FPDMA QUEUED
  60 00 f8 48 01 00 40 00      03:37:23.406  READ FPDMA QUEUED
  60 00 78 c8 00 00 40 00      03:37:23.406  READ FPDMA QUEUED

Error 553 occurred at disk power-on lifetime: 35742 hours (1489 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 78 ff ff ff 4f 00      03:15:15.182  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      03:15:15.182  READ FPDMA QUEUED
  61 00 38 ff ff ff 4f 00      03:15:15.181  WRITE FPDMA QUEUED
  61 00 40 ff ff ff 4f 00      03:15:15.094  WRITE FPDMA QUEUED
  61 00 30 ff ff ff 4f 00      03:15:15.082  WRITE FPDMA QUEUED

Error 552 occurred at disk power-on lifetime: 35742 hours (1489 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 a8 ff ff ff 4f 00      03:15:11.427  WRITE FPDMA QUEUED
  60 00 38 ff ff ff 4f 00      03:15:11.422  READ FPDMA QUEUED
  60 00 d0 ff ff ff 4f 00      03:15:11.422  READ FPDMA QUEUED
  61 00 40 ff ff ff 4f 00      03:15:11.420  WRITE FPDMA QUEUED
  61 00 38 ff ff ff 4f 00      03:15:11.418  WRITE FPDMA QUEUED

Error 551 occurred at disk power-on lifetime: 35742 hours (1489 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 20 ff ff ff 4f 00      03:15:07.023  WRITE FPDMA QUEUED
  60 00 60 ff ff ff 4f 00      03:15:07.005  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      03:15:07.005  READ FPDMA QUEUED
  60 00 98 ff ff ff 4f 00      03:15:07.004  READ FPDMA QUEUED
  61 00 60 ff ff ff 4f 00      03:15:07.003  WRITE FPDMA QUEUED

Error 550 occurred at disk power-on lifetime: 35742 hours (1489 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 88 ff ff ff 4f 00      03:15:03.238  READ FPDMA QUEUED
  61 00 88 ff ff ff 4f 00      03:15:03.236  WRITE FPDMA QUEUED
  60 00 a8 ff ff ff 4f 00      03:15:03.167  READ FPDMA QUEUED
  60 00 58 ff ff ff 4f 00      03:15:03.167  READ FPDMA QUEUED
  60 00 48 ff ff ff 4f 00      03:15:03.166  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     31226         -
# 2  Short offline       Completed without error       00%     17784         -
# 3  Short offline       Completed without error       00%     17643         -
# 4  Short offline       Completed without error       00%     17502         -
# 5  Short offline       Completed without error       00%     17361         -
# 6  Short offline       Completed without error       00%     17256         -
# 7  Short offline       Completed without error       00%     17115         -
# 8  Short offline       Completed without error       00%     16975         -
# 9  Short offline       Completed without error       00%     16892         -
#10  Short offline       Completed without error       00%     16751         -
#11  Short offline       Completed without error       00%     16608         -
#12  Short offline       Completed without error       00%     16467         -
#13  Short offline       Completed without error       00%     16326         -
#14  Short offline       Completed without error       00%     16185         -
#15  Short offline       Completed without error       00%     16050         -
#16  Short offline       Completed without error       00%     16020         -
#17  Short offline       Completed without error       00%     15934         -
#18  Short offline       Completed without error       00%     15801         -
#19  Short offline       Completed without error       00%     15660         -
#20  Short offline       Completed without error       00%     15518         -
#21  Short offline       Completed without error       00%     15377         -

Selective Self-tests/Logging not supported
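The error log above can be summarised programmatically. In this sketch `parse_ata_errors` is a hypothetical helper that pulls the error number, power-on hours, error type and LBA out of text in the format shown. One detail worth knowing: 0x0fffffff (268,435,455, i.e. 2^28 - 1) is the largest value a 28-bit LBA field can hold, and this 2 TB drive has about 3.9 billion 512-byte sectors, so that value is very likely a placeholder for an address beyond the 28-bit range rather than the real failing location. `smartctl -l xerror` asks for the extended error log, which carries 48-bit LBAs, though a drive in this state may not answer that either.

```python
import re

HEADER = re.compile(r"^Error (\d+) occurred at disk power-on lifetime: (\d+) hours")
DETAIL = re.compile(r"Error: (\w+) at LBA = (0x[0-9a-fA-F]+)")
LBA28_MAX = 0x0FFFFFFF  # 2**28 - 1, the largest value a 28-bit LBA field holds


def parse_ata_errors(text: str) -> list[dict]:
    errors = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            # Start a new record for each "Error N occurred at ..." line.
            current = {"num": int(header[1]), "hours": int(header[2])}
            errors.append(current)
            continue
        detail = DETAIL.search(line)
        if detail and current is not None:
            lba = int(detail[2], 16)
            current.update(type=detail[1], lba=lba, lba_saturated=lba == LBA28_MAX)
    return errors
```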

 


So I ran the following:

 

~# smartctl  -d  ata  -tlong  /dev/sdd
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.10.28-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Read SMART Data failed: Input/output error
 

It just shows the I/O error less than a minute after I enter the command.
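smartctl's exit status is a bitmask, and decoding it gives a more precise reading than the console text alone. The bit meanings below are paraphrased from the smartctl(8) man page; `decode_smartctl_exit` is a hypothetical helper. A "Read SMART Data failed: Input/output error" like the one above would be expected to set bit 2, meaning the drive is not answering SMART commands at all, so the long test never started.

```python
# Bit meanings of smartctl's exit status, paraphrased from smartctl(8).
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not return IDENTIFY DEVICE data",
    2: "a SMART or other ATA command failed, or SMART data checksum error",
    3: "SMART status check returned 'DISK FAILING'",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_exit(status: int) -> list[str]:
    """Return the description of every bit set in a smartctl exit status."""
    return [msg for bit, msg in EXIT_BITS.items() if status & (1 << bit)]
```

For example, pass it `subprocess.run(["smartctl", "-a", "-d", "ata", "/dev/sdd"]).returncode`.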


You only have one disabled drive; the other one is just unmountable. You can replace disk4 and see whether disk1 can be used for the rebuild without errors, then replace disk1. If there are read errors on disk1, the rebuilt disk will likely have some corruption, resulting in some data loss, but with single parity there's not much you can do about that if disk1 is also failing.
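Why read errors on disk1 turn into corruption on the rebuilt disk4 follows from how single parity works: the parity block is the XOR of the data blocks at the same position, so rebuilding one disk means XOR-ing parity with every other data disk. This is a toy model; the byte values and helper names are purely illustrative, and a real rebuild runs sector by sector inside Unraid's md driver, not in Python.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks: how a single-parity block is formed."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(missing: int, data: list[bytes], parity: bytes) -> bytes:
    """Recreate one data block from the parity block and all other data blocks."""
    survivors = [blk for i, blk in enumerate(data) if i != missing]
    return xor_blocks(survivors + [parity])


# Hypothetical contents of disk1..disk4 at one position.
disks = [b"\x10\x20", b"\x0f\x0f", b"\xaa\x55", b"\x00\x00"]
parity = xor_blocks(disks)
good = rebuild(3, disks, parity)  # equals disks[3]
bad_disk1 = [b"\x00\x00"] + disks[1:]  # disk1 returns zeros for a bad sector
corrupt = rebuild(3, bad_disk1, parity)  # off by exactly disk1's real bytes
```

In the second case the "rebuilt" disk4 block differs from the true one by exactly what disk1 failed to return, which is the corruption described above; with only one parity disk there is no second equation to correct it.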

