Hard Drive Crashed? Won't Mount on Boot

January 26, 2013

Earlier in the day there were no problems. Then I went to load a file from the network and it wasn't working. I tried to go to my unraid server's homepage but that wouldn't load. So, I rebooted my server. When booting back up, it gave me lots of errors on my 4th hard drive, my 2TB Western Digital WD20EARX.

Here's a snippet of the errors. They repeat for a long time:

Jan 26 10:45:29 Zeus kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 (Errors)
Jan 26 10:45:29 Zeus kernel: ata4.00: irq_stat 0x40000008 (Drive related)
Jan 26 10:45:29 Zeus kernel: ata4.00: failed command: READ FPDMA QUEUED (Minor Issues)
Jan 26 10:45:29 Zeus kernel: ata4.00: cmd 60/08:00:00:00:00/00:00:00:00:00/40 tag 0 ncq 4096 in (Drive related)
Jan 26 10:45:29 Zeus kernel:          res 41/40:00:00:00:00/00:00:00:00:00/40 Emask 0x409 (media error) <F> (Errors)
Jan 26 10:45:29 Zeus kernel: ata4.00: status: { DRDY ERR } (Drive related)
Jan 26 10:45:29 Zeus kernel: ata4.00: error: { UNC } (Errors)
Jan 26 10:45:29 Zeus kernel: ata4.00: configured for UDMA/133 (Drive related)
Jan 26 10:45:29 Zeus kernel: ata4: EH complete (Drive related)
Jan 26 10:45:32 Zeus kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 (Errors)
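
Since these errors repeat for a long time, a tally per ATA device can be easier to read than the raw log. Below is a minimal sketch, assuming a POSIX shell with awk and a syslog in the format shown above. The function name count_ata_failures is made up for illustration, and the syslog path in the usage comment is an assumption.

```shell
# Hypothetical helper: count "failed command" lines per ATA device in a syslog.
# Usage (path is an assumption): count_ata_failures /var/log/syslog
count_ata_failures() {
    awk '/kernel: ata[0-9]+(\.[0-9]+)?: failed command:/ {
             # Find the field that names the device, e.g. "ata4.00:".
             for (i = 1; i <= NF; i++) {
                 if ($i ~ /^ata[0-9]+(\.[0-9]+)?:$/) {
                     dev = $i
                     sub(/:$/, "", dev)   # drop the trailing colon
                     count[dev]++
                     break
                 }
             }
         }
         END { for (dev in count) print dev, count[dev] }' "$1" | sort
}
```

Each output line is a device name followed by its failure count. A count that keeps climbing on a single device points at that drive, its cable, or its port rather than at the array as a whole.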

It seems it can't even mount this hard drive. The unRAID Main page shows "disk 3 missing" with a red dot. I tried running smartctl, but it said:

Smartctl open device: /dev/sdd failed: No such device

I attached my syslog. Please help! :'(

Specs:

Unraid 5.0rc10

4x 2TB HD's in array (1 parity)

Biostar A880g+ Mobo

syslog-2013-01-26.txt.zip

January 26, 2013

Post a SMART report for disk3.

January 26, 2013

Author

Post a SMART report for disk3.

The "Disk 3" that the webgui is referencing is actually my disk 4 (SDD) because it doesn't count the parity drive (SDA). When I run the SMART report on SDD (the affected disk), it gives me an error saying "no such device". I think this is because unraid wasn't able to mount the disk. (See errors above)

This isn't the affected drive, but I went ahead and ran a SMART report on SDC for you anyways:

~# smartctl -a -d ata /dev/sdc
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS723020BLA642
Serial Number:    MN5220F33B47SK
Firmware Version: MN6OA800
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat Jan 26 14:34:41 2013 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
				was never started.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		 (19523) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				No Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
				SCT Error Recovery Control supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       92
  3 Spin_Up_Time            0x0007   150   150   024    Pre-fail  Always       -       368 (Average 395)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       386
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       26
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       5298
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       390
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       390
194 Temperature_Celsius     0x0002   250   250   000    Old_age   Always       -       24 (Min/Max 14/41)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

January 27, 2013

A SMART report is required for disk3. This is the missing disk. Power cycle the server. If disk3 does not respond it must be replaced. The posted report is for disk2.

January 27, 2013

When I run the SMART report on SDD (the affected disk), it gives me an error saying "no such device". I think this is because unraid wasn't able to mount the disk.

A disk need not be mounted to run a smart report on it. It does need to show up in the BIOS and as one of the drives found by unRAID when it boots.

If the drive does not show up in the BIOS, or to unRAID, then either it has died, or the disk controller port it is connected to has died, or one of the cables (power or data) to the drive is either defective or disconnected.

I would power down and then re-seat the cables to the disk that is not showing. Oh yes, Linux is case sensitive. Referencing the drive as /dev/SDD is wrong, as that does not exist. /dev/sdd would be correct (assuming the actual device name is /dev/sdd). You can type:

ls -l /dev/disk/by-id

to see the disks by model/serial number and their affiliated three character device name. Ignore those with a -part1 name... those are partitions, not drives.
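
As a rough sketch, the same listing can be produced with the partition entries already filtered out. This assumes a POSIX shell with readlink available; the function name list_whole_disks is hypothetical and not part of unRAID.

```shell
# Hypothetical helper: print "<by-id name> -> <device>" for whole disks only.
# The directory defaults to /dev/disk/by-id; pass another path to override.
list_whole_disks() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        name=${link##*/}
        # Skip partition entries such as "...-part1".
        case "$name" in
            *-part[0-9]*) continue ;;
        esac
        target=$(readlink "$link")
        printf '%s -> %s\n' "$name" "${target##*/}"
    done
}
```

Running list_whole_disks with no argument reads /dev/disk/by-id. A drive that has dropped off-line will simply be absent from the output.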

If the drive still does not show up after re-seating the cables, try different cables, or a different disk controller port before deciding the disk itself is defective.

Joe L.

January 27, 2013

Author

Hi Joe - Thanks for the response.

The disk is showing up in the bios. Unraid can see that there is a disk there as it throws a lot of errors about that disk (see sys log above).

I actually did open up the server and swapped the cable. I'll try a different port.

linux is case sensitive. referencing the drive as /dev/SDD is wrong, as that does not exist. /dev/sdd would be correct (assuming the actual device name is /dev/sdd) You can type:

ls -l /dev/disk/by-id

When I was running the SMART command, I was using lower case. It results in an error: Smartctl open device: /dev/sdd failed: No such device

Here's the results of "ls -l /dev/disk/by-id".

~# ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 Jan 26 10:44 ata-Hitachi_HDS723020BLA642_MN5220F33B47SK -> ../../sdc
lrwxrwxrwx 1 root root 10 Jan 26 10:44 ata-Hitachi_HDS723020BLA642_MN5220F33B47SK-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Jan 26 10:44 ata-ST2000DL003-9VT166_6YD1Z5BA -> ../../sda
lrwxrwxrwx 1 root root 10 Jan 26 10:44 ata-ST2000DL003-9VT166_6YD1Z5BA-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 Jan 26 10:44 ata-ST2000DL003-9VT166_6YD248PZ -> ../../sdb
lrwxrwxrwx 1 root root 10 Jan 26 10:44 ata-ST2000DL003-9VT166_6YD248PZ-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 Jan 26 10:44 scsi-SATA_Hitachi_HDS7230_MN5220F33B47SK -> ../../sdc
lrwxrwxrwx 1 root root 10 Jan 26 10:44 scsi-SATA_Hitachi_HDS7230_MN5220F33B47SK-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Jan 26 10:44 scsi-SATA_ST2000DL003-9VT_6YD1Z5BA -> ../../sda
lrwxrwxrwx 1 root root 10 Jan 26 10:44 scsi-SATA_ST2000DL003-9VT_6YD1Z5BA-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 Jan 26 10:44 scsi-SATA_ST2000DL003-9VT_6YD248PZ -> ../../sdb
lrwxrwxrwx 1 root root 10 Jan 26 10:44 scsi-SATA_ST2000DL003-9VT_6YD248PZ-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 Jan 26 10:44 usb-SanDisk_Cruzer_200524441304F581D5BB-0:0 -> ../../sde
lrwxrwxrwx 1 root root 10 Jan 26 10:44 usb-SanDisk_Cruzer_200524441304F581D5BB-0:0-part1 -> ../../sde1
lrwxrwxrwx 1 root root  9 Jan 26 10:44 wwn-0x5000c50046fd0f8d -> ../../sdb
lrwxrwxrwx 1 root root 10 Jan 26 10:44 wwn-0x5000c50046fd0f8d-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 Jan 26 10:44 wwn-0x5000c50046fd9c93 -> ../../sda
lrwxrwxrwx 1 root root 10 Jan 26 10:44 wwn-0x5000c50046fd9c93-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 Jan 26 10:44 wwn-0x5000cca369ef568c -> ../../sdc
lrwxrwxrwx 1 root root 10 Jan 26 10:44 wwn-0x5000cca369ef568c-part1 -> ../../sdc1

sdd doesn't show up there. Which is weird because unraid references "ata4" in the errors and my bios can see the HD.

January 27, 2013

If it does not show up in the ls command then it has dropped off-line for some reason (which will also be why smartctl is failing). If you attach a syslog then it might give an idea of what error is causing this to happen.

January 27, 2013

Author

If it does not show up in the ls command then it has dropped off-line for some reason (which will also be why smartctl is failing). If you attach a syslog then it might give an idea of what error is causing this to happen.

There is a sys log attached to my first post. Let me know if that helps!

January 27, 2013

The syslog shows that whatever is attached to ata4 (I presume sdd) is continually failing, and attempts to reset are not working. You should carefully check cabling and power supplies.

If they all appear OK then the drive itself could be failing. I would suggest that in that case you should be looking to attach the drive to a windows system so that you can run the WDC Data Lifeguard tools against it to check it out.

January 28, 2013

Author

I checked the cabling again. I swapped the ports and the SATA cable and everything, still no luck.

I would suggest that in that case you should be looking to attach the drive to a windows system so that you can run the WDC Data Lifeguard tools against it to check it out.

I took your advice and hooked it up to a Windows machine (via a USB-to-SATA hard drive adapter), and Windows couldn't recognize it as a drive. I ran the WDC Data Lifeguard tools, but they couldn't find the hard drive either!

I went ahead and submitted an RMA claim to Western Digital. I bought this hard drive a few months ago so it's still under warranty and it seems like an easy process to return it. Thanks everyone for the help.

February 2, 2013

Author

Good news! I submitted an RMA to Western Digital and they sent me a 3TB replacement for my 2TB drive!

I've installed it in my server and I'm not exactly sure of the best way to get my server back up and running without screwing anything up.

Current Array Setup:

2TB Parity

2TB Data Drive #1

2TB Data Drive #2

2TB Data Drive #3 (FAILED DRIVE, missing)

Outside Array:

3TB Drive

WD sent me a 3TB (WD30EZRX) drive to replace my data drive #3... So, how can I make this 3TB drive my parity drive and then make my 2TB parity drive my data drive #3??

Thanks!

February 2, 2013

Easy... it is a procedure called "swap-disabled". However, it is slightly broken in rc10. You'll need to upgrade to rc11 or rc11a (once officially released).

Changes from 5.0-rc10 to 5.0-rc11
---------------------------------

- emhttp: fixed spurious "title not found" log entries

- emhttp: ensure new parity disk for 'swap disable' has a valid partition table

You assign the new 3TB drive as parity, and assign the current 2TB parity drive as drive 3 (the failed drive).

When you then start the array, parity will be copied from the old parity drive to the new one, and then the old parity drive will be reconstructed as the missing data drive.

Your array will be off-line while the parity is copied... probably 5 to 8 hours, depending on the copy speed. Then, it will come back online as it reconstructs the defective disk...
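
For a rough sense of where a figure like 5 to 8 hours comes from, here is a back-of-the-envelope sketch. The 70 and 110 MB/s sustained copy speeds are assumptions for illustration, not measurements from this server. The byte count is the 2TB capacity that smartctl reported for one of the array's drives earlier in the thread. The helper name copy_hours is hypothetical.

```shell
# Hypothetical helper: hours needed to copy BYTES at a sustained MB_PER_S,
# counting 1 MB as 10^6 bytes.
# Usage: copy_hours BYTES MB_PER_S
copy_hours() {
    awk -v bytes="$1" -v mbps="$2" \
        'BEGIN { printf "%.1f\n", bytes / (mbps * 1000000) / 3600 }'
}

copy_hours 2000398934016 110   # assumed fast end, prints 5.1
copy_hours 2000398934016 70    # assumed slow end, prints 7.9
```

Real copy speed usually drops toward the inner tracks of a disk, so treat these as ballpark figures only.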

Joe L.

February 3, 2013

Author

Ok, great. Thanks Joe. I'll go ahead and upgrade to rc11 to do the "swap-disabled".

One last thing, I'm preclearing the new 3TB drive right now... is that necessary for a replacement drive? Once should be good? It takes a longgg time with a 3TB.

February 3, 2013

Ok, great. Thanks Joe. I'll go ahead and upgrade to rc11 to do the "swap-disabled".

One last thing, I'm preclearing the new 3TB drive right now... is that necessary for a replacement drive?

Not mandatory for a replacement, but to me it is important as you would want the disk to die an early death before you install it rather than soon after. (most disks fail early in their lives, or after many years of service... In between they are very reliable. You are basically burning in the drive to detect any early problems.)

Once should be good? It takes a longgg time with a 3TB.

yes, it does.

February 3, 2013

Author

Ok, thanks for the help, Joe!
