UR fails to mount drives

January 27, 2014

So, I restarted unRAID since it had been up for months and was becoming a tad slower than normal. Upon restart, the system appears to hang at the stage of mounting drives. My syslog is attached. The date and time are evidently off, as the system thinks today is the 16th. Before posting, I attempted to upgrade from 5.0-rc16 to 5.0.5 in hopes of resolving the issue, without success.

All help is appreciated.

Matt

---EDIT----

After 30 minutes, the system managed to mount the disks, but the overall responsiveness is horrible. None of my remote computers can find the shares on unRAID.

---EDIT #2---

Now the system is up and stable, and the disks are mounted. I have a parity check in progress with an estimated end time of 76 days, 4 hours and 58 minutes... something is still not right here!
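To put that estimate in perspective, here is a rough back-of-the-envelope sketch of the average speed it implies. The 2 TB parity size is an assumption (the post does not state the array size), and the function name is made up for illustration.

```python
# Rough throughput implied by the parity-check estimate quoted above.
# ASSUMPTION: the parity drive holds 2 TB (2e12 bytes); the post does not say.
ASSUMED_PARITY_BYTES = 2_000_000_000_000


def implied_throughput_mb_s(days: int, hours: int, minutes: int,
                            total_bytes: int = ASSUMED_PARITY_BYTES) -> float:
    """Average MB/s needed to cover total_bytes in the given time span."""
    seconds = days * 86_400 + hours * 3_600 + minutes * 60
    return total_bytes / seconds / 1_000_000


if __name__ == "__main__":
    rate = implied_throughput_mb_s(76, 4, 58)
    print(f"Implied parity-check speed: {rate:.2f} MB/s")
```

Under that assumption the check is averaging roughly 0.3 MB/s, far below what a healthy disk sustains, which fits the suspicion that something is still wrong.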

syslog.zip


January 27, 2014

Your log shows:

/dev/sdb: smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

Jan 16 20:25:38 Server status[12913]: Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Jan 16 20:25:38 Server status[12913]: SMART overall-health self-assessment test result: FAILED!

Jan 16 20:25:38 Server status[12913]: Drive failure expected in less than 24 hours. SAVE ALL DATA.

Jan 16 20:25:38 Server status[12913]: Failed Attributes:

Jan 16 20:25:38 Server status[12913]: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

Jan 16 20:25:38 Server status[12913]: 5 Reallocated_Sector_Ct 0x0033 133 133 140 Pre-fail Always FAILING_NOW 1265

Your device 'sdb' is: WDC_WD20EARS-00MVWB0_WD-WCAZA5820109 (sdb) 1953514584
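For anyone wondering why attribute 5 is marked FAILING_NOW: an attribute is reported as failing when its normalized VALUE is at or below THRESH (here 133 against 140). A minimal sketch of that comparison, assuming the syslog prefix has already been stripped from the row; the class and function names are hypothetical.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value recorded
    thresh: int  # vendor-defined failure threshold
    raw: str     # vendor-specific raw value, kept as text


def parse_attr_row(row: str) -> SmartAttr:
    """Parse one whitespace-separated row of the smartctl attribute table."""
    f = row.split()
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), f[-1])


def is_failing(attr: SmartAttr) -> bool:
    """True when the normalized value has dropped to or below the threshold.

    Simplification: a threshold of 0 is treated as "never fails".
    """
    return attr.thresh > 0 and attr.value <= attr.thresh


if __name__ == "__main__":
    row = "5 Reallocated_Sector_Ct 0x0033 133 133 140 Pre-fail Always FAILING_NOW 1265"
    attr = parse_attr_row(row)
    print(attr.name, "failing:", is_failing(attr))
```

Note that the comparison uses the normalized VALUE column, not the raw count of 1265.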


January 27, 2014

Author

So, this is one of those AHA moments I hear about but until now have never had.

I am assuming that the impending drive failure is what is preventing the server from functioning. So replacing the drive should fix the issue?


January 27, 2014

It would account for the slow speed. Replacing the drive (or RMA) would make sense.

Let the parity check run until someone more expert can point out the risks of stopping while there's a disk failing.

Hopefully, someone more expert has advice on this situation.


January 27, 2014

Stop the parity check and replace the drive.


January 28, 2014

Author

Since I had a valid parity check from this past Sunday, I did stop the current parity check. I have attached a link to the syslog on Dropbox. Starting on line #2691 I am seeing a lot of:

Jan 27 07:32:12 Server kernel: md: disk1 read error, sector=3215123328

with a different sector number on each entry... there are a lot of entries...

I'm guessing this means the drive has officially failed? I ordered new HDDs today, so much of this is likely academic, but I would like to better understand what unRAID is trying to tell me.
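With that many entries, a small script gives a quicker overview than scrolling. The following is only a sketch: it assumes the syslog is available as plain text lines in the format shown above, the function name is made up, and the second sample line is a hypothetical entry added for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jan 27 07:32:12 Server kernel: md: disk1 read error, sector=3215123328
READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def summarize_read_errors(lines):
    """Return ({disk: error_count}, {disk: (lowest_sector, highest_sector)})."""
    counts = Counter()
    ranges = {}
    for line in lines:
        m = READ_ERROR.search(line)
        if not m:
            continue  # not a read-error line
        disk, sector = m.group(1), int(m.group(2))
        counts[disk] += 1
        lo, hi = ranges.get(disk, (sector, sector))
        ranges[disk] = (min(lo, sector), max(hi, sector))
    return dict(counts), ranges


if __name__ == "__main__":
    sample = [
        "Jan 27 07:32:12 Server kernel: md: disk1 read error, sector=3215123328",
        # Hypothetical second entry, not taken from the real syslog:
        "Jan 27 07:32:12 Server kernel: md: disk1 read error, sector=3215123336",
    ]
    print(summarize_read_errors(sample))
```

If every error lands on the same disk, that points at one drive rather than a controller or cabling problem affecting several.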

Would stopping the array and un-assigning disk1 in any way make the server functional until the new drives arrive and I can restore the drive from parity? Or would using the parity drive make it as slow to respond as what I currently have?

Thanks for the help.

Matt (TRP)

https://www.dropbox.com/s/jy9weszz4llrzss/syslog%2001272014.docx


January 28, 2014

Paste a SMART report.


January 28, 2014

Author

smartctl -a -d ata /dev/sdb yields:

root@Server:~# smartctl -a -d ata /dev/sdb
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA5820109
LU WWN Device Id: 5 0014ee 25b02a896
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Jan 27 21:39:01 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  73) The previous self-test completed having
                                        a test element that failed and the test
                                        element that failed is not known.
Total time to complete Offline
data collection:                (37080) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 358) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   194   194   051    Pre-fail  Always       -       92656
  3 Spin_Up_Time            0x0027   186   164   021    Pre-fail  Always       -       5666
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3646
  5 Reallocated_Sector_Ct   0x0033   133   133   140    Pre-fail  Always   FAILING_NOW 1265
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       21057
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       182
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       132
193 Load_Cycle_Count        0x0032   186   186   000    Old_age   Always       -       43567
194 Temperature_Celsius     0x0022   125   116   000    Old_age   Always       -       25
196 Reallocated_Event_Count 0x0032   123   123   000    Old_age   Always       -       77
197 Current_Pending_Sector  0x0032   003   001   000    Old_age   Always       -       64245
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x0008   162   162   000    Old_age   Offline      -       10190

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure    90%     21057         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@Server:~#
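Besides the overall FAILED verdict, the raw counts for a handful of attributes tell most of the story. Below is a minimal sketch that picks them out of the attribute rows. The watch-list of IDs (5, 196, 197, 198) is a common convention and an assumption here, not something smartctl enforces, and the function name is hypothetical.

```python
# Attribute IDs often watched for media problems.
# ASSUMPTION: this watch-list is a convention, not part of smartctl itself.
WATCHED_IDS = {5, 196, 197, 198}


def nonzero_watched(rows):
    """Return (id, name, raw) for watched attributes whose raw count is nonzero."""
    found = []
    for row in rows:
        f = row.split()
        attr_id, name, raw = int(f[0]), f[1], int(f[-1])
        if attr_id in WATCHED_IDS and raw > 0:
            found.append((attr_id, name, raw))
    return found


if __name__ == "__main__":
    # Rows copied from the report above, with wrapped lines joined.
    rows = [
        "  5 Reallocated_Sector_Ct   0x0033   133   133   140    Pre-fail  Always   FAILING_NOW 1265",
        "196 Reallocated_Event_Count 0x0032   123   123   000    Old_age   Always       -       77",
        "197 Current_Pending_Sector  0x0032   003   001   000    Old_age   Always       -       64245",
        "198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0",
    ]
    for attr_id, name, raw in nonzero_watched(rows):
        print(f"{attr_id:>3} {name}: {raw}")
```

On this report it would list the 1265 reallocated sectors, 77 reallocation events and 64245 pending sectors.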

For what it's worth, unRAID's main page shows disk1 (sdb) currently has 76039 errors... since coming online 1 day, 2 hours and 33 minutes ago.


January 28, 2014

The drive has FAILED!


January 28, 2014

Author

Thanks for the ALL CAPS emphasis!

Am I better off

1) powering down the server until I replace the HDD

2) stopping the server and unassigning the drive until the new HDDs arrive and I can replace it

3) some other combination of things


January 28, 2014

Option 1 is safest.
