TapRackPull, posted January 27, 2014

So, I restarted unRAID since it had been months and it was becoming a tad slower than normal. Upon restart, the system appears to hang at the stage of mounting drives. My syslog is attached. The date and time are evidently off, as the system thinks today is the 16th. I attempted to upgrade from 5.0-rc16 to 5.0.5 before this post in hopes of resolving the issue, without success. All help is appreciated.

Matt

--- EDIT ---
After 30 minutes, the system managed to mount the disks, but the overall responsiveness is horrible. None of my remote computers can find shares on unRAID.

--- EDIT #2 ---
The system is now up and stable, and the disks are mounted. I have a parity check in progress with an estimated end time of 76 days, 4 hours and 58 minutes... something is still not right here!

Attachment: syslog.zip
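As a rough sanity check on that estimate, the ETA can be turned into a throughput figure. This is a minimal sketch under one loud assumption: that the check must cover a full 2 TB parity disk of 2,000,398,934,016 bytes (the capacity smartctl reports for disk1 later in the thread; the parity disk's own size is not stated). The helper names are hypothetical.

```python
# Assumption: the parity check covers a full 2 TB disk of this many bytes
# (the capacity smartctl reports for disk1; the parity disk size is not stated).
DISK_BYTES = 2_000_398_934_016


def eta_seconds(days: int, hours: int, minutes: int) -> int:
    """Convert a days/hours/minutes ETA into seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60


def implied_rate_mb_s(disk_bytes: int, seconds: int) -> float:
    """Throughput (MB/s, with 1 MB = 10**6 bytes) needed to read disk_bytes in `seconds`."""
    return disk_bytes / seconds / 1e6


secs = eta_seconds(76, 4, 58)
print(secs, "seconds")
print(f"{implied_rate_mb_s(DISK_BYTES, secs):.2f} MB/s")
```

Under that assumption the ETA works out to roughly 0.3 MB/s, far below what a healthy drive sustains on a sequential read, which fits the picture of one drive stalling the whole array.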
DaleWilliams, posted January 27, 2014

Your log shows:

/dev/sdb: smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Jan 16 20:25:38 Server status[12913]: Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Jan 16 20:25:38 Server status[12913]: SMART overall-health self-assessment test result: FAILED!
Jan 16 20:25:38 Server status[12913]: Drive failure expected in less than 24 hours. SAVE ALL DATA.
Jan 16 20:25:38 Server status[12913]: Failed Attributes:
Jan 16 20:25:38 Server status[12913]: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
Jan 16 20:25:38 Server status[12913]: 5 Reallocated_Sector_Ct 0x0033 133 133 140 Pre-fail Always FAILING_NOW 1265

Your device 'sdb' is: WDC_WD20EARS-00MVWB0_WD-WCAZA5820109 (sdb) 1953514584
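To make the FAILED verdict concrete: in the attribute row above, the normalized VALUE (133) has dropped below the vendor threshold THRESH (140), which is why the WHEN_FAILED column reads FAILING_NOW. The Python sketch below applies that comparison to a single row. The rule it encodes (a threshold of 0 never fails, VALUE <= THRESH means failing now, WORST <= THRESH means failed in the past) is an assumption based on smartmontools' documented behaviour, not a copy of its code, and the parser assumes a row whose RAW_VALUE is a plain integer.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value ever recorded
    thresh: int  # vendor failure threshold
    raw: int


def parse_attr_row(row: str) -> SmartAttr:
    """Parse one smartctl attribute row; assumes RAW_VALUE is a plain integer."""
    f = row.split()
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), int(f[-1]))


def attr_state(a: SmartAttr) -> str:
    """Mirror the WHEN_FAILED column (assumed rule, see lead-in)."""
    if a.thresh == 0:
        return "-"            # a zero threshold means the attribute cannot fail
    if a.value <= a.thresh:
        return "FAILING_NOW"  # current normalized value is at or below threshold
    if a.worst <= a.thresh:
        return "In_the_past"  # it crossed the threshold at some earlier point
    return "-"


row = "5 Reallocated_Sector_Ct 0x0033 133 133 140 Pre-fail Always FAILING_NOW 1265"
attr = parse_attr_row(row)
print(attr.name, attr_state(attr), attr.raw)
```

The same check against the Raw_Read_Error_Rate row (VALUE 194, THRESH 051) would report "-", which matches what smartctl printed for that attribute.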
TapRackPull (author), posted January 27, 2014

So, this is one of those AHA moments I hear about but until now have never had. I am assuming that the impending drive failure is what is preventing the server from functioning. So, by that logic, replacing the drive should fix the issue?
DaleWilliams, posted January 27, 2014

It would account for the slow speed. Replacing the drive (or sending it back under RMA) would make sense. I'd let the parity check run until someone with more expertise can point out the risks of stopping it while a disk is failing.
dgaschk, posted January 27, 2014

Stop the parity check and replace the drive.
TapRackPull (author), posted January 28, 2014

Since I had a valid parity check from this past Sunday, I did stop the current parity check. I have attached a link to the syslog on Dropbox. Starting at line 2691, I am seeing a lot of entries like:

Jan 27 07:32:12 Server kernel: md: disk1 read error, sector=3215123328

with a different sector number in each entry... there are a lot of them. I'm guessing this means the drive has officially failed?

I ordered new HDDs today, so much of this is likely academic, but I would like to better understand what unRAID is trying to tell me. Would stopping the array and unassigning disk1 make the server functional in any way until the new drives arrive and I can restore the drive from parity? Or would relying on the parity drive make it as slow to respond as it is now?

Thanks for the help.

Matt (TRP)

https://www.dropbox.com/s/jy9weszz4llrzss/syslog%2001272014.docx
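To see how many distinct sectors are affected and roughly where they sit, those syslog entries can be tallied. This Python sketch assumes every entry has exactly the format of the one quoted above, treats the sector number as a count of 512-byte units from the start of the disk (an assumption; the md driver may count from the start of the partition), and uses the 2,000,398,934,016-byte capacity smartctl reports. `summarize_read_errors` and `byte_offset` are hypothetical helpers, not part of unRAID.

```python
import re
from collections import defaultdict

# Pattern assumed from the single example entry quoted in the post.
READ_ERR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")
SECTOR_BYTES = 512                # sector size smartctl reports for this drive
DISK_BYTES = 2_000_398_934_016    # disk1 capacity reported by smartctl


def summarize_read_errors(lines):
    """Map each disk name to a sorted list of the distinct sectors that failed to read."""
    sectors = defaultdict(set)
    for line in lines:
        m = READ_ERR.search(line)
        if m:
            sectors[m.group(1)].add(int(m.group(2)))
    return {disk: sorted(found) for disk, found in sectors.items()}


def byte_offset(sector):
    """Byte offset of a sector, assuming 512-byte units counted from offset 0."""
    return sector * SECTOR_BYTES


# The one entry quoted in the post; on the server you would feed in the syslog lines.
log = ["Jan 27 07:32:12 Server kernel: md: disk1 read error, sector=3215123328"]

errors = summarize_read_errors(log)
for disk, found in errors.items():
    first = byte_offset(found[0])
    print(f"{disk}: {len(found)} distinct sector(s), first at byte {first} "
          f"({first / DISK_BYTES:.1%} into the disk)")
```

Given a plain-text copy of the syslog, the same function accepts an open file object directly, since iterating a file yields its lines.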
TapRackPull (author), posted January 28, 2014

smartctl -a -d ata /dev/sdb yields:

root@Server:~# smartctl -a -d ata /dev/sdb
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA5820109
LU WWN Device Id: 5 0014ee 25b02a896
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Jan 27 21:39:01 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  73) The previous self-test completed having
                                        a test element that failed and the test
                                        element that failed is not known.
Total time to complete Offline
data collection:                (37080) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 358) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   194   194   051    Pre-fail  Always       -       92656
  3 Spin_Up_Time            0x0027   186   164   021    Pre-fail  Always       -       5666
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3646
  5 Reallocated_Sector_Ct   0x0033   133   133   140    Pre-fail  Always   FAILING_NOW 1265
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       21057
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       182
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       132
193 Load_Cycle_Count        0x0032   186   186   000    Old_age   Always       -       43567
194 Temperature_Celsius     0x0022   125   116   000    Old_age   Always       -       25
196 Reallocated_Event_Count 0x0032   123   123   000    Old_age   Always       -       77
197 Current_Pending_Sector  0x0032   003   001   000    Old_age   Always       -       64245
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x0008   162   162   000    Old_age   Offline      -       10190

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                     Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure       90%            21057  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@Server:~#

For what it's worth, unRAID's main page shows disk1 (sdb) currently has 76039 errors since coming online 1 day, 2 hours, 33 minutes ago.
TapRackPull (author), posted January 28, 2014

Thanks for the ALL CAPS emphasis! Am I better off:
1) powering down the server until I replace the HDD,
2) stopping the server and unassigning the drive until the new HDDs arrive and I can replace it, or
3) some other combination of things?
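For background on the parity questions in this thread (restoring disk1 from parity, and whether running without it would be just as slow): unRAID's single parity disk stores the bitwise XOR of the data disks, so any one missing disk can be recomputed from parity plus every remaining data disk. The toy Python model below uses hypothetical three-byte "disks" to illustrate the XOR idea; it is not unRAID's driver code, which works sector by sector across the whole array.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of three data disks (one tiny block each).
data_disks = [b"\x10\x20\x30", b"\x0f\x0e\x0d", b"\xaa\xbb\xcc"]
parity = xor_blocks(data_disks)  # what the parity disk would hold

# Pretend disk 0 (standing in for disk1) is unassigned and must be emulated.
missing = 0
survivors = [d for i, d in enumerate(data_disks) if i != missing]
rebuilt = xor_blocks(survivors + [parity])

print("rebuilt matches original:", rebuilt == data_disks[missing])
print("disks read to emulate one block:", len(survivors) + 1)
```

The point the model makes is that every read of an emulated disk touches parity and all surviving data disks rather than the failing drive, so it no longer depends on that drive's read retries but does depend on every other disk in the array.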
This topic is now archived and is closed to further replies.