odd1 Posted January 5, 2012

Hi all,

I've had the array up for close to a year now (I think). The array consists of 9 disks, including parity, of various sizes. All are SATA. No cache drive (now). I run YAMJ and primarily use the array to feed my 7 Popcorn Hours around the house.

Recently, I've been having some problems with speeds on the array: mounting, unmounting, writing to it, etc. I say recently because this all started happening after I incorporated my cache drive into the array, because I had run out of space on the other drives and never really used the cache drive. Did I miss something in that process?

As I write this, I am copying a 5.5GB file from my main PC to my tower/movies dir and am getting speeds of between 250KB/s and 1.8MB/s! It has gotten to the point where I can't even download anything directly to the array because it can't write fast enough and it keeps giving me errors in my torrent client. I tried to stop the array the other day to reboot and it took over an hour to unmount all of the drives! It now takes me over 1.5 hours to do a YAMJ scan!

What the heck is going on?! I set this up and have not had any problems until now. I have not had to tinker with anything, so I am not as savvy as some of you are with the inner workings of Linux and UnRaid. I know you will need more info in order to help me, so just let me know what you need and I will get it. I really need to figure out what is happening.

Please help!!! My family has gotten used to a certain level of convenience with the media delivery and now I am not able to keep up! ARGGGGGHHHHH....

thanks all....
dave_m Posted January 5, 2012

Are all your drives almost full? There have been reports of extremely slow access when disks don't have much free space left.
odd1 Posted January 5, 2012

All drives but my newly converted cache drive are reading 99-100% full. Could this be the problem? If so, is it just a matter of moving files from the full drives to the empty one, or is there something more I need to do?

Right now the same file is still copying and none of the other drives are even spinning, only the one semi-empty one... Oh! It just finished the copy. 5.45GB took 1:21:27!

Thanks,
odd1 Posted January 10, 2012

OK, I cleaned up some stuff and moved some things around, and no drive is near 100% any more. The problem is still there though. I've reset all routers and switches and rebooted both machines many times. It's not a network problem. I tested moving a 421MB file from the tower to my PC and it took 10 sec. The same file took 6 minutes to be moved back to the tower. It seems to slow down only during writes, not reads.

Anyone have any ideas? Thanks...
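For anyone comparing numbers, the transfer times quoted in the post above can be turned into average rates. This is only a rough sketch: the function name `throughput_mb_per_s` is made up for illustration, and "6 minutes" is treated as exactly 360 seconds, which the post does not claim.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s for size_mb megabytes moved in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb / seconds


# Figures from the post: a 421MB file, 10 sec off the tower, ~6 minutes back onto it.
read_rate = throughput_mb_per_s(421, 10)       # tower -> PC (read from the array)
write_rate = throughput_mb_per_s(421, 6 * 60)  # PC -> tower (write to the array)

print(f"read:  {read_rate:.1f} MB/s")
print(f"write: {write_rate:.2f} MB/s")
print(f"reads are roughly {read_rate / write_rate:.0f}x faster than writes")
```

A gap that large between reads and writes over the same network path is consistent with the poster's conclusion that the network is not the bottleneck.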
odd1 Posted January 10, 2012

Here is my syslog from just now, after the transfer to and from...

syslog-2012-01-10.txt
dave_m Posted January 10, 2012

Disk sdc was reporting errors at 14:15. It might be worth looking at the SMART status for it.
odd1 Posted January 11, 2012

sdc is my parity drive. I did both a short and long SMART self test today and these are the results as reported in the status report:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure        90%       5944        158494197
# 2  Short offline       Completed: read failure        10%       5944        245286409

Could someone help me interpret this?

The complete report:

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green family
Device Model:     WDC WD20EADS-00W4B0
Serial Number:    WD-WCAVY6087956
Firmware Version: 01.00A01
User Capacity:    2,000,398,934,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Jan 11 15:54:44 2012 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 (41580) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   237   234   021    Pre-fail  Always       -       10125
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       434
  5 Reallocated_Sector_Ct   0x0033   177   177   140    Pre-fail  Always       -       181
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       5952
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       23
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   136   136   000    Old_age   Always       -       193856
194 Temperature_Celsius     0x0022   120   116   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   080   080   000    Old_age   Always       -       120
197 Current_Pending_Sector  0x0032   197   197   000    Old_age   Always       -       1001
198 Offline_Uncorrectable   0x0030   198   198   000    Old_age   Offline      -       753
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   184   000    Old_age   Offline      -       22

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure        90%       5944        158494197
# 2  Short offline       Completed: read failure        10%       5944        245286409

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Joe L. Posted January 11, 2012

Your drive is dying. Both your LONG and SHORT tests aborted on read errors when they encountered un-readable sectors.

It has 1001 unreadable sectors pending re-allocation when next written, and 181 unreadable sectors it has already re-allocated.

  5 Reallocated_Sector_Ct   0x0033   177   177   140    Pre-fail  Always       -       181
197 Current_Pending_Sector  0x0032   197   197   000    Old_age   Always       -       1001

Can you say RMA? I would replace the drive with a good one as soon as possible.
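The check described in the post above (look at the raw values for reallocated and pending sectors) could be sketched roughly like this. It is only an illustration under stated assumptions: `parse_attributes`, `WATCHED`, and the "any non-zero raw value is a warning" rule are not part of smartctl; they are a common rule of thumb. The sample rows are copied from the report earlier in the thread, and the parser assumes the 10-column attribute table layout shown there.

```python
# Attribute IDs commonly watched for signs of failing media (an assumption,
# not an official smartctl verdict).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_attributes(report: str) -> dict:
    """Return {attribute_id: raw_value} for attribute rows found in `report`."""
    raw = {}
    for line in report.splitlines():
        fields = line.split()
        # Rows look like: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[2].startswith("0x") and fields[9].isdigit()):
            raw[int(fields[0])] = int(fields[9])
    return raw


# Rows copied from the smartctl report posted in this thread.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   177   177   140    Pre-fail  Always       -       181
197 Current_Pending_Sector  0x0032   197   197   000    Old_age   Always       -       1001
198 Offline_Uncorrectable   0x0030   198   198   000    Old_age   Offline      -       753
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

attributes = parse_attributes(sample)
# Keep only the watched attributes whose raw count is non-zero.
flagged = {i: attributes[i] for i in WATCHED if attributes.get(i, 0) > 0}

for attr_id, count in flagged.items():
    print(f"WARNING: {WATCHED[attr_id]} (ID {attr_id}) raw value is {count}")
```

To run it against a live drive, the text would have to come from the smartctl report itself (for example, saved to a file first); the sketch deliberately does not call smartctl.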
odd1 Posted January 11, 2012

That's what I was afraid of. This is a brand new drive! Less than 3 months old!

Does the spin-up/spin-down process shorten the lives of these drives? I've had drives in my main PC (pre Unraid) that stayed up all the time and lasted for 5 years or more. Since I moved to Unraid, I've lost 3 drives!
WeeboTech Posted January 11, 2012

It's been stated that spin-up/spin-down can shorten the life of drives, but if that were true, manufacturers would suggest leaving them spinning. Instead there is firmware to spin down idle drives.

What affects drives a lot is any kind of power fluctuation. It can ruin a few sectors.

You could try running badblocks in destructive write mode to force reallocation of the sectors. But I'm not sure it's worth it if the SMART test says failing now. I did not see any FAILING_NOW status, so you could try. It may help refresh the format on the drive.

FYI, you would need to take it out of the array before doing the badblocks. It will take about 2-3 days to do a 4 pass badblocks test. After that you can check the SMART output and decide if you want to RMA the drive.
odd1 Posted January 16, 2012

Fixed it. Did a parity check. Took two days to complete, corrected one sync error, and now everything is running normally. Transfers to the array are now running 25-30MB/s. I'll keep an eye on the drive.

Thanks to all who helped me track this down.