November 27, 2010

Currently my HTPC (Windows 7) has a SSD for OS and 4x Samsung F1 1TB drives in Raid 5 on a High-Point Rocket Raid 2310 (PCI-E). Problem: the system won't come back from sleep. I attribute this to RAID drivers as the card beeps upon resume.

My UnRaid Pro box has 4 WD Green drives and a couple IDE. Current board: Biostar TA760G M2+.

I want to:

Try the onboard Raid controller on my HTPC's motherboard, 2x 1TB in Raid 0 as a TV recording array ONLY, see if it will sleep. Move rest of collection to UnRaid.

Move the RocketRaid 2310 to my UnRaid box (I've heard mixed results with this) and bring the remaining 2x 1TB drives over. Otherwise, I won't have enough SATA ports. Other option: Promise TX4 I have here - clearly not the same performance.

Replace the cache drive in the UnRaid box with a Raptor 150GB I have lying around.

This is doable, I just want to approach it carefully as ALL my data will have to be moved to the UnRaid box before I start.

Questions:

I'm currently running 4.5.4. Should I bother updating?

How can I best test if the RocketRaid will work correctly for me in UnRaid before I entrust it with all my data?

I guess it doesn't matter if I replace the cache drive first or begin adding the 1TBs. Should I pre-clear them? They've been in service for 3 or so years now...

I would sort of like (for organization's sake) to move disks to different ports, i.e. keep a share like "software" as disk one as it likely won't need expansion while "movies" could be disk 4 & 5 & later 6, 7 etc. in sequence. What's the safest way to do this? Is it worth the risk of messing something up?

Advice appreciated.
November 27, 2010

First, my impressions: you are jumping through lots of hoops to avoid upgrading some drives. I would add a 2T drive or two to make my life simpler.

I would definitely use the TX4; you can place your cache drive and a data drive on it with no loss of performance.

I think 4.5.4 should be fine, unless there were driver changes for the 2310.

To test the 2310 I would fill all four ports with current data drives and do a parity verify without correction (and don't write to your array while doing so). I would then move your 1T Samsungs to the 2310 and do a preclear on all of them at the same time. After that, write to those disk shares using Teracopy until you trust them.

To reorganize data, move the drives to the ports you want and then do a "trust my parity" (see wiki). I've done this lots of times. Just make sure you use the correct drive for parity. Immediately after, do a parity check. I would do this before any of your other operations.

Good luck
November 27, 2010 (Author)

Thanks a lot ohlwiler! Looks like I have some days of work cut out for me. I do agree on the drive front, it's just I only want the two 1TB drives in my HTPC moving forward and short of selling them... may as well put them somewhere.
November 27, 2010

I know how it is, I have five 1T drives sitting on the shelf unused right now. Instead of the amount that I paid, I like to think of them in terms of how much they are worth - it doesn't seem so wasteful. They also gave many hours of good service before they were benched.
November 28, 2010

I'm starting out new and would like to buy a few 2T drives, what do you suggest for high quality?

Thanks, Tom
November 28, 2010

Buy what is cheapest. I have a Hitachi, 3 Samsung F3s, 4 WD EADs, 3 WD EARs and 7 Seagate LPs that I've acquired over the last year. I've always just bought what was on sale. I've had one Seagate that I removed because of escalating reallocated sectors and one EADs that failed smart while I was preclearing. Between all of my drives I have one reallocated sector.

Don't buy all of your drives in one batch, exercise them well before deploying and watch them closely for reallocated sectors.
November 28, 2010 (Author)

Sounds like good advice. I know the preclear process shows reallocated sectors but how can you monitor that ongoing?
November 28, 2010

The easiest way is to install unMENU and then access myMain then Smart tab (or sm or sh for each drive).

http://lime-technology.com/forum/index.php?topic=5568.0
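For anyone who prefers to track this by hand, here is a minimal sketch of the idea in Python. It is not how unMENU works internally; it only assumes you have saved two copies of a smartctl-style attribute table at different times and want to know whether the reallocated or pending sector counts grew. The watched IDs, the function names, and both sample tables are made up for illustration.

```python
# Attribute IDs to watch (an illustrative choice, not an unMENU feature list):
# 5 = Reallocated_Sector_Ct, 197 = Current_Pending_Sector, 198 = Offline_Uncorrectable
WATCHED_IDS = {5, 197, 198}

def raw_counts(report: str) -> dict:
    """Map attribute name -> raw count for the watched IDs in a smartctl-style table."""
    counts = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns:
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) in WATCHED_IDS:
            counts[fields[1]] = int(fields[9])
    return counts

def grown(previous: dict, current: dict) -> dict:
    """Return {name: (old, new)} for attributes whose raw count increased."""
    return {name: (previous.get(name, 0), now)
            for name, now in current.items()
            if now > previous.get(name, 0)}

# Hypothetical snapshots taken a week apart (made-up numbers).
last_week = """
  5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always  - 1
197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always  - 0
"""
today = """
  5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always  - 3
197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always  - 0
"""

print(grown(raw_counts(last_week), raw_counts(today)))
# {'Reallocated_Sector_Ct': (1, 3)}
```

A count that keeps climbing between snapshots is the "escalating" pattern mentioned above; a count that stays put is much less worrying.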
December 5, 2010 (Author)

Thanks for the suggestion. I really should - just been putting it off as things were working. Now they are not. My re-org mission got derailed slightly. Here is what I did:

1) Installed a 2TB Seagate LP drive, initiated preclear.
2) A few days later (I had lost the telnet connection) I found no preclear signature on that drive. Ok,
3) Rebooted the system. Now it's in an endless loop, reporting ata4 errors. Something like Error: error: read dma ext. It just loops through over and over.

Hrm, while typing this it looks like it finally gave up looping and I might be able to log in and have a look. Farthest I've gotten all day.

Edit: syslog attached. I don't know which drive it's referencing with "ata4"; I thought they were only addressed with disk x or sdx/hdx.

After doing a SMART test on each drive - while they all passed - I think I found the culprit: my cache drive. SMART results:

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model: ST3120026A
Serial Number: 3LJ18F2M
Firmware Version: 3.54
User Capacity: 120,034,123,776 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2
Local Time is: Sat Dec 4 21:47:20 2010 GMT+5
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 430) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. No General Purpose Logging support.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 85) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f 065   060   006    Pre-fail Always  -           43431990
  3 Spin_Up_Time            0x0003 096   096   000    Pre-fail Always  -           0
  4 Start_Stop_Count        0x0032 100   100   020    Old_age  Always  -           1009
  5 Reallocated_Sector_Ct   0x0033 100   100   036    Pre-fail Always  -           1
  7 Seek_Error_Rate         0x000f 088   060   030    Pre-fail Always  -           692233027
  9 Power_On_Hours          0x0032 085   085   000    Old_age  Always  -           13952
 10 Spin_Retry_Count        0x0013 100   100   097    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032 099   099   020    Old_age  Always  -           1318
194 Temperature_Celsius     0x0022 026   051   000    Old_age  Always  -           26
195 Hardware_ECC_Recovered  0x001a 065   059   000    Old_age  Always  -           43431990
197 Current_Pending_Sector  0x0012 100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0010 100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x003e 200   199   000    Old_age  Always  -           1
200 Multi_Zone_Error_Rate   0x0000 100   253   000    Old_age  Offline -           0
202 TA_Increase_Count       0x0032 100   253   000    Old_age  Always  -           0

SMART Error Log Version: 1
ATA Error Count: 36 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 36 occurred at disk power-on lifetime: 10492 hours (437 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 00 00 00 50 Error: ICRC, ABRT 1 sectors at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 03 01 00 00 00 50 00 00:00:43.526 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:43.526 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:42.860 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:42.195 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:41.530 READ DMA EXT

Error 35 occurred at disk power-on lifetime: 10492 hours (437 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 00 00 00 50 Error: ICRC, ABRT 1 sectors at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 03 01 00 00 00 50 00 00:00:43.526 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:43.526 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:42.860 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:42.195 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:41.530 READ DMA EXT

Error 34 occurred at disk power-on lifetime: 10492 hours (437 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 00 00 00 50 Error: ICRC, ABRT 1 sectors at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 03 01 00 00 00 50 00 00:00:43.526 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:43.526 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:42.860 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:42.195 READ DMA EXT
ef 03 40 00 00 00 10 00 00:00:41.530 SET FEATURES [set transfer mode]

Error 33 occurred at disk power-on lifetime: 10492 hours (437 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 00 00 00 50 Error: ICRC, ABRT 1 sectors at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 03 01 00 00 00 50 00 00:00:43.526 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:43.526 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:42.860 READ DMA EXT
ef 03 40 00 00 00 10 00 00:00:42.195 SET FEATURES [set transfer mode]
25 03 01 00 00 00 50 00 00:00:41.530 READ DMA EXT

Error 32 occurred at disk power-on lifetime: 10492 hours (437 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 00 00 00 50 Error: ICRC, ABRT 1 sectors at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 03 01 00 00 00 50 00 00:00:43.526 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:43.526 READ DMA EXT
ef 03 40 00 00 00 10 00 00:00:42.860 SET FEATURES [set transfer mode]
25 03 01 00 00 00 50 00 00:00:42.195 READ DMA EXT
25 03 01 00 00 00 50 00 00:00:41.530 READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
   1       0       0 Not_testing
   2       0       0 Not_testing
   3       0       0 Not_testing
   4       0       0 Not_testing
   5       0       0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

syslog-12-4-10.txt
December 5, 2010

Ok, ended up buying (5) 2TB EARS, and (1) 1TB Caviar Black drive for cache. The drives have build dates between April and November.
December 5, 2010 (Author)

Ok, so I've concluded the only drive sporting SMART issues and errors was my cache drive. Unfortunately, removing it did not rid the system of all the ata4 errors while booting unRaid. How can I identify which drive it's referring to by ata4?

Edit: I found this command; interesting results?

Linux 2.6.32.9-unRAID.
root@Slipstream:~# dmesg|grep SATA|grep link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4: limiting SATA link speed to 1.5 Gbps
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
December 5, 2010

Dec 4 21:04:11 Slipstream kernel: ata4.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133
Dec 4 21:04:11 Slipstream kernel: ata4.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32), A
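The lookup shown here (find the kernel line where the port identifies its drive) can be automated. The Python below is a rough sketch: it assumes the identify lines look like the ones quoted in this thread, and the function names are made up for illustration. It also picks out ports where the kernel reported dropping the SATA link speed, as seen for ata4 in the dmesg output earlier.

```python
import re

# Matches kernel identify lines of the form seen in this thread, e.g.
# "ata4.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133"
IDENTIFY = re.compile(r"(ata\d+)\.\d+: ATA-\d+: ([^,]+),")

def ports_to_models(log_text: str) -> dict:
    """Map each ataN port to the drive model the kernel reported on it."""
    ports = {}
    for match in IDENTIFY.finditer(log_text):
        ports[match.group(1)] = match.group(2).strip()
    return ports

def downgraded_ports(log_text: str) -> set:
    """Ports where the kernel logged 'limiting SATA link speed'."""
    return set(re.findall(r"(ata\d+): limiting SATA link speed", log_text))

# Lines copied from the posts above.
sample = """\
Dec 4 21:04:11 Slipstream kernel: ata4.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133
ata4: limiting SATA link speed to 1.5 Gbps
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
"""

print(ports_to_models(sample))   # {'ata4': 'WDC WD20EARS-00MVWB0'}
print(downgraded_ports(sample))  # {'ata4'}
```

In practice you would feed it the saved syslog or the output of dmesg instead of the sample string.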
December 5, 2010 (Author)

Ouch. So my disk3, that makes sense... ata4. That just so happens to be my drive with the most data on it - only 400MB left! (Not sure how that happened, thought it was configured to fill up to 10GB free).

So what do you recommend Joe? I look at this and it "looks" bad to me. I have a brand new Seagate 2TB LP drive that is finishing up step 2 of the preclear script and I've previously run it through a bunch of benchmarks / data copy tests so it looks ready to go.

I last did a parity check 11/30 with 1 error. I guess once the other drive is done pre-clearing, I'll pull it and trust parity to rebuild on the Seagate then RMA the Green.
December 6, 2010 (Author)

The preclear on my brand new Seagate 2TB LP drive is done after 30 hours and I don't like the results I see:

S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
54c54
< 1 Raw_Read_Error_Rate 0x000f 117 100 006 Pre-fail Always - 163262430
---
> 1 Raw_Read_Error_Rate 0x000f 118 100 006 Pre-fail Always - 186361218
58c58
< 7 Seek_Error_Rate 0x000f 100 253 030 Pre-fail Always - 339601
---
> 7 Seek_Error_Rate 0x000f 100 253 030 Pre-fail Always - 499541
66,67c66,67
< 190 Airflow_Temperature_Cel 0x0022 078 066 045 Old_age Always - 22 (Lifetime Min/Max 18/22)
< 195 Hardware_ECC_Recovered 0x001a 049 047 000 Old_age Always
---
> 190 Airflow_Temperature_Cel 0x0022 071 066 045 Old_age Always - 29 (Lifetime Min/Max 18/31)
> 195 Hardware_ECC_Recovered 0x001a 049 046 000 Old_age Always
71,73c71,73
< 240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 211922276319277
< 241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 299286567
< 242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 464660976
---
> 240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 239186728714315
> 241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 4180304637
> 242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 3763462133
============================================================================

Some of those numbers seem crazy high!! What should I do with it?

And any suggestions on what I should do with the WD Green drive that's causing all the errors? Might WD Diagnostics help or are the errors enough grounds to RMA the drive?
December 7, 2010

Most of the numbers in the "raw" column mean nothing to anybody except the manufacturer. You should look ONLY at the Current value vs. the Threshold value. Your disk is perfectly healthy; in fact, the normalized "read error rate" improved over the course of the pre-clear.

Joe L.
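To make the "Current value vs. Threshold" rule concrete, here is a small Python sketch. It assumes the usual smartctl column order (ID, name, flag, VALUE, WORST, THRESH, ...) and treats an attribute as failing when the normalized value is at or below a non-zero threshold. The function name is made up, and the last sample row is invented purely to show what a failing attribute would look like.

```python
def failing_attributes(report: str) -> list:
    """Return (name, value, thresh) for rows whose normalized VALUE is at or below THRESH."""
    failing = []
    for line in report.splitlines():
        fields = line.split()
        # Skip anything that is not an attribute row (needs ID ... RAW_VALUE columns).
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, value, thresh = fields[1], int(fields[3]), int(fields[5])
        # A threshold of 000 can never be tripped, so the raw number alone says nothing.
        if thresh > 0 and value <= thresh:
            failing.append((name, value, thresh))
    return failing

# Post-preclear rows from the Seagate LP posted above.
after_preclear = """
1 Raw_Read_Error_Rate 0x000f 118 100 006 Pre-fail Always - 186361218
7 Seek_Error_Rate 0x000f 100 253 030 Pre-fail Always - 499541
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 239186728714315
"""
print(failing_attributes(after_preclear))  # []

# Made-up row, only to illustrate a tripped threshold.
hypothetical = "5 Reallocated_Sector_Ct 0x0033 030 030 036 Pre-fail Always FAILING_NOW 900"
print(failing_attributes(hypothetical))    # [('Reallocated_Sector_Ct', 30, 36)]
```

The huge raw values on the Seagate do not show up at all, because every normalized value sits well above its threshold.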
December 7, 2010 (Author)

Hi Joe,

Thanks for the reassurance. The numbers seemed out of whack. I thought I had read the opposite at one point though. Clearly not.

I've successfully replaced the failing EARS with the LP drive and data is (slowly) rebuilding now. Going to run some tests on the EARS and see if I can't get an RMA then get back to my data re-org!
December 7, 2010

You might have suspected the numbers were not meaningful when you saw the head-flying-hours counter = 239186728714315

If that was hours, that would be equal to 27,304,421,086 years.

If it were millionths of a second it would be 239186728 seconds
239186728 seconds = 3986445.466 minutes
3986445.466 minutes = 66440.75 hours
66440.75 hours = 2768.36 days
2768.36 days = 7.58 years.

I suspect neither 27,304,421,086 years nor 7.58 years is correct.

Joe L.
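The same arithmetic, redone in Python as a quick check. The only assumption is a 365-day year, which is what the figures above imply; the variable names are just for illustration.

```python
RAW = 239186728714315          # Head_Flying_Hours raw value from the preclear report
HOURS_PER_YEAR = 24 * 365      # assumes a 365-day year

# Reading 1: the raw value really is hours.
years_if_hours = RAW / HOURS_PER_YEAR

# Reading 2: the raw value is millionths of a second.
seconds = RAW / 1_000_000
years_if_microseconds = seconds / 60 / 60 / 24 / 365

print(f"{years_if_hours:,.0f} years")         # 27,304,421,086 years
print(f"{years_if_microseconds:.2f} years")   # 7.58 years
```

Either way the result is implausible for a brand new drive, which is the point: the raw field is vendor-specific and not a plain counter in either unit.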
December 8, 2010 (Author)

I did see that and assumed it wasn't being read right - hence, the drive must be erroneous.

So to the EARS that crapped out on me... I ran WD's Lifeguard software on it.

Quick test - Passed
Extended test - Passed

Then I tried to write zeros to the drive and it looks like it completed but finally with an error saying that it "failed up update disk property!"

Here's to hoping that's enough to stand on for an RMA. The smart readouts from the other day were fine for this drive as well.

Edit - it appears it ran a parity check after the rebuild was done and found one error again. I've started one more parity check and a few minutes in it's already found 2 errors. I'm not liking this trend... Will post syslog when done.
December 8, 2010 (Author)

So here's the syslog which should include the last parity check which still found the handful of errors after the EARS was replaced.

My next step is going to be to remove that aged IDE drive I had suspected from the get go then run the parity check again.

Edit: I have now removed the aging WD80GB drive and the writing of the parity went flawlessly. Now I'm running yet another parity check to hopefully verify all is well.

Next up: RMA the Green drive and add my Raptor as the cache drive.

syslog-2010-12-8.txt