Huge storage re-org - guidance needed. - unRAID Server 4.5 [No new topics]

November 27, 2010

Currently my HTPC (Windows 7) has a SSD for OS and 4x Samsung F1 1TB drives in Raid 5 on a High-Point Rocket Raid 2310 (PCI-E). Problem: the system won't come back from sleep. I attribute this to RAID drivers as the card beeps upon resume.

My UnRaid Pro box has 4 WD Green drives and a couple IDE. Current board: Biostar TA760G M2+.

I want to:

Try the onboard Raid controller on my HTPC's motherboard, 2x 1TB in Raid 0 as a TV recording array ONLY, see if it will sleep. Move rest of collection to UnRaid.
Move the RocketRaid 2310 to my UnRaid box (I've heard mixed results with this) and bring the remaining 2x 1TB drives over. Otherwise, I won't have enough SATA ports. Other option: Promise TX4 I have here - clearly not the same performance.
Replace the current cache drive in the UnRaid box with a 150GB Raptor I have lying around.

This is doable, I just want to approach it carefully as ALL my data will have to be moved to the UnRaid box before I start.

Questions:

I'm currently running 4.5.4. Should I bother updating?
How can I best test if the RocketRaid will work correctly for me in UnRaid before I entrust it with all my data?
I guess it doesn't matter if I replace the cache drive first or begin adding the 1TBs. Should I pre-clear them? They've been in service for 3 or so years now...
I would sort of like (for organization's sake) to move disks to different ports, i.e. keep a share like "software" as disk 1 as it likely won't need expansion, while "movies" could be disk 4 & 5 & later 6, 7 etc. in sequence. What's the safest way to do this? Is it worth the risk of messing something up?

Advice appreciated.

November 27, 2010

First, my impression: you are jumping through lots of hoops to avoid upgrading some drives. I would add a 2T drive or two to make my life simpler.

I would definitely use the TX4, you can place your cache drive and a data drive on it with no loss of performance.

I think 4.5.4 should be fine, unless there were driver changes for the 2310.

To test the 2310 I would fill all four ports with current data drives and do a parity verify without correction (and don't write to your array while doing so). I would then move your 1T Samsungs to the 2310 and do a preclear on all of them at the same time. After that, write to those disk shares using Teracopy until you trust them.

To reorganize data, move the drives to the ports you want and then do a "trust my parity" (see wiki). I've done this lots of times. Just make sure you use the correct drive for parity. Immediately after do a parity check. I would do this before any of your other operations.

Good luck

November 27, 2010

Author

Thanks a lot ohlwiler! Looks like I have some days of work cut out for me.

I do agree on the drive front, it's just I only want the two 1TB drives in my HTPC moving forward and short of selling them... may as well put them somewhere.

November 27, 2010

I know how it is, I have five 1T. drives sitting on the shelf unused right now. Instead of the amount that I paid, I like to think of them as how much they are worth - it doesn't seem so wasteful. They also gave many hours of good service before they were benched.

November 28, 2010

I'm starting out new and would like to buy a few 2T drives, what do you suggest for high quality?

Thanks,

Tom

November 28, 2010

Buy what is cheapest. I have a Hitachi, 3 Samsung F3s, 4 WD EADs, 3 WD EARs and 7 Seagate LPs that I've acquired over the last year. I've always just bought what was on sale. I've had one Seagate that I removed because of escalating reallocated sectors and one EADS that failed SMART while I was preclearing. Between all of my drives I have one reallocated sector. Don't buy all of your drives in one batch, exercise them well before deploying, and watch them closely for reallocated sectors.

November 28, 2010

Author

Sounds like good advice. I know the preclear process shows reallocated sectors but how can you monitor that ongoing?

November 28, 2010

The easiest way is to install unMENU and then access myMain then Smart tab (or sm or sh for each drive).

http://lime-technology.com/forum/index.php?topic=5568.0
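If you would rather check by hand than through unMENU, a rough sketch of the idea is below: save the attribute table from `smartctl -A` every so often and compare the raw count of attribute 5 (Reallocated_Sector_Ct) between two snapshots. The function names are made up for illustration, and it assumes the raw column of attribute 5 is a plain sector count, which is typical but not guaranteed for every drive.

```python
import re

# One row of the attribute table printed by `smartctl -A`:
# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def raw_value(report, attr_id):
    """Return the leading integer of RAW_VALUE for attr_id, or None if absent."""
    for line in report.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) == attr_id:
            return int(m.group(6))
    return None

def reallocated_grew(old_report, new_report):
    """True if attribute 5 has a higher raw count in the newer snapshot."""
    old = raw_value(old_report, 5)
    new = raw_value(new_report, 5)
    return old is not None and new is not None and new > old
```

Feed it two saved reports for the same drive; a count that keeps climbing between snapshots is the "escalating" pattern mentioned above.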

December 5, 2010

Author

Thanks for the suggestion. I really should - just been putting it off as things were working.

Now they are not. My re-org mission got derailed slightly. Here is what I did:

1) Installed a 2TB Seagate LP drive, initiated preclear.

2) A few days later (I had lost the telnet connection) I found no preclear signature on that drive. Ok,

3) Reboot system.

Now it's in an endless loop, reporting ata4 errors. Something like Error: error: read dma ext. It just loops through over and over. Hrm, while typing this it looks like it finally gave up looping and I might be able to log in and have a look. Farthest I've gotten all day.

Edit: syslog attached. I don't know which drive it's referencing with "ata4"; I thought they were only addressed as disk x or sdx/hdx.

After doing SMART test on each drive - while they all passed - I think I found the culprit: my cache drive.

SMART results:

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model:     ST3120026A
Serial Number:    3LJ18F2M
Firmware Version: 3.54
User Capacity:    120,034,123,776 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Sat Dec  4 21:47:20 2010 GMT+5
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  85) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   065   060   006    Pre-fail  Always       -       43431990
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1009
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   088   060   030    Pre-fail  Always       -       692233027
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13952
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1318
194 Temperature_Celsius     0x0022   026   051   000    Old_age   Always       -       26
195 Hardware_ECC_Recovered  0x001a   065   059   000    Old_age   Always       -       43431990
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 36 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 36 occurred at disk power-on lifetime: 10492 hours (437 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 00 00 00 50  Error: ICRC, ABRT 1 sectors at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 01 00 00 00 50 00      00:00:43.526  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:43.526  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:42.860  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:42.195  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:41.530  READ DMA EXT

Error 35 occurred at disk power-on lifetime: 10492 hours (437 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 00 00 00 50  Error: ICRC, ABRT 1 sectors at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 01 00 00 00 50 00      00:00:43.526  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:43.526  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:42.860  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:42.195  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:41.530  READ DMA EXT

Error 34 occurred at disk power-on lifetime: 10492 hours (437 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 00 00 00 50  Error: ICRC, ABRT 1 sectors at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 01 00 00 00 50 00      00:00:43.526  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:43.526  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:42.860  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:42.195  READ DMA EXT
  ef 03 40 00 00 00 10 00      00:00:41.530  SET FEATURES [set transfer mode]

Error 33 occurred at disk power-on lifetime: 10492 hours (437 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 00 00 00 50  Error: ICRC, ABRT 1 sectors at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 01 00 00 00 50 00      00:00:43.526  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:43.526  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:42.860  READ DMA EXT
  ef 03 40 00 00 00 10 00      00:00:42.195  SET FEATURES [set transfer mode]
  25 03 01 00 00 00 50 00      00:00:41.530  READ DMA EXT

Error 32 occurred at disk power-on lifetime: 10492 hours (437 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 00 00 00 50  Error: ICRC, ABRT 1 sectors at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 03 01 00 00 00 50 00      00:00:43.526  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:43.526  READ DMA EXT
  ef 03 40 00 00 00 10 00      00:00:42.860  SET FEATURES [set transfer mode]
  25 03 01 00 00 00 50 00      00:00:42.195  READ DMA EXT
  25 03 01 00 00 00 50 00      00:00:41.530  READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

syslog-12-4-10.txt

December 5, 2010

Ok, ended up buying (5) 2TB EARS, and (1) 1TB Caviar Black drive for cache. The drives have build dates between April and November.

December 5, 2010

Author

Ok, so I've concluded the only drive sporting SMART issues and errors was my cache drive. Unfortunately, removing it did not rid the system of all the ata4 errors while booting unRaid.

How can I identify which drive it's referring to by ata4?

Edit: I found this command; interesting results?

Linux 2.6.32.9-unRAID.

root@Slipstream:~# dmesg|grep SATA|grep link

ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

ata4: limiting SATA link speed to 1.5 Gbps

ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
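The telling line in that output is the kernel dropping ata4 from 3.0 to 1.5 Gbps. As a small sketch (the function name is hypothetical, and it only knows the "limiting SATA link speed" wording shown above), the same check can be scripted against saved dmesg text:

```python
import re

# Matches kernel lines such as "ata4: limiting SATA link speed to 1.5 Gbps".
LIMIT = re.compile(r"(ata\d+): limiting SATA link speed to ([\d.]+) Gbps")

def downgraded_ports(dmesg_text):
    """Map each port whose link speed was limited to the reduced speed in Gbps."""
    return {m.group(1): float(m.group(2)) for m in LIMIT.finditer(dmesg_text)}
```

Any port it returns is one the kernel had trouble talking to at full speed, which can point at the drive, the cable, or the controller port.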

December 5, 2010

Dec 4 21:04:11 Slipstream kernel: ata4.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133

Dec 4 21:04:11 Slipstream kernel: ata4.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32), A
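To find lines like these without scrolling through the whole syslog, something along these lines could pull out the model string for every ataN port. It is only a sketch: the function name is invented, and it assumes the identify lines always look like the "ataN.00: ATA-8: MODEL, FIRMWARE, ..." line above.

```python
import re

# Matches the drive identify line the kernel logs for each port, e.g.
# "ata4.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133"
IDENT = re.compile(r"(ata\d+)\.\d+: ATA-\d+: ([^,]+),")

def port_models(syslog_text):
    """Map each ataN port to the model string reported on that port."""
    return {m.group(1): m.group(2).strip() for m in IDENT.finditer(syslog_text)}
```

With the model in hand, the drive can be matched against the model and serial shown for each disk on the unRAID devices page.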

December 5, 2010

Author

Ouch. So my disk3, that makes sense... ata4.

That just so happens to be my drive with the most data on it - only 400MB left! (Not sure how that happened, thought it was configured to fill up to 10GB free).

So what do you recommend Joe? I look at this and it "looks" bad to me. I have a brand new Seagate 2TB LP drive that is finishing up step 2 of the preclear script and I've previously run it through a bunch of benchmarks / data copy tests so it looks ready to go.

I last did a parity check 11/30 with 1 error. I guess once the other drive is done pre-clearing, I'll pull it and trust parity to rebuild on the Seagate then RMA the Green.

December 6, 2010

Author

The preclear on my brand new Seagate 2TB LP drive is done after 30 hours and I don't like the results I see:

 S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
54c54
<   1 Raw_Read_Error_Rate     0x000f   117   100   006    Pre-fail  Always       -       163262430
---
>   1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       186361218
58c58
<   7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       339601
---
>   7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       499541
66,67c66,67
< 190 Airflow_Temperature_Cel 0x0022   078   066   045    Old_age   Always       -       22 (Lifetime Min/Max 18/22)
< 195 Hardware_ECC_Recovered  0x001a   049   047   000    Old_age   Always
---
> 190 Airflow_Temperature_Cel 0x0022   071   066   045    Old_age   Always       -       29 (Lifetime Min/Max 18/31)
> 195 Hardware_ECC_Recovered  0x001a   049   046   000    Old_age   Always
71,73c71,73
< 240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       211922276319277
< 241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       299286567
< 242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       464660976
---
> 240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       239186728714315
> 241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       4180304637
> 242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       3763462133
============================================================================

Some of those numbers seem crazy high!! What should I do with it? And any suggestions on what I should do with the WD Green drive that's causing all the errors? Might WD Diagnostics help, or are the errors enough grounds to RMA the drive?

December 7, 2010

Most of the numbers in the "raw" column don't mean anything to anybody except the manufacturer.

You should look ONLY at the Current value vs. the Threshold value.

Your disk is perfectly healthy, in fact, the normalized "read error rate" improved over the course of the pre-clear.

Joe L.
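The "current value vs. threshold" rule can be sketched in a few lines, for anyone who wants to check a saved smartctl report mechanically. The function name is made up, and it assumes clean attribute rows (no leading "<" or ">" from a diff). An attribute counts as failed when its normalized VALUE is at or below THRESH; a threshold of 000 never trips.

```python
import re

# ID, NAME, FLAG, VALUE, WORST, THRESH at the start of each attribute row.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)")

def attributes_at_or_below_threshold(report):
    """Return names of attributes whose normalized VALUE is <= a nonzero THRESH."""
    failing = []
    for line in report.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        name, value, thresh = m.group(2), int(m.group(3)), int(m.group(5))
        if thresh > 0 and value <= thresh:
            failing.append(name)
    return failing
```

Run against the post-preclear rows above (118 vs. 006, 100 vs. 030), it returns nothing, which matches the "perfectly healthy" verdict regardless of how large the raw numbers look.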

December 7, 2010

Author

Hi Joe,

Thanks for the reassurance. The numbers seemed out of whack. I thought I had read the opposite at one point though. Clearly not.

I've successfully replaced the failing EARS with the LP drive and data is (slowly) rebuilding now.

Going to run some tests on the EARS and see if I can't get an RMA then get back to my data re-org!

December 7, 2010

You might have suspected the numbers were not meaningful when you saw the

head-flying-hours counter = 239186728714315

If that was hours, that would be equal to 27,304,421,086 years.

If it were millionths of a second it would be 239186728 seconds

239186728 seconds = 3986445.466 minutes

3986445.466 minutes = 66440.75 hours

66440.75 hours = 2768.36 days

2768.36 days = 7.58 years.

I suspect neither 27,304,421,086 years nor 7.58 years is correct.

Joe L.

December 7, 2010

It could be billionths of a second.
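For anyone who wants to replay the arithmetic, here is a quick sketch of the three guesses (hours, millionths of a second, billionths of a second). It uses 365-day years like the figures above. None of these is claimed to be the real encoding of the counter; the point is only how far apart the readings land.

```python
RAW = 239186728714315  # head-flying-hours raw value from the preclear report

HOURS_PER_YEAR = 24 * 365

def seconds_to_days(seconds):
    """Convert seconds to days."""
    return seconds / 3600 / 24

# Guess 1: the raw value is hours.
years_if_hours = RAW // HOURS_PER_YEAR

# Guess 2: the raw value is millionths of a second.
years_if_micro = seconds_to_days(RAW / 1_000_000) / 365

# Guess 3: the raw value is billionths of a second.
days_if_nano = seconds_to_days(RAW / 1_000_000_000)

print(years_if_hours, round(years_if_micro, 2), round(days_if_nano, 2))
```

The billionths reading works out to under three days, which is at least in the right neighborhood for a new drive that had just been through benchmarks and a 30-hour preclear, but that is a plausibility check and not a confirmation.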

December 8, 2010

Author

I did see that and assumed it wasn't being read right - hence, the drive must be erroneous.

So to the EARS that crapped out on me... I ran WD's Lifeguard software on it.

Quick test - Passed

Extended test - Passed

Then I tried to write zeros to the drive and it looks like it completed, but it finished with an error saying that it "failed to update disk property!" Here's hoping that's enough to stand on for an RMA. The SMART readouts from the other day were fine for this drive as well.

Edit - it appears it ran a parity check after the rebuild was done and found one error again. I've started one more parity check and a few minutes in it's already found 2 errors. I'm not liking this trend...

Will post syslog when done.

December 8, 2010

Author

So here's the syslog which should include the last parity check which still found the handful of errors after the EARS was replaced.

My next step is going to be to remove that aged IDE drive I had suspected from the get go then run the parity check again.

Edit: I have now removed the aging WD80GB drive and the writing of the parity went flawlessly. Now I'm running yet another parity check to hopefully verify all is well. Next up: RMA the Green drive and add my Raptor as the cache drive.

syslog-2010-12-8.txt
