Support for v4.5.4


oconnellc


A SMART test is telling me that my parity disk is failing (I have unRAID 4.5.4. Yeah, I know, I have been busy with life for a few years. Since this thing just tends to 'work', it doesn't get any sort of attention). It was pretty much luck that I even spotted this at all. Anyway, my drives (including parity) are all still showing green with no errors, so I'm not necessarily afraid of anything crashing today, but I'd like to fix this as soon as possible. Luckily, I happen to have a spare drive, already cleared, sitting in the case and ready to be plugged in.

 

Anyway, I can't find anything that clearly shows the steps to replace the parity drive.  I suspect that they are:

 

1) Document all drive assignments

2) Shutdown server

3) Connect parity replacement.

4) Restart server

5) Stop the array

6) Update configuration, changing the parity assignment from the old drive to the new drive. 

7) Verify all other drive assignments to make sure user didn't do something stupid.

8) Profit?
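Steps 1 and 7 boil down to comparing two records of the drive assignments. A minimal Python sketch, assuming you copy each slot's drive serial number from the web UI into a dictionary by hand, once before shutting down and once after assigning the new drive. The slot names (`parity`, `disk1`, `disk2`) and every serial except the old Samsung parity drive's are hypothetical placeholders.

```python
def assignment_changes(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple]:
    """Return {slot: (old_serial, new_serial)} for every slot that differs."""
    slots = sorted(set(before) | set(after))
    return {
        slot: (before.get(slot), after.get(slot))
        for slot in slots
        if before.get(slot) != after.get(slot)
    }


def only_parity_changed(before: dict[str, str], after: dict[str, str],
                        parity_slot: str = "parity") -> bool:
    """True if the parity slot is the only assignment that changed."""
    return set(assignment_changes(before, after)) == {parity_slot}


# Step 1: hand-copied from the web UI before shutdown.
# Serials other than the old parity drive's are hypothetical placeholders.
before = {"parity": "S1UYJ1CZ101535", "disk1": "SERIAL-A", "disk2": "SERIAL-B"}
# Step 7: re-read after assigning the new parity drive.
after = {"parity": "SERIAL-NEW", "disk1": "SERIAL-A", "disk2": "SERIAL-B"}

print(only_parity_changed(before, after))  # True
print(assignment_changes(before, after))   # {'parity': ('S1UYJ1CZ101535', 'SERIAL-NEW')}
```

A `False` from `only_parity_changed` is the "user did something stupid" case from step 7: some data disk ended up in a different slot.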

 

Any thoughts would be greatly appreciated.  I know, once I get things back in order, I need to upgrade to a more recent version...

 

Thanks in advance for any help.

 

Chris

Link to comment

FWIW, here is the report that is telling me that the drive is dying:

 

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD203WI
Serial Number:    S1UYJ1CZ101535
Firmware Version: 1AN10002
User Capacity:    2,000,398,934,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Not recognized. Minor revision code: 0x28
Local Time is:    Sun Sep 14 18:05:29 2014 GMT+5

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 137) The previous self-test completed having
                                        a test element that failed and the
                                        device is suspected of having handling
                                        damage.
Total time to complete Offline
data collection:                (25140) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   001   001   051    Pre-fail  Always   FAILING_NOW 35728
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   061   060   025    Pre-fail  Always       -       11910
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       616
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4603
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       47
191 G-Sense_Error_Rate      0x0022   252   252   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   060   000    Old_age   Always       -       34 (Lifetime Min/Max 14/40)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       21111

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                       Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: handling damage??       90%          4603         -
# 2  Short offline       Completed: handling damage??       90%          4599         -
# 3  Short offline       Completed: handling damage??       90%          4598         -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed_handling_damage?? [90% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing

Selective self-test flags (0x0):
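The "FAILED!" verdict above comes down to one comparison in the attribute table: an attribute is considered failing when its normalized VALUE is at or below its THRESH, and a THRESH of 000 means the attribute never trips. A minimal Python sketch that applies that rule, assuming the whitespace-separated column layout smartctl printed above; the function name is my own.

```python
def failing_attributes(report: str) -> list[tuple[int, str, int, int]]:
    """Return (id, name, value, thresh) for attributes at or below threshold."""
    failing = []
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name = int(fields[0]), fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        # A threshold of 0 means the attribute is informational only.
        if thresh > 0 and value <= thresh:
            failing.append((attr_id, name, value, thresh))
    return failing


# A few rows copied from the report above.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   001   001   051    Pre-fail  Always   FAILING_NOW 35728
  3 Spin_Up_Time            0x0023   061   060   025    Pre-fail  Always       -       11910
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
"""

print(failing_attributes(sample))  # [(1, 'Raw_Read_Error_Rate', 1, 51)]
```

Only attribute 1 trips, with a normalized value of 1 against a threshold of 51. The raw value (35728) is vendor-specific and is not what the rule compares.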

Link to comment

When was your last parity check?

 

Without the parity check you run the risk of replacing the "wrong drive." If during the parity rebuild another drive generates an error, what do you do then? Running a parity check confirms all the other drives are readable, making the parity replacement less risky.
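The point about needing every other drive readable follows from how single parity works. A toy Python sketch, assuming unRAID's single parity is a plain bitwise XOR across the data disks; the three two-byte "disks" are stand-ins for whole drives.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy data disks (two bytes each) standing in for whole drives.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]

# Building parity (what a new parity drive needs) reads every data disk.
parity = xor_blocks(data)
print(parity.hex())  # 6969

# Rebuilding the second disk needs parity plus every *other* data disk.
rebuilt = xor_blocks([parity, data[0], data[2]])
print(rebuilt == data[1])  # True
```

If a second disk is unreadable during either operation, the XOR is missing a term and that data cannot be recovered. Keeping the old parity drive unchanged keeps one complete set of terms available.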

 

The upgrade is very fast.

 

You have one drive which has not reported any errors, but you have only run short self-tests on it. One value is out of range, but there are no logged errors yet. How many other drives do you have, and what condition are they in?

 

Best case: the parity check runs without errors in unRAID or SMART, and you probably don't need to replace anything.

Worst case: you find that multiple drives have read errors, and then you'll want the parity drive unchanged.

 

Link to comment

My last parity check was, worst case, Sept 1. I think it ran again after we lost power and my wife turned off the UPS because it was beeping (sigh). The quick SMART tests on the other drives looked good. I'm not surprised the parity drive is dying; it is probably 4 or 5 years old. I know I need to upgrade; I just figured that replacing the parity drive was fairly quick and easy. It hadn't occurred to me to worry about the state of parity, since the last check was so recent and I know I haven't copied anything to the array in some time.

Link to comment

Just my opinion, but I think there is little chance a parity check could proceed very far at all.  The Raw Read Error Rate is as low as it can get, far below threshold, so I'm not sure the drive can read ANY sectors on it.  I do respect the advice of others here, and this is a rare situation, but in my opinion, you are probably better off pulling the drive immediately, and replacing it now, and starting a parity build.  If for some reason, the parity rebuild fails, you can *try* to put the failed one back online, run the Trust My Array procedure, and *try* rebuilding some other drive, but I don't think there is much chance at all of it being able to run a full rebuild, as bad as it is.  The read error rate is a critical quantity, and its failure probably means a mechanical failure of some kind.  Are you hearing a whine or other noise?

Link to comment

RobJ thanks for the advice.  I don't really know what I'm looking at when I read the SMART report, but I recognize the word "fail" when I see it.  I'm pretty sure the upgrade to 5.X requires first moving to 4.7, which you have to request from Tom.  I'll swap out the parity while waiting for my response from support and then do the upgrade once I get all the files I need.  You are actually making me more paranoid than I was.  The drive isn't making any noise that I can hear and like I mentioned,  the last parity check was only a couple weeks or so ago and it reported no errors.

Link to comment

If you know all your drive assignments you can do a clean install of 5.0.5 and skip 4.7. Really you only care about making sure you know your parity drive, but in your case that should be easy. :)

 

You can back up your current USB flash drive, then reformat it, put a clean 5.0.5 image on it, and run make_bootable. Let unRAID load up, and reassign the disks.

 

There are files you can copy back from your original usb if needed. These will all be in the config folder:

 

ident.cfg - your hostname (if not Tower)

network.cfg - any static IP settings (i.e. if you are not using DHCP)

share.cfg - your default share settings

pro.key or plus.key - your license file

 

There may also be a /config/shares folder with specifics on each share - or you can re-create them.
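A Python sketch for setting aside the files named above before the flash drive is reformatted. It assumes the flash drive is reachable as a folder containing `config/` (on a running unRAID server that is usually under `/boot`, or you can plug the stick into another machine); the function name and destination path are hypothetical. A full copy of the whole flash drive is still the safer backup; this only pulls out the pieces you might restore.

```python
import shutil
from pathlib import Path

# Files named in the thread; *.key below picks up pro.key or plus.key.
CFG_FILES = ["ident.cfg", "network.cfg", "share.cfg"]


def backup_flash_config(config_dir: Path, dest: Path) -> list[str]:
    """Copy the listed config files (if present) into dest; return what was copied."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in CFG_FILES:
        src = config_dir / name
        if src.is_file():
            shutil.copy2(src, dest / name)  # copy2 keeps timestamps
            copied.append(name)
    for key in sorted(config_dir.glob("*.key")):
        shutil.copy2(key, dest / key.name)
        copied.append(key.name)
    shares = config_dir / "shares"
    if shares.is_dir():
        # Per-share settings, if this version has the folder at all.
        shutil.copytree(shares, dest / "shares", dirs_exist_ok=True)
        copied.append("shares/")
    return copied
```

Usage, with placeholder paths: `backup_flash_config(Path("/boot/config"), Path("/mnt/disk1/flash-backup"))`. Files that don't exist on your stick are simply skipped and left out of the returned list.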

 

If you are using any plugins you will likely want to update them as well (look for the plugins by PhAzE in the forums here).

 

Regardless it's pretty easy to skip 4.7 and do a clean 5.0.5 install. This also has the benefit of giving you a clean slate to start from and make sure all your addins/tools are current.

 

Just be aware that if you used SimpleFeatures, it is no longer supported in 5.0.

 

Link to comment

I wonder why Tom says you must upgrade to 4.7 first? I have never used anything before 4.7, so I don't know if there is anything that might bite you if you try to go directly to 5 with data from 4.5.4 or not.

 

I'm pretty sure that the "plugin" concept was actually introduced during the v5 betas, as was SimpleFeatures, so he's got nothing to worry about there. unMENU and its packages were probably around back then, though.

Link to comment

I started on the 5.0 betas so don't know either. Also, I've been using 6.0 for almost a year now, so am getting hazy on the layout for 5.0 nowadays.

 

Regardless, if you are running a version that old, chances are you haven't touched/updated it in years, so I would think a clean installation/fresh plugin install is likely a good idea anyways. Having to spend a couple hours every 3-4 years is not a bad time investment. :)

 

Link to comment
My memory is a little hazy, but I'm pretty sure there was an issue that could cause a drive to not mount correctly because its partition wasn't at the expected start sector. Also, I think the way HPA (Host Protected Area) was handled changed with 4.7, so any drives with an HPA might cause an issue.

 

The symptom would be that the drive would show unformatted.
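If you want to see where a drive's first partition actually starts before trying a direct jump to 5.0, that value sits in the drive's MBR: the partition table begins at byte 446, and each 16-byte entry stores its starting LBA as a little-endian 32-bit integer at offset 8. A minimal Python sketch. My recollection is that unRAID partitions of that era start at sector 63 or 64, but treat those numbers as an assumption rather than something confirmed in this thread. The function names are hypothetical, `/dev/sdX` is a placeholder, and reading a raw device needs root.

```python
import struct


def first_partition_start_lba(mbr: bytes) -> int:
    """Return the starting LBA of partition entry 1 from a 512-byte MBR sector."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not an MBR sector (missing 0x55AA signature)")
    # Partition table starts at byte 446; entry 1's start LBA is a
    # little-endian uint32 at offset 8 within its 16-byte entry.
    (start_lba,) = struct.unpack_from("<I", mbr, 446 + 8)
    return start_lba


def read_mbr(device: str) -> bytes:
    """Read the first sector of a block device (needs root)."""
    with open(device, "rb") as f:
        return f.read(512)
```

Usage: `first_partition_start_lba(read_mbr("/dev/sdX"))`, repeated for each array drive, so that any drive whose value differs from the others stands out.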

Link to comment
Are you sure the format hasn't changed in any of the cfg files? It may be safer to recreate them using the new GUI rather than copying them across. I can't remember one way or the other whether it's safe to copy them from a version that old.
Link to comment

Good point. I honestly don't know either. I agree the safest method would be to re-create them, and just copy over the license key.

 

oconnellc - Instead of copying the files, they can just be opened in Notepad so that you can copy the existing settings to make sure everything is set up the same way.
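If you'd rather dump the old settings to the screen than open each file in Notepad, here is a small Python sketch. It assumes the .cfg files are plain `KEY="value"` lines (that is how I remember them looking, so check yours first); the function names are hypothetical and the key names in the example are illustrative only.

```python
from pathlib import Path


def read_cfg(text: str) -> dict[str, str]:
    """Parse KEY="value" lines into a dict, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def print_cfg(path: Path) -> None:
    """Print each setting so it can be re-entered by hand in the new GUI."""
    for key, value in read_cfg(path.read_text()).items():
        print(f"{key} = {value}")


# Illustrative input; real key names depend on your files.
print(read_cfg('NAME="Tower"\n# comment\n'))  # {'NAME': 'Tower'}
```

This only reads the old files; nothing is written back to the new flash drive, which fits the safer "re-create them in the GUI" approach.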

Link to comment

I'm going to wait to hear back from support regarding 4.7.  In the meantime, I'm not sure what would prevent me from just swapping out the parity disk.  I have run parity recently with no errors.  I also took a screen shot with the drive assignments, so I can easily replicate that if necessary.  However, I don't anticipate any problems.  I'm going to just plug in the 'cold' drive that is already mounted in the machine and then turn it back on and change the assignment of the parity disk.  After rebuilding parity, I'll at least feel safe enough to leave the server on so I can start using it again and then upgrade as quickly as life allows.  Agreed that a few hours of investment every few years is a small price to pay for what I get out of this thing.

Link to comment
