Multiple Issues! What would you do?


So I was on 5.0.5 and had to upgrade my motherboard. unRAID 5 doesn't recognize the NIC, but unRAID 6 does (this is ridiculous, by the way). I needed to upgrade because a bad SATA port had caused multiple drive failures. What I'm left with is bad parity, a drive with red-ball status, and upgrade instructions that tell me not to upgrade if I have a faulty drive or bad parity. I can back up what little is on the "faulty" disk no problem, so my question is this... What would you do?

 

Would it work to back up what is on that "faulty" drive to another machine, upgrade to 6, run that script that makes all the drives look new so that parity runs again, and then preclear that "faulty" drive once again, add it back into the array, and move all my crap back? This whole fiasco would be way simpler if we could add certain driver types more easily.

I think you have the basic idea right, except instead of a "script", there is Tools - New Config in the webGUI.

 

How exactly do you plan to back up the drive? Just copy the files off it? Unless the PC you are going to copy it to is running Linux with ReiserFS (RFS) support, you will have to jump through some hoops to read it.

 


Really? Well crap... I dunno what to do about those files then. I just assumed if I pulled the disk out and hooked it up I could read from it, lol. I wonder if I change over to 6 and then spin it up without that disk in the array if I can still copy it via a browser since they're all captured in parity... even when that disk was totally removed from the machine, it still showed "Disk 3" in a browser and I could copy stuff.

Your first post in this thread said you had bad parity. What did you mean by that? Does unRAID think you have valid parity or not?

 

If you have valid parity and just a redballed data drive then you can read that data anyway. unRAID will not even try to use a redballed disk. Any access to that disk is being done by emulating it from parity and the other data drives.


Sorry... I'm still new to most of this, as my array has always just worked for the most part, so I tend to make some assumptive comments. It said something under parity from the dashboard like "invalid data" or "bad data", but it showed that faulty disk's file information when I spun up my array.

Still not clear. Maybe post a screenshot?
Okay, so this is what I have going on so far... I just went ahead and spun it up on unRAID 6 and prayed nothing crazy happened... I guess it's kind of a moot point now. I got the array running; I just need to back up disk 3 and then preclear it again.

 

Now I'm having a weird issue where if I don't boot in safe mode I can't connect via the web interface, but I think I'm going to lick that one on my own next boot.

[Attached screenshot: Disk_3.JPG]

[Attached screenshot: Dash_Spun_up.JPG]

So I let it go ahead and try to rebuild. It failed and showed the disk as faulty. Then "preclear_disk.sh -l" showed it as /dev/sdg instead of the /dev/sdf it has been. I restarted and did a "preclear_disk.sh -l", and now that's returning "No un-assigned disks detected"... I am crazy tired. Ugh.

If the drive letter changed, then that implies that the disk dropped offline (explaining the write failure) and then came back again. The most likely cause for this would be a power or SATA cable not securely seated.
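
Since the sdX letter can change when a disk drops and reappears, it can help to track the disk by its serial number instead. Below is a minimal Python sketch, assuming a Linux system where udev populates /dev/disk/by-id with symlinks whose names contain the drive serial (typical, but not guaranteed on every system). The function names are made up for illustration; nothing here is part of unRAID or the preclear script.

```python
import os


def disks_by_id(by_id_dir="/dev/disk/by-id"):
    """Map each persistent-ID symlink name to the device node it points to now."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping  # directory absent: nothing to report
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        if os.path.islink(path):
            mapping[name] = os.path.realpath(path)
    return mapping


def find_by_serial(serial, by_id_dir="/dev/disk/by-id"):
    """Return only the entries whose symlink name contains the given serial."""
    return {name: dev
            for name, dev in disks_by_id(by_id_dir).items()
            if serial in name}


if __name__ == "__main__":
    # ZA500CBK is the serial shown in the preclear output later in this thread.
    for name, dev in find_by_serial("ZA500CBK").items():
        print(name, "->", dev)
```

Running it before and after a dropout would show whether the same serial has moved from one device node to another.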

 

As to the pre-clear issue, the pre-clear script will not let you pre-clear the disk if it is currently assigned to the array.  If you want to rerun the pre-clear script against the disk as a confidence check then you will first have to unassign it from the array.  Note that it is not necessary for a disk to be pre-cleared before it is rebuilt, but in your case where it seems a write failure occurred it might not be a bad idea to check the disk. 

 

Posting the SMART report for the drive and the syslog covering the failed rebuild might allow others to give some guidance as to what went wrong.  This might save you pre-clearing the disk again which is a lengthy process.

So I'm done with preclear. Ran my SMART report. How am I looking? I'm going to remove the power connector I have plugged in and plug in another.

 

Preclear:

================================================================== 1.14
=                unRAID server Pre-Clear disk /dev/sdf
=               cycle 1 of 1, partition start on sector 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Verifying if the MBR is cleared.              DONE
= Disk Post-Clear-Read completed                                DONE
Disk Temperature: 38C, Elapsed Time:  20:06:16
========================================================================1.14
== ST3000DM001-1ER166   ZA500CBK
== Disk /dev/sdf has been successfully precleared
== with a starting sector of 1
============================================================================
** Changed attributes in files: /tmp/smart_start_sdf  /tmp/smart_finish_sdf
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
      Raw_Read_Error_Rate =   115     118            6        ok          91180384
             Spin_Up_Time =    98      95            0        ok          0
         Spin_Retry_Count =   100     100           97        near_thresh 0
         End-to-End_Error =   100     100           99        near_thresh 0
          High_Fly_Writes =    98     100            0        ok          2
  Airflow_Temperature_Cel =    62      68           45        near_thresh 38
      Temperature_Celsius =    38      32            0        ok          38
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
0 sectors are pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.

 

SMART Report:

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1ER166
Serial Number:    ZA500CBK
LU WWN Device Id: 5 000c50 07a83a9d8
Firmware Version: CC25
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb 28 01:14:53 2015 CST

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (   80) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 320) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always       -       91180384
  3 Spin_Up_Time            0x0003   098   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       851
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       681610
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       78
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       21
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   066   060   045    Old_age   Always       -       34 (Min/Max 34/38)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       837
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       898
194 Temperature_Celsius     0x0022   034   040   000    Old_age   Always       -       34 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       52h+04m+29.719s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       11986356768
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       34252710336

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

Edit:

Don't know if this matters, but this is the power supply I'm using. Aside from the CPU and motherboard, it's powering 1 SSD and 4 HDDs.

Your SMART report and pre-clear results look fine.  You should be able to add this disk back into your array now with no problem.    Hopefully the issue was just cabling, and new cables will keep everything running fine.  But I'd watch it carefully for a while, and be sure your backups are up-to-date so if anything DOES go awry you won't lose any data.
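
For anyone who wants to run the same kind of check on a saved report, here is a rough Python sketch that pulls a handful of attributes out of the `smartctl -a` attribute table and lists any with a non-zero raw value. The watch list is my own assumption about which counters are worth a look (reallocated, pending, uncorrectable, CRC errors); it is not an official list and not necessarily what was checked above. The sample rows are copied from the report in this thread.

```python
import re

# Assumed watch list: counters where a non-zero raw value deserves a closer look.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

# Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def nonzero_watched(report):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    hits = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match and match.group(2) in WATCHED:
            raw = int(match.group(6))
            if raw > 0:
                hits[match.group(2)] = raw
    return hits


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(nonzero_watched(SAMPLE) or "no watched attribute has a non-zero raw value")
```

An empty result only says these particular counters are at zero; it does not rule out cabling or power problems.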

 

I presume you copied all the data off of the drive before pre-clearing it -- right?    You mentioned that you planned to do that, but then you were pre-clearing the drive after you'd moved to v6.

 

I am super confused... That disk disappeared again. I clicked the drop-down box to add it as disk 3, and then it disappeared. The selection read "unassigned". When I clicked it again there wasn't anything there to select. I went ahead and started up the array, and "Disk 3" shows up in the file system when browsing from Windows. I assumed this is because the parity has the file info, and I was about to start backing up stuff from there... I looked at the log, and it shows the mover has started, and then a NEW file pops up in Disk 3 that has been moved over from the Cache.

If the drive is not appearing in the drop down when you try to assign it to disk 3 (I assume the array is stopped at the time) then that means it is not being seen at the Linux level.  This is a bit strange if you have just successfully pre-cleared it.  Normally it would appear in the drop down list so that you can assign the drive to disk 3.

 

At the share level what you see is expected behaviour as Disk 3 is being emulated by the combination of the other drives plus parity if the physical disk is missing.

 

...  "Disk 3" shows up in the file system when browsing from Windows. I assumed this is because the parity has the file info ...

 

The parity drive has NO file info ... it only contains a single XOR'd parity bit for each "slice" of data across all of the data disks. A missing disk (i.e. disk3) is reconstructed by examining the corresponding bits for ALL of the other data disks PLUS the parity disk ... which lets the system computationally determine what bit was on the missing disk at that point.
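
To make the XOR idea concrete, here is a toy Python sketch. The three "disks" are made-up 4-byte values, and this is only an illustration of single-parity XOR reconstruction, not unRAID's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy data disks, each holding one 4-byte slice (contents are arbitrary).
disk1 = b"\x12\x34\x56\x78"
disk2 = b"\xab\xcd\xef\x01"
disk3 = b"\x0f\xf0\x55\xaa"

# Parity is the XOR of every data disk at the same position.
parity = xor_blocks([disk1, disk2, disk3])

# disk3 goes missing: XOR the surviving data disks with parity to get it back.
rebuilt_disk3 = xor_blocks([disk1, disk2, parity])
assert rebuilt_disk3 == disk3
```

Note that the rebuild needs every remaining disk plus parity to be readable, which is why a second problem during emulation is so risky.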

 

 

...  I was about to start backing up stuff from there...

 

That answers my earlier question re: whether you'd backed up the drive before you pre-cleared it! While you can still back it up by copying from the array (the disk will be emulated, as I just noted, by spinning up all of the other drives in your array) -- and you definitely SHOULD do that BEFORE doing anything else -- you really should have done that BEFORE you did anything to the failed disk, so that if anything went wrong with the emulation you could have tried reading the data directly from the disk.

 

 

So I was about to post in reply, and add some screenshots and info, and then this happened!! I am about to pull all my hair out.

 

Here's a SMART report to go with... I have now changed the power cable, SATA port, and SATA cable, and this is the third disk I've tried to add.

root@Tower:~# smartctl -a /dev/sdf
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.18.5-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001
Serial Number:    ZA500CBK
LU WWN Device Id: 5 000c50 07a83a9d8
Firmware Version: CC25
User Capacity:    137,438,952,960 bytes [137 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s
Local Time is:    Sat Feb 28 12:57:24 2015 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Read SMART Data failed: scsi error aborted command

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.

Read SMART Log Directory failed: scsi error aborted command

Read SMART Error Log failed: scsi error aborted command

Read SMART Self-test Log failed: scsi error aborted command

Selective Self-tests/Logging not supported

[Attached screenshot: New_weird_issue.png]
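
One detail in that second report may be worth checking with a bit of arithmetic: the capacity dropped from 3.00 TB to 137 GB. The short Python sketch below compares the reported byte count with the largest sector count that fits in 28 bits (0x0FFFFFFF) at 512 bytes per sector. Reading this as a sign that the drive's identify data was only partially or incorrectly read is my own guess, not something established in this thread.

```python
SECTOR_BYTES = 512

# Capacities copied from the two smartctl reports in this thread.
first_report_bytes = 3_000_592_982_016
second_report_bytes = 137_438_952_960

# Largest sector count expressible in 28 bits, times the sector size.
lba28_max_sectors = 2**28 - 1  # 0x0FFFFFFF
lba28_limit_bytes = lba28_max_sectors * SECTOR_BYTES

print("second report matches the 28-bit limit:",
      second_report_bytes == lba28_limit_bytes)
print("first report exceeds the 28-bit limit:",
      first_report_bytes > lba28_limit_bytes)
```

If the numbers match, the 137 GB figure is probably not a real capacity, and together with the "aborted command" errors it points at the link to the drive rather than at the platters.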
