multiple drive issues


Yes, we need each disk assigned to its proper slot. I'd do a new config, assign each disk to its proper slot, and trust parity. The array should come up with minimal I/O. See if disk 12 is readable. If it is, stop the array, unassign disk7, and start the array. Disk7 will now be simulated from parity and the other disks, so see if you can access it. If so, assign your new disk to slot 7 and rebuild it.

 

Sorry, but I'm going to sleep. Early morning. Will check in on progress in the morning.

Link to comment

I'm waiting to hear from anyone else on the best course of action. If I don't hear anything within the hour, then I will move forward with bjp999's suggestion.

Seems like the best idea to me. If it doesn't work you will still have the original disk7 and nothing will be changed on any of the other disks.

 

Take it one sentence at a time, and if something doesn't work like he said, stop and report back.

Link to comment

Yup. In any unRAID failure case, the user must first assess the likelihood of rescuing the array before resorting to per-disk recovery attempts. If the array can still be used (as per bjp999's suggestion), then all you need to do is plop in a new disk and it will be rebuilt from the parity data together with the other data disks.
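To illustrate why a single missing disk can be rebuilt, here is a minimal sketch of XOR parity on toy byte strings. It assumes a single-parity array where the parity block is the XOR of the corresponding blocks on every data disk; the function name and the three tiny "disks" are made up for illustration and are not unRAID code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks" (hypothetical contents) and the parity computed from them.
disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(disks)

# Pretend the disk at index 1 failed: XOR the parity with the survivors to rebuild it.
survivors = [d for i, d in enumerate(disks) if i != 1]
rebuilt = xor_blocks(survivors + [parity])
```

The same arithmetic is why the rebuild depends on every other disk being readable: one bad read on a surviving disk corrupts the corresponding byte of the rebuilt disk.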

 

If that doesn't work, then you'll need to proceed to the per-disk rescue attempt...

 

I've had issues with a drive very similar to what you have there (it also showed up as a 600 petabyte drive). What was special about that drive of mine was that it messed up other drives on the SATA/SAS bus as well. It caused 'Command Timeout' SMART errors and generated (as far as I know) phantom 'Pending Reallocation' errors, which have since disappeared without a trace after subsequent tests.

Link to comment

Followed bjp999's instructions: I was able to view data on drive 12 through the dashboard, so I stopped the array, removed disk 7, started in maintenance mode, stopped, assigned a new disk to disk 7, and started the array. I have been waiting at Mounting Drives now for about 10 minutes, so hopefully that clears up soon. If not, what's the next step?

Link to comment

When you started the array did unRAID indicate that it was going to rebuild?
Link to comment

Scratch that - it's doing a data rebuild on 7 now.

 

Next question: since it appears that 12 is working as it is supposed to and there were no SMART errors, what is everyone's opinion on it moving forward? I think there are only 3 options here:

 

1. I have 1 more spare drive, so I could always swap 12 out for the new spare and do a data rebuild again. After that I could pre-clear the old 12 and then run some tests on it. If it's good, I put it back in.

 

2. I smash it with a hammer....

 

3. Leave it alone in the array

Link to comment

While option 2 is tempting, I don't recommend it  :)

 

I presume that you're using one of your spare drives to do the disk #7 rebuild -- in which case you still have the failed drive ... is that correct?

If so, do NOT do anything with the old drive until the rebuild completes and you run a non-correcting parity check on the array to confirm the rebuild went okay.

 

As for disk #12 => given the intermittent "glitches" you had with it, I'd do the same thing with it after disk #7 is recovered.  i.e. rebuild it onto your other spare => THEN you can thoroughly test #12 to see if you're satisfied that it's in good enough shape to use in your array.

 

Link to comment

Glad you're making good progress. Not sure you followed my directions exactly (did you see the data on disk7 before beginning the rebuild?). Even with the rebuild in progress, you should still be able to see the contents of disk7. Either way - do NOT stop the rebuild!

 

Assuming all this works fine, I'd go with #3. The SMART report on disk12 is not showing any problems. I would attribute the weirdness to the failing disk7, loose cabling, or possibly the dreaded user error :o. Doing #1 is not a bad idea, but honestly, if the rebuild of disk7 works fine, I'd just leave it alone.

Link to comment

Disk 12 looks OK, I wouldn't do anything with it unless something shows up in syslog (ATA errors) or the smartctl report.


Device Model:     WDC WD40EZRX-00SPEB0
Serial Number:    WD-WCC4E3KJKVLK...
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   207   180   021    Pre-fail  Always       -       6650
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       20
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1177
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       16
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   177   177   000    Old_age   Always       -       69158
194 Temperature_Celsius     0x0022   123   100   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
...

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]
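As a rough sketch of how a report like the one above can be checked automatically, the Python below parses the attribute table and lists any watched attribute whose raw value is non-zero. The choice of watched IDs (5, 196, 197, 198, 199) is an assumption based on the attributes people usually look at first, not an official rule, and the function names are hypothetical.

```python
import re

# Assumption: these IDs are the ones most often treated as early warning signs.
WATCHED_IDS = {5, 196, 197, 198, 199}

# One row of the smartctl attribute table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_smart_attributes(report):
    """Return {id: {"name", "value", "worst", "thresh", "raw"}} for each table row."""
    attrs = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = {
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "raw": int(m.group(9)),
            }
    return attrs


def suspicious(attrs):
    """Names of watched attributes with a non-zero raw value, in ID order."""
    return [a["name"] for id_, a in sorted(attrs.items())
            if id_ in WATCHED_IDS and a["raw"] > 0]


# A few rows copied from the report above.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""
flagged = suspicious(parse_smart_attributes(sample))
```

For the rows shown, `flagged` comes back empty, which matches the "looks OK" reading of this disk.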

 

I would issue a SMART long test and let it run to completion to verify the surface and put a line in the sand, so to speak, within the SMART logs.

But only after the rebuild is complete.

Keep in mind that the spin down timers may affect a long test and will need to be disabled for the drive.

After the SMART long test, post the SMART log for assistance in reassessing its status.
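For reference, a minimal sketch of the commands involved, wrapped in Python so they can be scripted. `smartctl -t long` starts the extended self-test and `smartctl -l selftest` prints the self-test log afterwards; the device path `/dev/sdX` is a placeholder, and the helper names are hypothetical. Nothing here handles the spin-down timers, which still need to be disabled separately.

```python
import subprocess


def long_test_command(device):
    """Command that starts an extended (long) SMART self-test on the device."""
    return ["smartctl", "-t", "long", device]


def selftest_log_command(device):
    """Command that prints the device's SMART self-test log."""
    return ["smartctl", "-l", "selftest", device]


def run(cmd):
    """Run a command and return its standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


# Placeholder device name: substitute the real device for disk 12.
start_cmd = long_test_command("/dev/sdX")
log_cmd = selftest_log_command("/dev/sdX")
```

Calling `run(start_cmd)` kicks off the test; once the drive reports completion, `run(log_cmd)` returns the log text to post.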

Link to comment

Since it looks like we are making progress, and garycase hasn't said anything about it yet, I will.

 

You should consider a backup strategy. It doesn't have to mean backing up the entire 32 TB. Just think about what data you can't afford to lose and have no way to recover if it were lost. Family photos and videos, other personal data. Maybe also consider what data you would hate to be without but might be able to recover from other sources if you went to a lot of trouble.

 

I currently have a 17TB array, but just over 11TB is used. On this array, I have images of other PCs, backups of photos and personal data that also exist on other PCs. I also have some data that doesn't exist on other PCs such as our music collection and assorted other media like movies, tv shows, audiobooks, ebooks.

 

I take a monthly backup from my unRAID to an external drive. Only 2TB for the irreplaceable photos and other personal data, and our music collection will also fit. The rest I am willing to sacrifice and recover from other sources. Each month I give my wife the external drive and she locks it in her desk at work. She gives me the previous month's external drive, and I do the same with it. So we have 2 offsite backups of the really important stuff.

 

The external drives I mentioned are actually just internal drives I re-used from upsizing disks in my array. I put them in an external USB3 enclosure and attach it directly to my unRAID to do the backups. It is all scripted so there is not much involved other than changing out the drives each month.
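The backup script itself isn't shown in the thread. As one possible approach, here is a minimal Python sketch that copies only files that are missing or changed on the backup drive. The comparison rule (same size and a destination timestamp at least as new means "unchanged") and the function name are assumptions made for illustration, and nothing is ever deleted from the destination.

```python
import shutil
from pathlib import Path


def backup_tree(source, destination):
    """Copy files that are missing or changed; return the relative paths copied."""
    source, destination = Path(source), Path(destination)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = destination / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Assumed rule: same size and destination not older means unchanged.
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

A call such as `backup_tree("/mnt/user/photos", "/mnt/disks/backup/photos")` would do one share; both paths are hypothetical examples and depend on how the shares and the external drive are mounted.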

 

TL;DR: Think about what you really don't want to lose, and just back that up.

Link to comment

... Since it looks like we are making progress, and garycase hasn't said anything about it yet, I will.

 

:) :)  [Clearly I have a bit of a reputation r.e. backups  8) ]

 

 

...Think about what you really don't want to lose, and just back that up.

 

Well stated.  Backups are "insurance" for your data.  As with any insurance, you should have insurance for what you can't afford to lose ... and whether you choose to insure other things is a personal choice.  Simple rule (basically what trurl said above):  Assume your server catastrophically failed tonight.    Ask yourself what you'd be upset about losing.    Anything in that category should be backed up.

 

 

... The external drives I mentioned are actually just internal drives I re-used from upsizing disks in my array ...

 

I do exactly the same (have for many years).  Using older drives that were either replaced with larger ones, or had developed a few bad sectors and I didn't want to use them for primary storage again, significantly reduces the number of additional drives you actually have to buy for backups.

 

 

Link to comment

... Since it looks like we are making progress, and garycase hasn't said anything about it yet, I will.

 

:) :)  [Clearly I have a bit of a reputation r.e. backups  8) ]

 

I was going to make a lil joke out of this, but I held back.

FWIW, it's all good!

 

 

Frankly, anything irreplaceable such as family documents, pictures, video, etc, should be backed up offsite.

Something could happen where you have an emergency and have to leave 'rapidly'.

Link to comment

Excellent advice from the one forum member who can say that from very painful personal experience!!

 

... for anyone who doesn't know, WeeboTech was hit badly by Hurricane Sandy.

 

 

Link to comment
