help rebuilding array with 2 failures (SOLVED)


Hey,

 

I have an unraid array with 5 disks:

 

Parity - new 8TB CMR Western Digital

Parity 2 - old 8TB SMR Seagate Barracuda

Disk 1 - old 8TB SMR Seagate Barracuda

Disk 2 - 8TB unassigned (device missing, contents emulated)

Disk 3 - old 8TB SMR Seagate Barracuda (device disabled, contents emulated)

 

I have 2 parity and I'm down 2 drives.

 

I don't have Disk 2 and can't put it back in.

 

I want to replace the old Seagate SMR drives and rebuild the array.

I have a healthy 8TB and a healthy 12TB; I also ordered another 8TB and another 12TB.

 

No, I don't have a backup.

Yes, I will enthusiastically back things up after this.

Thank you so much for helping.

 

What should I do?

 

Things I'm considering:

 

A) In the unraid gui: Replace Disk 1 and Disk 2 with new drives and rebuild. Replace Parity 2 with new drive and rebuild.

 

B) Outside of unraid: `dd` or `ddrescue` Parity 2 to a new drive and repeat for Disk 1 and then use unraid gui to replace Disk 2 and Disk 3 and rebuild array?

 

Something else?

 

Also, can I replace the 8TB Disk 2 or 8TB Disk 3 with a 12TB drive?

 

Also, can I replace the 8TB Parity 2 with a 12TB drive?

 

I read about "Parity Swap" where unraid will copy parity onto a bigger drive and move the old smaller parity drive into the array. I'm not sure that's what I want. If I did this for Parity 2 I wouldn't want it in the array. Also, the instructions for it seemed tricky/finicky.

 

I just realized that I'm implying something but haven't communicated it. Namely, I'm concerned about the remaining two old Seagate SMR drives failing. I just read about SMR and all that funny business. I don't want to write to any of the old Seagate SMR drives; I think I shouldn't need to do anything more than a single simple read from the old drive to clone to a new drive.

 

Thanks a million. I'm sweating.

image.png

Link to comment

You cannot use 12TB disks for data until all parity drives are at least 12TB in size. However, you need to decide whether, going forward, you want 2 parity drives with only a small number of data drives.

 

The simplest first step would be to replace the missing 8TB disk with the good 8TB drive and rebuild its contents. At that point you still have a disabled disk3.

 

You could then use the parity swap process to upgrade parity1 to 12TB and replace disk3 with the old parity1 drive. At that point you should be back to having no disabled drives and 2 valid parity drives. You could then decide how you want to end up and plan out the next stages.
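
The size rule from this reply (a data disk cannot be larger than the smallest parity drive) can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not anything Unraid itself runs; the function names and the use of nominal sizes (`8 * TB`, `12 * TB`) are assumptions made for the example, and real drives of the same nominal size can differ slightly in exact capacity.

```python
TB = 10**12  # nominal terabyte, used only for illustration


def max_data_disk_size(parity_sizes):
    """Largest data disk the array can accept: the smallest parity drive."""
    if not parity_sizes:
        raise ValueError("no parity drives assigned")
    return min(parity_sizes)


def can_use_as_data(disk_size, parity_sizes):
    """True if a disk of this size is no larger than every parity drive."""
    return disk_size <= max_data_disk_size(parity_sizes)


# Layout from this thread: both parity drives are 8TB.
print(can_use_as_data(8 * TB, [8 * TB, 8 * TB]))     # True
print(can_use_as_data(12 * TB, [8 * TB, 8 * TB]))    # False
# Upgrading only one parity drive is not enough.
print(can_use_as_data(12 * TB, [12 * TB, 8 * TB]))   # False
# Once both parity drives are 12TB, a 12TB data disk is allowed.
print(can_use_as_data(12 * TB, [12 * TB, 12 * TB]))  # True
```

This is why the parity upgrade has to come before any 12TB drive goes in as a data disk.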

Link to comment

I like the idea of adding a healthy 8TB in place of the missing drive and rebuilding the array but I'm nervous about another drive failing during either that rebuild or the subsequent rebuild. Can/should I add two healthy 8TB at once and rebuild the array to eliminate a subsequent rebuild and minimize wear on the old SMR Seagate drives? Does it even work like that?

 

After replacing the missing 8TB disk, can I do the parity swap to install the 12TB parity without swapping that Seagate SMR drive into the array as a data drive? I want to remove all the old SMR drives and I don't trust that I can write to them without having them fail.

 

I found conflicting comments on the internet about whether or not I could use a data drive bigger than parity. Thanks for clarifying that.

 

Thank you for your help!

Link to comment
1 hour ago, NominallySavvyTechPerson said:

I have a healthy 8TB and a healthy 12TB; I also ordered another 8TB and another 12TB.

 

Sorry; I only have one 8TB right now, but I'm happy to wait until I have two 8TB if that's a better option. I think it's a better option to rebuild with two 8TB at once because that's less wear on the old SMR Seagate drives. I'm planning to run `badblocks -wsv` on the new 8TB when I get it, before I put it in the array.

 

I'm elated to hear that I'll be able to rebuild both disks at once. I didn't know that was an option. This sounds like the best path forward.

 

Is there an option for unraid to disallow writing to the array during the rebuild? I have disabled docker and commented out any cron jobs but I want to take all available precautions.

 

I'm asking this in advance in case I have an issue during rebuild: Can I ddrescue a "failed" parity/array drive to a new drive and then insert that new drive in place of the "failed" parity/array drive and ask unraid to use the new drive and trust parity? Like, can I do an Indiana Jones swap of a drive?

 

Thank you!

Link to comment
32 minutes ago, NominallySavvyTechPerson said:

Is there an option for unraid to disallow writing to the array during the rebuild? I have disabled docker and commented out any cron jobs but I want to take all available precautions.

You can do the rebuild in maintenance mode if you want to do this, but the array will be unavailable until the rebuild completes and you restart the array in normal mode. Having said that, Unraid will correctly handle any writes to the array during a rebuild if you do it with the array started in Normal mode.

 

Link to comment

I have two 8tb drives and I'm trying to rebuild the array but I'm having trouble.

 

I added the two drives to the array and I attached a screenshot of what it looks like after adding them (one.png). There's the two blue boxes next to each new drive.

 

At the bottom it says "Stopped. Replacement disk installed." and "Start will start Parity-Sync and/or Data-Rebuild." I checked the box for "Maintenance mode" and clicked START. I attached a screenshot of what that looks like (two.png).

 

I think something is wrong because the values for READS and WRITES are all 0.0B/s. I expect to see drive activity. I expect to see reads on Parity, Parity 2, and Disk 1 and writes on Disk 2 and Disk 3. I attached a screenshot (three.png).
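
To make that expectation concrete, here is a minimal Python sketch of the counting rule I have in mind: every healthy disk gets read, every replaced disk gets written, and the rebuild is only possible if the number of replaced disks does not exceed the number of parity slots. The `rebuild_plan` function and the `"ok"` / `"rebuilding"` state labels are made up for this example; it models my understanding of dual parity, not how Unraid implements it.

```python
def rebuild_plan(disks):
    """disks maps slot name -> (role, state).

    role is "parity" or "data"; state is "ok" or "rebuilding".
    Returns which slots should show reads, which should show writes,
    and whether the rebuild is feasible by the counting rule.
    """
    read = [name for name, (_, state) in disks.items() if state == "ok"]
    write = [name for name, (_, state) in disks.items() if state == "rebuilding"]
    parity_slots = sum(1 for role, _ in disks.values() if role == "parity")
    return {"feasible": len(write) <= parity_slots, "read": read, "write": write}


# The array from this thread, after assigning the two new 8TB drives.
array = {
    "Parity": ("parity", "ok"),
    "Parity 2": ("parity", "ok"),
    "Disk 1": ("data", "ok"),
    "Disk 2": ("data", "rebuilding"),
    "Disk 3": ("data", "rebuilding"),
}

plan = rebuild_plan(array)
print(plan["feasible"])  # True
print(plan["read"])      # ['Parity', 'Parity 2', 'Disk 1']
print(plan["write"])     # ['Disk 2', 'Disk 3']
```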

 

Thanks!

one.png

two.png

three.png

Link to comment

Surprised nobody asked for diagnostics on this thread.

 

We don't know anything about the health of any of your disks. Do any have SMART ( 👎 ) warnings on the Dashboard page?

 

Maybe I missed it, but we also don't know if the emulated disks are mountable, so you could be rebuilding unmountable filesystems. And since you are doing it in Maintenance mode, there is no way to find out, because no disks are mounted.

 

Nothing to do now but wait for rebuild to complete, then we can try to check filesystem on the rebuilt disks if necessary.

 

Keep the originals just as they are with their contents, in case they might be useful to recover anything.

 

Also, SMR parity2 shouldn't be as much of a concern since rebuild isn't doing random access.

Link to comment

Thank you.

 

I added two 8TB drives and rebuilt the array in maintenance mode without issue. Then I did parity swap to move an 8TB CMR from parity to the array to replace an 8TB SMR array drive and a new 12TB CMR took its place in parity. Then I replaced an 8TB parity SMR with a new 12TB CMR and I'm waiting for that to finish. I'll run xfs_repair through the gui when that's done. Thanks for mentioning it; I didn't know about that feature.

 

Sorry for not posting diagnostics. I've seen that in other troubleshooting threads but I didn't think it mattered for my question. I'll try to remember to do that by default in the future. I'm not posting it now because I didn't download the diagnostics when I first posted and I have rebooted and changed drives since then. I'm happy to post diagnostics if I still need to.

Here's the SMART from the drive that unRAID disabled:

 

|   # | ATTRIBUTE NAME           |   FLAG | VALUE | WORST | THRESHOLD | TYPE     | UPDATED | FAILED |          RAW VALUE |
|   1 | Raw read error rate      | 0x000f |   068 |   064 |       006 | Pre-fail | Always  | Never  |           71094922 |
|   3 | Spin up time             | 0x0003 |   092 |   092 |       000 | Pre-fail | Always  | Never  |                  0 |
|   4 | Start stop count         | 0x0032 |   099 |   099 |       020 | Old age  | Always  | Never  |               1331 |
|   5 | Reallocated sector count | 0x0033 |   100 |   100 |       010 | Pre-fail | Always  | Never  |                  8 |
|   7 | Seek error rate          | 0x000f |   090 |   061 |       045 | Pre-fail | Always  | Never  |          992855040 |
|   9 | Power on hours           | 0x0032 |   059 |   059 |       000 | Old age  | Always  | Never  | 36679h+06m+49.125s |
|  10 | Spin retry count         | 0x0013 |   100 |   100 |       097 | Pre-fail | Always  | Never  |                  0 |
|  12 | Power cycle count        | 0x0032 |   100 |   100 |       020 | Old age  | Always  | Never  |                449 |
| 183 | Runtime bad block        | 0x0032 |   100 |   100 |       000 | Old age  | Always  | Never  |                  0 |
| 184 | End-to-end error         | 0x0032 |   100 |   100 |       099 | Old age  | Always  | Never  |                  0 |
| 187 | Reported uncorrect       | 0x0032 |   093 |   093 |       000 | Old age  | Always  | Never  |                  7 |
| 188 | Command timeout          | 0x0032 |   100 |   098 |       000 | Old age  | Always  | Never  |              1 1 6 |
| 189 | High fly writes          | 0x003a |   100 |   100 |       000 | Old age  | Always  | Never  |                  0 |
| 190 | Airflow temperature cel  | 0x0022 |   064 |   048 |       040 | Old age  | Always  | Never  | 36 (min/max 35/36) |
| 191 | G-sense error rate       | 0x0032 |   100 |   100 |       000 | Old age  | Always  | Never  |                  0 |
| 192 | Power-off retract count  | 0x0032 |   100 |   100 |       000 | Old age  | Always  | Never  |                339 |
| 193 | Load cycle count         | 0x0032 |   082 |   082 |       000 | Old age  | Always  | Never  |              37333 |
| 194 | Temperature celsius      | 0x0022 |   036 |   052 |       000 | Old age  | Always  | Never  |    36 (0 20 0 0 0) |
| 195 | Hardware ECC recovered   | 0x001a |   079 |   064 |       000 | Old age  | Always  | Never  |           71094922 |
| 197 | Current pending sector   | 0x0012 |   100 |   100 |       000 | Old age  | Always  | Never  |                 24 |
| 198 | Offline uncorrectable    | 0x0010 |   100 |   100 |       000 | Old age  | Offline | Never  |                 24 |
| 199 | UDMA CRC error count     | 0x003e |   200 |   200 |       000 | Old age  | Always  | Never  |                  0 |
| 240 | Head flying hours        | 0x0000 |   100 |   253 |       000 | Old age  | Offline | Never  | 22351h+58m+29.819s |
| 241 | Total lbas written       | 0x0000 |   100 |   253 |       000 | Old age  | Offline | Never  |        68659349241 |
| 242 | Total lbas read          | 0x0000 |   100 |   253 |       000 | Old age  | Offline | Never  |      1595474125720 |

Can you share good resources on understanding fields like "offline uncorrectable"? The best I have found is https://helpful.knobs-dials.com/index.php/Computer_data_storage_-_Reading_SMART_reports.
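
In the meantime, here is a rough Python sketch of how I'm reading the table: pull out a handful of attributes and flag the ones with a non-zero raw value. The watched set (5, 187, 197, 198, 199) is my own assumption about which counters are commonly treated as warning signs, not an official list, and `flag_attributes` is just a name made up for this example. The sample rows are copied from the table above.

```python
# Assumption: a commonly watched subset of SMART attribute IDs.
WATCHED = {5, 187, 197, 198, 199}


def flag_attributes(table_text, watched=WATCHED):
    """Return (id, name, raw) for watched attributes with a non-zero raw value."""
    flagged = []
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 10 or not cells[0].isdigit():
            continue  # skip the header row and anything malformed
        attr_id, name, raw = int(cells[0]), cells[1], cells[-1]
        # Only plain integer raw values are compared; others are ignored.
        if attr_id in watched and raw.isdigit() and int(raw) > 0:
            flagged.append((attr_id, name, int(raw)))
    return flagged


sample = """\
|   # | ATTRIBUTE NAME           |   FLAG | VALUE | WORST | THRESHOLD | TYPE     | UPDATED | FAILED |          RAW VALUE |
|   1 | Raw read error rate      | 0x000f |   068 |   064 |       006 | Pre-fail | Always  | Never  |           71094922 |
|   5 | Reallocated sector count | 0x0033 |   100 |   100 |       010 | Pre-fail | Always  | Never  |                  8 |
| 187 | Reported uncorrect       | 0x0032 |   093 |   093 |       000 | Old age  | Always  | Never  |                  7 |
| 197 | Current pending sector   | 0x0012 |   100 |   100 |       000 | Old age  | Always  | Never  |                 24 |
| 198 | Offline uncorrectable    | 0x0010 |   100 |   100 |       000 | Old age  | Offline | Never  |                 24 |
| 199 | UDMA CRC error count     | 0x003e |   200 |   200 |       000 | Old age  | Always  | Never  |                  0 |
"""

for attr_id, name, raw in flag_attributes(sample):
    print(attr_id, name, raw)
# 5 Reallocated sector count 8
# 187 Reported uncorrect 7
# 197 Current pending sector 24
# 198 Offline uncorrectable 24
```

The normalized VALUE column for these attributes is still well above THRESHOLD, which is why FAILED shows "Never" even though the raw counts are non-zero.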

Link to comment
  • 4 weeks later...

There's a lot of stuff in the diagnostics. I haven't used that feature before and I want to review it more thoroughly before I upload it.

 

Thank you for your support! I was really sweating it but I'm confident now that I have not lost any data. I have an offline copy of my important files and I'm waiting on parts to add some drives to build a raidz-2 as a destination for online backups. I'm going to take advantage of snapshots.

 

I'm also running `badblocks -wsv` on all my drives to build confidence in the drives that I have. I added some refurb drives to my array in haste, and now I can take my time rotating those out of the array so that I can do a destructive run of badblocks on each. I'm also adding two new WD Red Pro drives as parity because they recently had a "sale" on the 14TB. I don't want too many refurbs.

Link to comment
