Advice after rebuild with errors



I had 2 drives disabled at the same time by Unraid due to read errors. I successfully rebuilt the array with used 3TB drives I purchased off eBay. Then a day or two later, both drives were disabled by Unraid.

 

This made me think the problem isn't necessarily the drives and might be a cable or controller. I ran successful SMART tests on the original disabled drives and successfully precleared them using a different port. After replacing the cable and checking all connections, I tried to rebuild once more.

 

Unfortunately, it finished but with 128 read errors on Parity 2. Parity 2 is on the same controller/port combination as the problem drives, but using the new cable. 

 

I've attached 2 diagnostics:

  • Before 2nd Rebuild (with 2 disabled drives and read/write errors)
  • After 2nd Rebuild (with 128 read errors)

 

Any advice on how I should proceed? Thanks!

obelisk-diagnostics-20230315-1832.zip obelisk-diagnostics-20230317-2310.zip

Link to comment

 

14 hours ago, JorgeB said:

Also do you still have old disk7 intact?

 

I have disk 3 (YHKZ3J6D) intact from before Rebuild 1. It was disabled for read errors but is potentially missing any data written to the array between Rebuild 1 and Rebuild 2. Disk 7 (YVKU27RK) is not intact. It was precleared, re-inserted for Rebuild 2 (which ultimately had errors), and is currently in the array.

 

I have disk 3 (P9GHWUKW) and disk 7 (P9GHWK6W) intact from before Rebuild 2. These are the used drives I acquired recently. They were part of the successful Rebuild 1 and potentially had new data written to them during normal array operation. They were disabled after having read errors as well as what looks like write errors.

 

I'll run the extended SMART tests you mentioned and report back. Thanks!

Edited by RoachBot
Add quote
Link to comment
On 3/19/2023 at 3:20 AM, JorgeB said:

Interesting that both disks have a failed SMART test from when they were new but a good test after that, though that was only the short test. It doesn't look like a disk problem, but run a long test on both.

 

I ran an extended test on the drives I purchased recently and they both passed. Attached are the SMART reports if that's useful.

 

I found this thread while researching the SMART error. Perhaps it's related?

HUS724030ALS640_P9GHWK6W_35000cca0581ce404-20230320-1142.txt HUS724030ALS640_P9GHWUKW_35000cca0581ce810-20230320-1957.txt

Link to comment

Can any of the intact disks be used to correct the 128 errors during rebuild?

 

  • YHKZ3J6D - disk 3 - missing any writes that occurred between Rebuild 1 and Rebuild 2
  • P9GHWUKW - disk 3 - not missing writes between Rebuilds, but it had write errors (?) before being disabled
  • P9GHWK6W - disk 7 - not missing writes between Rebuilds, but it had write errors (?) before being disabled
Edited by RoachBot
Link to comment

On second thought, both would not rebuild correctly, so it's probably best to use these and re-sync parity:

 

P9GHWUKW - disk 3 - not missing writes between Rebuilds, but it had write errors (?) before being disabled

P9GHWK6W - disk 7 - not missing writes between Rebuilds, but it had write errors (?) before being disabled

 

But keep the other rebuilt disks intact for now.

 

Link to comment
On 3/21/2023 at 11:48 AM, JorgeB said:

On second thought, both would not rebuild correctly, so it's probably best to use these and re-sync parity:

 

P9GHWUKW - disk 3 - not missing writes between Rebuilds, but it had write errors (?) before being disabled

P9GHWK6W - disk 7 - not missing writes between Rebuilds, but it had write errors (?) before being disabled

 

I re-synced with P9GHWUKW and P9GHWK6W as you mentioned here.

Link to comment

Does Unraid report ALL read errors that occurred during rebuild? As in "Can I rely on the 128 being the only sectors with read errors"?

 

If yes: before rebuilding, I used this comment to work out which files occupied the 128 sectors on disk3 and disk7, and copied those files from P9GHWUKW and P9GHWK6W. Only one file was affected.
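
For anyone wanting to double-check the count, here is a rough Python sketch of pulling the failing sectors out of a syslog. It assumes the read errors are logged in a form like `md: disk29 read error, sector=123456`; that line shape and the `collect_read_errors` helper name are my assumptions, not something confirmed in this thread, so adjust the pattern to match what your syslog actually contains.

```python
import re

# Assumed log shape (not confirmed in this thread):
#   "... kernel: md: disk29 read error, sector=123456"
READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def collect_read_errors(lines):
    """Return {disk_name: sorted list of unique sector numbers} from syslog lines."""
    sectors = {}
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            disk = match.group(1)
            sector = int(match.group(2))
            # A set drops repeated reports of the same sector.
            sectors.setdefault(disk, set()).add(sector)
    return {disk: sorted(found) for disk, found in sectors.items()}
```

Passing an open file works because the function only iterates over lines, for example `collect_read_errors(open("syslog.txt", errors="replace"))`, where `syslog.txt` is a placeholder path. The length of each list can then be compared with the error count shown in the GUI.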

Edited by RoachBot
Link to comment

Yes. Then I used the syslog to get the sector numbers and confirmed there were 128. Is it okay to do the following or do you suggest an alternative?

 

  1. Start array normally.
  2. Delete the corrupt file.
  3. Copy over the intact file.
  4. Re-sync Parity 2 using the data drives.
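
Between steps 3 and 4 it may be worth confirming that the copy on the array matches the intact source before parity is rebuilt on top of it. A minimal Python sketch of that check follows; the helper names are hypothetical and the paths you pass in are whatever applies on your system.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, dest_path):
    """True when both files have identical contents according to SHA-256."""
    return sha256_of(source_path) == sha256_of(dest_path)
```

If `copies_match` returns `False`, the copy should be redone before re-syncing, since parity would otherwise be computed over the bad data.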
Edited by RoachBot
Link to comment

I replaced the corrupt file and successfully rebuilt Parity 2. So far, it's running without issue. I'm still not sure why 2 drives were disabled at the same time, and then again with 2 different drives. A new cable connected to the HBA controller didn't fix it, but maybe I had too many drives attached to one cable on the power supply. I replaced that particular splitter and distributed power more evenly.

 

Additionally, I discovered my Parity 2 drive had Type 1 Protection enabled and this probably contributed to read errors during rebuild. I'm not sure if that had an impact on 2 drives getting disabled simultaneously. Anyways, I'll try to update if the problem persists.

 

Thanks @JorgeB for all your help!

Link to comment
