RoachBot Posted March 18, 2023
I had 2 drives disabled at the same time by Unraid due to read errors. I successfully rebuilt the array with used 3TB drives I purchased off eBay. Then a day or two later, both drives were disabled by Unraid. This made me think the problem isn't necessarily the drives and might be a cable or controller. I ran successful SMART tests on the original disabled drives and successfully precleared them using a different port. After replacing the cable and checking all connections, I tried to rebuild once more. Unfortunately, it finished but with 128 read errors on Parity 2. Parity 2 is on the same controller/port combination as the problem drives, but using the new cable.
I've attached 2 diagnostics:
Before 2nd Rebuild (with 2 disabled drives and read/write errors)
After 2nd Rebuild (with 128 read errors)
Any advice on how I should proceed? Thanks!
Attachments: obelisk-diagnostics-20230315-1832.zip, obelisk-diagnostics-20230317-2310.zip
JorgeB Posted March 19, 2023
Interesting that both disks have a failed SMART test from when they were new. They have a good test after that, but it's only the short test. It doesn't look like a disk problem, but run a long test on both. Also, do you still have old disk7 intact?
RoachBot Posted March 20, 2023 (edited March 20, 2023)
14 hours ago, JorgeB said: "Also do you still have old disk7 intact?"
I have disk 3 (YHKZ3J6D) intact from before Rebuild 1. It was disabled for read errors but is potentially missing any data written to the array between Rebuild 1 and Rebuild 2.
Disk 7 (YVKU27RK) is not intact. It was precleared, re-inserted for Rebuild 2 (which ultimately had errors), and is currently in the array.
I have disk 3 (P9GHWUKW) and disk 7 (P9GHWK6W) intact from before Rebuild 2. These are the used drives I acquired recently. They were part of the successful Rebuild 1 and potentially had new data written to them during normal array operation. They were disabled after having read errors as well as what looks like write errors.
I'll run the extended SMART tests you mentioned and report back. Thanks!
RoachBot Posted March 21, 2023
On 3/19/2023 at 3:20 AM, JorgeB said: "Interesting that both disks have a failed SMART test from when they were new. They have a good test after that, but it's only the short test. It doesn't look like a disk problem, but run a long test on both."
I ran an extended test on the drives I purchased recently and they both passed. Attached are the SMART reports if that's useful. I found this thread while researching the SMART error. Perhaps it's related?
Attachments: HUS724030ALS640_P9GHWK6W_35000cca0581ce404-20230320-1142.txt, HUS724030ALS640_P9GHWUKW_35000cca0581ce810-20230320-1957.txt
RoachBot Posted March 21, 2023 (edited March 21, 2023)
Can any of the intact disks be used to correct the 128 errors during rebuild?
YHKZ3J6D - disk 3 - missing any writes that occurred between Rebuild 1 and Rebuild 2
P9GHWUKW - disk 3 - not missing writes between Rebuilds, but it had write errors (?) before being disabled
P9GHWK6W - disk 7 - not missing writes between Rebuilds, but it had write errors (?) before being disabled
JorgeB Posted March 21, 2023
Tests passed, so the disks are OK. You can use the previous good disk and re-sync parity. You would only need disk7; disk3 was rebuilt correctly.
RoachBot Posted March 21, 2023
8 hours ago, JorgeB said: "Tests passed, so the disks are OK. You can use the previous good disk and re-sync parity. You would only need disk7; disk3 was rebuilt correctly."
If you don't mind, how are you able to tell that disk 3 was rebuilt successfully and disk 7 wasn't?
JorgeB Posted March 21, 2023
On second thought, neither would rebuild correctly, so it's probably best to use these and re-sync parity:
P9GHWUKW - disk 3 - not missing writes between Rebuilds, but it had write errors (?) before being disabled
P9GHWK6W - disk 7 - not missing writes between Rebuilds, but it had write errors (?) before being disabled
But keep the other rebuilt disks intact for now.
RoachBot Posted March 23, 2023
I did another rebuild with the disks you mentioned and there were 101 read errors. The problem sectors were the same.
Attachment: obelisk-diagnostics-20230323-1137.zip
JorgeB Posted March 23, 2023
Wasn't the plan to re-sync parity with the old disks?
RoachBot Posted March 23, 2023
On 3/21/2023 at 11:48 AM, JorgeB said: "On second thought, neither would rebuild correctly, so it's probably best to use these and re-sync parity:
P9GHWUKW - disk 3 - not missing writes between Rebuilds, but it had write errors (?) before being disabled
P9GHWK6W - disk 7 - not missing writes between Rebuilds, but it had write errors (?) before being disabled"
I re-synced with P9GHWUKW and P9GHWK6W as you mentioned here.
RoachBot Posted March 23, 2023 (edited March 23, 2023)
Perhaps I'm confused. Whenever I rebuild, the Unraid UI has a "Sync" button, so I thought re-sync parity and rebuild were synonymous. Did you mean rebuild the parity drive using the data drives?
JorgeB Posted March 24, 2023
Yes, resync parity using the known good rebuilt drives. When we say sync or resync, it's about parity; rebuild is about data drives.
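The sync/rebuild distinction can be illustrated with a toy model. This is only a sketch under a simplifying assumption: it uses plain XOR single parity on tiny in-memory blocks, and it does not model how Unraid calculates Parity 2. The function names `sync_parity` and `rebuild_disk` and the sample bytes are made up for illustration.

```python
from functools import reduce


def sync_parity(data_blocks):
    """Parity sync: compute the parity block from all data blocks (byte-wise XOR)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_blocks))


def rebuild_disk(surviving_blocks, parity):
    """Rebuild: recover the one missing data block from the survivors plus parity."""
    return sync_parity(surviving_blocks + [parity])


# Three hypothetical data "disks", two bytes each.
disks = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]

# Sync direction: data drives are trusted, parity is written.
parity = sync_parity(disks)

# Rebuild direction: parity and the other drives are trusted, a data drive is written.
recovered = rebuild_disk([disks[0], disks[2]], parity)
```

In this toy model, a read error on any trusted source during a rebuild would propagate into the rebuilt disk, whereas a sync from intact data drives only overwrites parity.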
RoachBot Posted March 24, 2023 (edited March 24, 2023)
Does Unraid report ALL read errors that occurred during rebuild? As in "Can I rely on the 128 being the only sectors with read errors"?
If yes: before rebuilding, I used this comment to figure out which files were in the 128 sectors for disk3 and disk7, and copied the files (from P9GHWUKW and P9GHWK6W). There was only one file affected.
JorgeB Posted March 24, 2023
7 minutes ago, RoachBot said: "As in "Can I rely on the 128 being the only sectors with read errors"?"
If you are looking at the number of errors on the GUI, yes.
RoachBot Posted March 24, 2023 (edited March 24, 2023)
Yes. Then I used the syslog to get the sector numbers and confirmed there were 128.
Is it okay to do the following or do you suggest an alternative?
1. Start array normally.
2. Delete the corrupt file.
3. Copy over the intact file.
4. Re-sync Parity 2 using the data drives.
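Cross-checking the GUI error count against the syslog, as described above, could be scripted roughly like this. It is a sketch, not the method actually used in this thread: the regular expression assumes read errors are logged in the form `md: diskN read error, sector=NNN`, which should be checked against the real syslog in the diagnostics, and the sample lines, disk number, and sector values are invented for illustration.

```python
import re
from collections import defaultdict

# Assumed log format; verify against the actual syslog before relying on it.
READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def read_error_sectors(lines):
    """Collect the unique sector numbers with read errors, grouped by disk."""
    sectors = defaultdict(set)
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            sectors[match.group(1)].add(int(match.group(2)))
    return dict(sectors)


# Made-up sample lines, including one duplicate and one unrelated message.
sample = [
    "Mar 17 22:01:03 obelisk kernel: md: disk29 read error, sector=1000",
    "Mar 17 22:01:03 obelisk kernel: md: disk29 read error, sector=1008",
    "Mar 17 22:01:04 obelisk kernel: md: disk29 read error, sector=1000",
    "Mar 17 22:01:05 obelisk kernel: some unrelated message",
]

errors = read_error_sectors(sample)
for disk, found in sorted(errors.items()):
    print(disk, len(found), sorted(found))
```

Using a set per disk means a sector that is logged more than once is counted only once, so the totals can be compared directly with the error count shown in the GUI.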
JorgeB Posted March 25, 2023
I think that's a good plan.
RoachBot Posted March 29, 2023
I replaced the corrupt file and successfully rebuilt Parity 2. So far, it's running without issue.
I'm still not sure why 2 drives were disabled at the same time, and then again with 2 different drives. A new cable connected to the HBA controller didn't fix it, but maybe I had too many drives attached to one cable on the power supply. I replaced that particular splitter and distributed power more evenly.
Additionally, I discovered my Parity 2 drive had Type 1 Protection enabled, and this probably contributed to the read errors during rebuild. I'm not sure if that had an impact on 2 drives getting disabled simultaneously.
Anyways, I'll try to update if the problem persists. Thanks @JorgeB for all your help!