madshi Posted January 17, 2021

tower-syslog-20210117-1620.zip

History: My array was running fine, but was completely filled to the brim, so I bought 4x 14 TB drives to upgrade both parity drives and 2 data drives to 14 TB.

Preparations:
1) I pre-cleared all 4 new disks with 3x pre-clear + extended SMART self-test. All successful, with no complaints and "perfect" SMART statistics.
2) Ran a parity check to make sure that the array is "good", while one disk was still pre-clearing. But this errored out:

1st problem: The parity check started fine, but at some point it stopped, the pre-clear aborted too, and the log was all "red" and full of errors. It seems something froze and couldn't recover. So I rebooted the server and restarted the parity check. It ran through fine this time, but found something like 270 errors, which it hadn't found before, probably caused by the prior failure somehow. I ran another parity check afterwards, just to be safe, and it completed with 0 errors this time. I also did another pre-clear on the new disk, which ran through fine as well. So I thought the problem was a freak accident and didn't worry too much about it.

Parity swap: Did 2x parity swap without any problems, and with 0 errors reported by unRAID. So just 2 data disks to upgrade and I'm done.

1st data disk upgrade/rebuild: Ouch. I can't get it to succeed anymore. The rebuild always starts fine and runs for a while, sometimes several hours, but then errors out, aborting with complaints about something being frozen. I tried 3 times, with 2 different "new drives", just to be safe. See attached logs.

Can you see anything useful in the logs? I think the disks are probably all fine. I suspect a controller problem, a mainboard problem, a power problem, a loose SATA cable or something like that. But I don't know what to look for exactly. Maybe some expert here can get some useful clues from the logs?

FWIW, I've already ordered a new mainboard + CPU + RAM to replace the rather old hardware. Maybe that will help fix this. But I'd feel better if I knew exactly where the problem is coming from. Thanks!
JorgeB Posted January 18, 2021

Please post the diagnostics: Tools -> Diagnostics
madshi Posted January 18, 2021

See attachment. Thanks for looking into this!

tower-diagnostics-20210118-1243.zip
JorgeB Posted January 18, 2021

Disk2 dropped offline, so there's no SMART report for it. Check the connections and post new diags. Also note that you're using a SASLP; those have not been recommended for a long time due to several issues, including dropping disks for no apparent reason.
madshi Posted January 18, 2021

Yes, I tried to upgrade Disk2, using one of the 2 new 14 TB disks. The new disk dropped offline after the rebuild failure. Or maybe the rebuild failed because the disk dropped offline. Then I rebooted and tried again to upgrade Disk2, using a different new 14 TB disk, connected to a different SATA port. Exact same problem.

Since 2 different disks, connected to 2 different SATA ports, showed the exact same problem, what conclusions can we draw? I'm not sure...

I didn't know the SASLP is no longer recommended. I bought it when it still was, a very long time ago. I'll look for alternatives. Thanks for the tip!

I've rebooted the server now and made a new diagnostic, see attachment.

tower-diagnostics-20210118-1505.zip

Edited January 18, 2021 by madshi
JorgeB Posted January 18, 2021

15 minutes ago, madshi said: "connected to 2 different SATA ports, showed the exact same problem"

Same SASLP controller?

19 minutes ago, madshi said: "I've rebooted the server now and made a new diagnostic, see attachment."

The SMART report is still not being generated correctly. You can try a different controller, but the best bet would be to replace the SASLP with one of the recommended LSI controllers.
madshi Posted January 18, 2021

Yes, I already ordered a Broadcom (LSI) 9207-8i a couple of minutes ago... 😀 Will let you know how it goes.

Edit: Not sure if the 2 disks were on the same controller. About half of my SATA ports come from the mainboard and the other half from the SASLP, but I'm not sure which is which.

Edited January 18, 2021 by madshi
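For anyone in the same "which port belongs to which controller" situation, here is a minimal sketch of one way to check it on a Linux box. It is not something used in this thread. It assumes a Python 3 interpreter is available on the server (it may need to be installed separately) and that disks show up as `sd*` entries under `/sys/block`. The function names `controller_of` and `disks_by_controller` are made up for illustration. The idea is that each `/sys/block/sdX` entry is a symlink whose resolved path runs through the PCI device the disk hangs off, so the last PCI address in that path identifies the controller.

```python
import os
import re
from collections import defaultdict

# A PCI address looks like 0000:03:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def controller_of(sysfs_path):
    """Return the last PCI address found in a resolved sysfs path, or None."""
    matches = [part for part in sysfs_path.split("/") if PCI_ADDR.match(part)]
    return matches[-1] if matches else None


def disks_by_controller(sys_block="/sys/block"):
    """Group sd* block devices by the PCI address of their controller."""
    groups = defaultdict(list)
    if not os.path.isdir(sys_block):
        return {}
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("sd"):
            continue  # skip loop devices, md devices, etc.
        real = os.path.realpath(os.path.join(sys_block, name))
        groups[controller_of(real) or "unknown"].append(name)
    return dict(groups)


if __name__ == "__main__":
    for ctrl, disks in disks_by_controller().items():
        print(ctrl, " ".join(disks))
```

Disks that share a PCI address are on the same controller. An address can then be looked up with `lspci -s <address>` to see whether it is the onboard SATA controller or the add-in card.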
madshi Posted February 4, 2021

JFYI, after upgrading to a new mainboard + CPU + RAM + controller, the array is working beautifully. Not sure what the problem was. Let's assume it was the controller. In any case, thanks for your help, once more. Without your comments I would not have replaced the controller (at least not at first), and might still be stuck with an unstable array now.