
need help analyzing unstable array


madshi


tower-syslog-20210117-1620.zip

 

History:

My array was running fine, but was completely filled to the brim, so I bought 4x 14 TB drives to upgrade both parity drives and 2 data drives to 14 TB.

 

Preparations:

1) I pre-cleared all 4 new disks with 3x pre-clear + extended smart self test. All successful, with no complaints and "perfect" smart statistics.

2) Ran a parity check to make sure that the array is "good", while one disk was still pre-clearing. But this errored out:

 

1st problem:

The parity check started fine, but at some point it stopped, and the pre-clearing aborted, too, and the log was all "red" and full of errors. Seems something was frozen and couldn't recover. So I rebooted the server and restarted the parity check. Ran through fine this time, but found something like 270 errors or so, which it didn't find before, probably caused by the prior failure somehow. Ran another parity check afterwards, just to be safe, and it ran through with 0 errors this time. Did another pre-clearing on the new disk, as well, which also ran through fine. So I thought the problem was a freak accident and didn't worry too much about it.

 

parity swap:

Did 2x parity swap without any problems, and with 0 errors reported by unRAID. So just 2 data disks to upgrade and I'm done.

 

1st data disk upgrade/rebuilding:

Ouch. I can't get it to succeed anymore. The rebuild always starts fine and runs for a while, sometimes several hours, but then errors out, aborting with complaints about something being frozen. I tried 3 times, with 2 different "new drives", just to be safe. See attached logs.

 

Can you see anything useful in the logs? I think the disks are probably all fine. I suspect a controller problem, a mainboard problem, a power problem, a loose SATA cable or something like that. But I don't know what to look for exactly. Maybe some expert here can get some useful clues from the logs?

 

FWIW, I've already ordered a new mainboard + CPU + RAM to replace the rather old hardware. Maybe that helps fix this. But I'd feel better if I knew exactly where the problem is coming from.

 

Thanks!


Yes, I tried to upgrade Disk2, using one of the 2 new 14TB disks. The new disk dropped offline after the rebuild failure. Or maybe the rebuild failed due to the disk dropping offline.

 

Then I rebooted and tried again to upgrade Disk2, using a different new 14 TB disk, connected to a different SATA port. Exact same problem. Since 2 different disks, connected to 2 different SATA ports, showed the exact same problem, what conclusions can we draw? I'm not sure...

 

I didn't know the SASLP is no longer recommended. I bought it when it still was, a very looong time ago. I'll look for alternatives. Thanks for the tip!

 

I've rebooted the server now and made a new diagnostic, see attachment.

tower-diagnostics-20210118-1505.zip

15 minutes ago, madshi said:

connected to 2 different SATA ports, showed the exact same problem

Same SASLP controller?

 

19 minutes ago, madshi said:

I've rebooted the server now and made a new diagnostic, see attachment.

The SMART report is still not generated correctly. You can try a different controller, but the best bet would be to replace the SASLP with one of the recommended LSI controllers.

 


Yes, I already ordered a Broadcom (LSI) 9207-8i a couple minutes ago...  😀  Will let you know how it goes.

 

Edit: Not sure if the 2 disks were on the same controller. I have about half the SATA ports coming from the mainboard, the other from the SASLP, but I'm not sure which is which.
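
For anyone else unsure which ports belong to which controller: on Linux, the resolved path of each `/sys/block/sdX` entry contains the PCI address of the controller the disk hangs off. Below is a minimal Python sketch of that idea. The function names `controller_of` and `disks_by_controller` are made up for illustration, and the assumption that the last PCI address in the path is the controller has not been verified on unRAID specifically.

```python
import os
import re

# Matches PCI addresses such as 0000:03:00.0 inside a resolved sysfs path.
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_of(sysfs_path):
    """Return the last PCI address found in the path, or None.

    Assumption: the last PCI device in the chain is the storage
    controller (earlier ones would be bridges).
    """
    matches = PCI_ADDR.findall(sysfs_path)
    return matches[-1] if matches else None


def disks_by_controller(sys_block="/sys/block"):
    """Group sdX device names by the PCI address of their controller."""
    groups = {}
    if not os.path.isdir(sys_block):
        return groups
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("sd"):
            continue  # skip loop devices, md devices, etc.
        real = os.path.realpath(os.path.join(sys_block, name))
        groups.setdefault(controller_of(real), []).append(name)
    return groups


if __name__ == "__main__":
    for addr, disks in disks_by_controller().items():
        print(addr, " ".join(disks))
```

Disks listed under the same address share a controller. Running `lspci -s <address>` should then show whether that address is the onboard SATA controller or the add-in card.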

  • 3 weeks later...

JFYI, after upgrading to a new mainboard + CPU + RAM + controller, the array is working beautifully.

 

Not sure what the problem was. Let's assume it was the controller.

 

In any case, thanks for your help, once more. Without your comments I would not have replaced the controller (at least at first), and might still be stuck with an unstable array now.
