Can't get my array right



Hi everyone,

 

For a month now I have been trying to do the most basic thing: get my array set up. I haven't even reached the point of creating shares because of this, and I'm getting very disappointed in the process.

Sometimes it's the parity disk that doesn't activate, other times it's a drive that gets unassigned. It's simply horrible. I've attached screenshots and the diagnostics file.

 

I have so far:

- reset the config twice, with reboots in between.

- ran SMART tests multiple times; sometimes all passed, sometimes a test got stuck at 10% and I had to reboot

- ran preclear and got it to complete successfully on almost all drives, but it also crashes midway, and at 40h per drive I gave up

- formatted, rebooted, rebuilt the array, rebooted, started the array; never any luck.

 

I have nothing on my disks yet, so I'm not losing anything except a lot of time, but this is getting very frustrating.

 

The problem is, with the SMART tests failing to run, I can't even tell if I have a bad disk. The disks came from a QNAP that was running fine, so I wasn't expecting so much trouble. The next thing I can think of is removing the disks and running a SMART test on another computer?

 

 

Screenshot 2024-03-17 at 9.30.54 AM.png

Screenshot 2024-03-17 at 8.56.32 AM.png

unraid-diagnostics-20240317-0932.zip

1 hour ago, JorgeB said:

Lots of these:

 

Mar 17 06:23:44 Unraid kernel: sd 4:0:1:0: Power-on or device reset occurred

 

These usually mean a power/connection issue, check/replace cables/PSU.

 

That was my manual intervention when everything got stuck during a preclear. The PSUs are fine.

 

2 hours ago, ConnerVT said:

Gut feeling it is related to your LSI SAS3008 controller.  But I'll let the experts with these see if they agree and offer advice.

 

iDRAC says everything is fine disk-wise. Any idea what to check?

 

5 hours ago, Frank1940 said:

I would start by testing the RAM. (There is a boot option to check non-ECC RAM in Unraid's boot menu.)

 

Not sure how my RAM could be linked to this. I did indeed skip the memory check: with 128 GB of RAM, I feared 5 days of checks while still figuring out how to get things running.

 

I started hot-swapping disks to run SMART tests on them separately on another computer and identified 2 with read errors; the other 4 are fine. Unraid didn't have an issue with the 2 errored disks. The parity disk and the 9VV didn't raise a SMART error...

9 minutes ago, Jenfil said:

That was my manual intervention when everything got stuck during a preclear. The PSUs are fine.

There are a lot of these, over at least 30 minutes, including just before the disk errors. Were you intervening during all that time? And what exactly do you mean by intervening? Were you unplugging cables with the server on while reading/writing to the disks? That will cause errors for sure.
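
For anyone wanting to check how many of these reset messages a syslog contains and over what time span, here is a minimal Python sketch. It assumes the syslog lines look like the one quoted earlier in the thread (`Mar 17 06:23:44 Unraid kernel: sd 4:0:1:0: Power-on or device reset occurred`). The function name `summarize_resets` and the `year` parameter are made up for illustration, since syslog timestamps carry no year.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
#   Mar 17 06:23:44 Unraid kernel: sd 4:0:1:0: Power-on or device reset occurred
RESET_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"sd (?P<dev>\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)


def summarize_resets(lines, year=2024):
    """Return {scsi_address: (count, first_seen, last_seen)} for reset messages.

    `year` is an assumption supplied by the caller, because the syslog
    timestamp format does not include one.
    """
    seen = defaultdict(list)
    for line in lines:
        match = RESET_RE.match(line)
        if match:
            stamp = datetime.strptime(f"{year} {match['ts']}", "%Y %b %d %H:%M:%S")
            seen[match["dev"]].append(stamp)
    return {dev: (len(ts), min(ts), max(ts)) for dev, ts in seen.items()}
```

Passing the open syslog file from the diagnostics zip to `summarize_resets` gives one entry per SCSI address, which makes it easier to see whether the resets cluster on one device or are spread across the whole backplane.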

2 hours ago, JorgeB said:

There are a lot of these, over at least 30 minutes, including just before the disk errors. Were you intervening during all that time? And what exactly do you mean by intervening? Were you unplugging cables with the server on while reading/writing to the disks? That will cause errors for sure.

 

Apologies, I just noticed those entries. No, that wasn't me. I've checked the server logs in Lifecycle, and they don't show anything of the sort. The last hard reset was on March 5th, and there are 2 redundant power supplies, so there's no way anything shut down without me doing something. Looks like you found something else that's pretty odd.

 

Screenshot 2024-03-17 at 8.51.29 PM.png

On 3/18/2024 at 5:33 AM, JorgeB said:

Could be just a SATA cable issue, since both disks are showing UDMA CRC errors, especially parity.

 

Switched the drives to other bays, no difference.

 

I ran a SMART test on each drive separately. The 74CC threw a read error; nothing on the others.

 

Started over with a new config, and the result is completely different and completely random. Look at the screenshot.

 

I can't format disks 1 and 4, and disk 2 keeps erroring out.

 

 

Screenshot 2024-03-19 at 10.15.38 PM.png


Still a lot of errors with the devices. In my experience this is a bad SATA or power connection to the disk(s). It could also be a PSU or controller issue, but that's much less likely. Is this a server with trays, or are the disks connected directly with miniSAS cables?
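
Since UDMA CRC errors came up earlier in the thread, here is a rough Python sketch for pulling that counter out of `smartctl -A` text output, so it can be compared before and after moving a disk or swapping a cable. It assumes the usual ATA attribute table layout, where attribute 199 is `UDMA_CRC_Error_Count` and the raw value is the tenth column; SAS disks report differently and are not covered. The function names `crc_error_count` and `crc_increase` are made up for illustration.

```python
def crc_error_count(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if not found.

    Assumes the ATA attribute table printed by `smartctl -A`, with columns
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE.
    """
    for line in smartctl_output.splitlines():
        parts = line.split()
        if len(parts) >= 10 and parts[0] == "199":
            try:
                return int(parts[9])
            except ValueError:
                return None  # raw value was not a plain integer
    return None


def crc_increase(before, after):
    """Return how much the counter grew between two readings, or None."""
    if before is None or after is None:
        return None
    return after - before
```

As far as I know this counter is cumulative and does not reset, so the absolute number matters less than whether it keeps climbing after a change. A reading that stays flat after a cable or backplane swap suggests the link problem is gone.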

13 hours ago, JorgeB said:

Still a lot of errors with the devices. In my experience this is a bad SATA or power connection to the disk(s). It could also be a PSU or controller issue, but that's much less likely. Is this a server with trays, or are the disks connected directly with miniSAS cables?

Yes, it's a Dell R730xd with 12 trays. A backplane issue, you think?

 

Quote

Have you tried reseating the controller? Is it overheating?

Temps don't go above 27 °C.


Well, I tried with TrueNAS, which gave me SMART errors on 2 disks, so I got rid of those. Then I swapped the backplane board and I don't have the power errors anymore. Now all that's left are those I/O errors, so I'll try the PERC cable once it arrives.

There is light at the end of the tunnel!
