(SOLVED) Parity upgrade = lots of problems


stor44


Hi there, having some unexpected problems here. Diagnostics attached.

 

I'm trying to upgrade my two parity drives (1 at a time) from 4TB Reds, to 10TB Seagates I scored on Black Friday (got 3 of them for $200 CDN each, BarraCuda Pro's inside).

- I precleared the 10TB drives before shucking, no errors. I'm pretty sure you don't have to preclear parity drives, but I wanted to test them anyway.

- I set unRAID to not start the array automatically

- I shut down unRAID last night, no problems prior to that.

- removed the 4TB 1st Parity drive

- installed the 10TB Seagate after shucking it

- booted up unRAID, it said the Parity 1 drive is missing (correct) so I set that slot to the 10TB drive and started the array and waited for the parity rebuild to begin.

 Now I'm seeing many errors: "Array has 6 disks with errors", "Disk 2 in error state", and now it says every drive has read errors. Also I can't access the shares from my other computers over the network.

- The parity check has paused itself at 4.64GB.

 

I'm using a Dell PERC H310 raid card flashed to IT firmware.

I will double check my cabling, maybe bumped something during install?

Thanks for any help!

tower-diagnostics-20191206-1228.zip

Edited by stor44
Solved by swapping motherboard

It was a controller problem:

Dec  6 06:22:09 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!
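For anyone who wants to check their own syslog for the same failure, here is a minimal sketch that scans log lines for the message quoted above. The function name `find_hba_faults` and the second sample line are made up for illustration, and the pattern assumes the same syslog layout as the quoted line; adjust it for other drivers or log formats.

```python
import re

# Matches lines shaped like the one quoted above, e.g.
# "Dec  6 06:22:09 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!"
# Assumption: classic syslog timestamp followed by host name and "kernel:".
FAULT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s\S+\skernel:\s"
    r"(?P<driver>mpt\dsas_cm\d+):\sSAS host is non-operational"
)

def find_hba_faults(lines):
    """Return (timestamp, driver instance) for each controller-failure line."""
    hits = []
    for line in lines:
        match = FAULT.search(line)
        if match:
            hits.append((match.group("stamp"), match.group("driver")))
    return hits

# The first line is from this thread; the second is a hypothetical non-matching line.
sample = [
    "Dec  6 06:22:09 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!",
    "Dec  6 06:22:10 Tower kernel: some unrelated message",
]
print(find_hba_faults(sample))
```

To run it against a real log, pass the lines of the syslog file from the diagnostics zip instead of `sample`.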

Make sure it's well seated and sufficiently cooled; you can also try a different PCIe slot if available.

 

As for the disabled disk: either rebuild on top, if the emulated disk is mounting correctly and the data looks fine, or, assuming nothing was written to it after it got disabled, do a new config and resync parity.


Thanks Johnnie.

- I shut down unRAID. Checked my cables, removed and reseated the Dell card. That seems to work.

- unRAID boots, now the rest of disks seem fine again, except Disk2 is "unmountable: no file system".

- I clicked Start the array, but it wants to format Disk2 first.

 

Should I format Disk 2, or try a new config like you said? Seems like both roads lead to a parity check.

unraid DEC6 1.png

tower-diagnostics-20191206-1324.zip

Should I format Disk 2

Format is never an option unless there's no data on disk2. After a format, data can't be recovered from parity; this seems to be a common misconception.

 

You could try fixing the filesystem on the emulated disk2 first and, if successful, rebuild. But since there can be some corruption, and assuming nothing was written to disk2 once it got disabled, doing a new config would IMHO be the way to go.

 

 


Thanks, that makes sense re: format. Nothing new should have been written to Disk2.

- I clicked New Config, use current assignments.

- started the array, and now a parity check is underway. Will report back once it's done in 18 to 20 hours.

 

Thanks again johnnie.black, you've helped me numerous times and I always appreciate it! Couldn't run my server without this helpful community.


Five hours into the parity check, but now I got an alert saying "array has six failed disks." Diagnostics attached.

I'm at work for another few hours, but I have remote access to the GUI. 
 

The parity check is still running, but instead of taking 18 hours, it’s changed to about 3 hours left, 65% complete. Also it says 4TB total size, shouldn’t that be 10TB?

 

Thanks for any help.

tower-diagnostics-20191206-1936.zip

Edited by stor44
More detail

Thanks, and here's an update. I got home and the parity check said it was 165% complete.

- stopped parity check and shut down the server

- swapped the controller card to a different slot, re-checked cabling

- booted up, did New Config again, keeping assignments

- started parity check. It's doing the full 10TB this time. About 7 hours left now. No dropped drives so far.
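One possible reading of the odd percentages from the earlier run, sketched as arithmetic. This assumes the GUI was measuring progress against the old 4TB parity size while the check kept running over the new 10TB drive; that is a guess, not something confirmed from the diagnostics, and the sizes are nominal.

```python
def percent_complete(position_tb, total_tb):
    """Progress as a percentage of whatever total the display is using."""
    return 100.0 * position_tb / total_tb

old_parity_tb = 4.0    # size the GUI reported as the total
new_parity_tb = 10.0   # actual size of the new parity drive

# Position implied by "165% complete" if the total was taken to be 4TB.
position_tb = 1.65 * old_parity_tb

print(round(percent_complete(position_tb, old_parity_tb)))  # against 4TB
print(round(percent_complete(position_tb, new_parity_tb)))  # against 10TB
```

Under that assumption, 165% of 4TB is about 6.6TB, which would be roughly two thirds of the way through a 10TB check.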


Indeed. The card has just failed again today, same error as before:

Dec  9 16:24:23 Tower kernel: mpt2sas_cm0: SAS host is non-operational !!!!

I do have a second H310, but I use it in my Windows PC to host the drives that keep copies of my unRAID shares (running DrivePool on there). Might need to research another host card.

I have one more slot I can try, but yes, starting to seem heat-related or end of the line. I'll check the heatsink on the card too.

 

Thanks for the replies.


Latest update. I was able to swap slots and the host card ran for about 24 hours, but then drives would suddenly disappear from the array.

 

I don't want to mess with my other PC (where my other H310 has been working fine for 3 years), so I've bought two more H310s off eBay to test/replace this one. Hopefully they show up next week.

 

In the meantime, what does "stale configuration" mean? I did search for it, but I'm still not sure. Latest logs attached. Thanks for your time.

tower-diagnostics-20191212-1205.zip


Sorry to hijack the thread, but I'm worried about the temps on my HBA. Is there any way to measure the temperature?

 

How much better would it be to remove the heatsink and apply some high quality thermal paste and reapply the heatsink?

 

My HBA is an LSI SAS9340-8i ServeRAID M1215. I have not had any issues yet, but I worry because I put my finger on the heatsink and it was warm. I do have a fan blowing on it, but I'd say the heatsink itself was at least 50 or even 60°C.

1 minute ago, johnnie.black said:

That sounds about right, they do run pretty hot, as long as there's some airflow around them it's fine, they just shouldn't be without any as they are designed to run in servers with good cooling.

OK, thanks. Perhaps I read the specifications wrong, but according to Lenovo the card is only operational from 5°C to 40°C, which sounds crazy to me; most things will operate well above 40°C. Perhaps they are talking about the recommended ambient temperature and not the actual card temperature?

image.png.2d571f8d61e838381b1b3fe47a7a3176.png

 
