Parity drive keeps getting disabled


markc99

I'm having errors on my parity drive, and it keeps getting disabled. I've tried resetting it and running another parity check, but it's not helping...

 

I have at least two data drives which aren't being used yet. I thought I could shrink the array, pull out one of those data drives, and use it to replace the parity drive, but I can't figure out how to do it. I've googled around and haven't been able to find the right answer. Could someone point me in the right direction?

 

I'm running Unraid 6.11.5.

 

Mark


So, I was able to build a new config, remove a data drive, and replace the erroring parity disk with it. It got about 8 hours into the rebuild, and now the NEW parity drive is doing the same thing. The replacement drive had already been pre-cleared, and has only a month or two of service so far.

 

I'm perplexed as to what to do next.

6 hours ago, DanW said:

Have you changed the SATA data cable? Could be a bad sata cable or something wrong with the SATA port on the motherboard. Especially if a new drive dropped in its place does the same thing, it's unlikely you have two bad drives in a row.

I thought the same thing, but when I did the disk swap in the GUI, I didn't open the case. The original "bad" disk is still connected to the same cable it was before, and the new "bad" disk is on the cable which wasn't having problems before...

 

The server currently has my parity disk disabled and is running a "Read-Check". Based on the percentage done, it's probably going to take another day or two.

Edited by markc99
Added more info.
22 minutes ago, trurl said:

No point in letting that continue. Fix the problems so you can rebuild parity

Understood. I stopped everything, powered the server off, checked the cables again, and the parity is rebuilding. Fingers crossed... Only 1 day and 19 hours to get through without issue. ;)


Well, I thought I was in the clear, but I have errors on two disks again. One is brand new, and the other is a bit older. I'm thinking the first step may be to replace the SATA breakout cable. I haven't used any of the SATA ports on my motherboard yet, so maybe I should move the two disks there to confirm whether it's the cable?
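One way to get a hint about cable versus disk is the SMART attribute 199 (`UDMA_CRC_Error_Count`), which on many SATA drives goes up when there are link errors between the drive and the controller. Below is a minimal sketch that reads it from `smartctl -A` output. The function names and the `/dev/sdX` device path are placeholders, it assumes `smartctl` is installed, and it assumes the usual ten-column attribute table, which not every drive reports.

```python
import subprocess

CRC_ATTRIBUTE_ID = "199"  # UDMA_CRC_Error_Count on most SATA drives


def parse_crc_error_count(smartctl_output):
    """Return the raw value of attribute 199, or None if the drive does not list it."""
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Attribute rows look like:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(parts) >= 10 and parts[0] == CRC_ATTRIBUTE_ID:
            try:
                return int(parts[9])
            except ValueError:
                return None
    return None


def read_crc_error_count(device):
    """Run `smartctl -A <device>` and extract the CRC error count (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes for warnings too
    )
    return parse_crc_error_count(result.stdout)
```

Calling `read_crc_error_count("/dev/sdX")` before and after a parity check and comparing the two numbers would show whether the count is still climbing. A count that keeps rising points more toward cabling or the port than toward the disk itself.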

 

Thoughts?

Edited by markc99
Added more info.
Link to comment
Jan 24 22:54:21 Tower nginx: 2023/01/24 22:54:21 [crit] 4264#4264: ngx_slab_alloc() failed: no memory
Jan 24 22:54:21 Tower nginx: 2023/01/24 22:54:21 [error] 4264#4264: shpool alloc failed
Jan 24 22:54:21 Tower nginx: 2023/01/24 22:54:21 [error] 4264#4264: nchan: Out of shared memory while allocating message of size 10638. Increase nchan_max_reserved_memory.
Jan 24 22:54:21 Tower nginx: 2023/01/24 22:54:21 [error] 4264#4264: *333417 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devices?buffer_length=1 HTTP/1.1", host: "localhost"
Jan 24 22:54:21 Tower nginx: 2023/01/24 22:54:21 [error] 4264#4264: MEMSTORE:00: can't create shared message for channel /devices

Don't know what these are all about but they are filling your syslog
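To see how much of the syslog these messages take up, the lines can be tallied by process and log level. This is only a rough sketch: the regular expressions are written against the line format shown above, the function name is made up, and `SAMPLE` is just two of the lines from this thread.

```python
import re
from collections import Counter

# Syslog prefix as seen above: "Jan 24 22:54:21 Tower nginx: <message>"
SYSLOG_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(\S+)\s+([^:\[\s]+)(?:\[\d+\])?:\s*(.*)$"
)
# nginx puts its own level in brackets, e.g. "[crit]" or "[error]"
LEVEL_RE = re.compile(r"\[(\w+)\]")


def summarize(lines):
    """Count syslog lines per (process, level); level is '-' when none is found."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_RE.match(line)
        if not match:
            continue  # skip lines that do not follow the expected prefix
        process, message = match.group(2), match.group(3)
        level = LEVEL_RE.search(message)
        counts[(process, level.group(1) if level else "-")] += 1
    return counts


SAMPLE = [
    "Jan 24 22:54:21 Tower nginx: 2023/01/24 22:54:21 [crit] 4264#4264: ngx_slab_alloc() failed: no memory",
    "Jan 24 22:54:21 Tower nginx: 2023/01/24 22:54:21 [error] 4264#4264: shpool alloc failed",
]

for (process, level), count in summarize(SAMPLE).most_common():
    print(process, level, count)
```

Feeding it the lines of the real syslog file (for example via `open(path)`) instead of `SAMPLE` would show which process dominates the log.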

21 minutes ago, trurl said:

Don't know what these are all about but they are filling your syslog

I'm not sure what they are either... I did power off the server and move the two disks experiencing errors to SATA cables attached to the mobo, instead of the SATA card, and it's now running a parity check. The interesting thing is it's running WAY faster. It's been running for half an hour, and it's already 10% complete, with 5 or so hours left... The last one, which ended up erroring out, was estimated to take almost 2 days. My gut says one of my breakout cables may be going bad.
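As a sanity check on those numbers, here is the back-of-the-envelope arithmetic as a small sketch. It assumes the check runs at a roughly constant rate, which is only an approximation, and it uses the 1 day 19 hours (43 hours) estimate from the earlier rebuild as the "almost 2 days" figure.

```python
def estimate_total_hours(elapsed_hours, fraction_done):
    """Linear extrapolation: total time if progress continues at the same rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


elapsed = 0.5            # half an hour so far
done = 0.10              # 10% complete
previous_estimate = 43   # hours, from the earlier run on the SATA card

total = estimate_total_hours(elapsed, done)
remaining = total - elapsed
speedup = previous_estimate / total

print(f"total ~{total:.1f} h, remaining ~{remaining:.1f} h, ~{speedup:.1f}x faster")
```

That works out to about 5 hours in total, so the run on the motherboard ports is somewhere around 8 to 9 times faster than the earlier estimate.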

