Disabled Drive during Parity Check - How to Proceed?



Short version: I'm 35% into a parity check and a drive suddenly logged 745 read errors. It was, of course, disabled. So now I have a disabled disk and about 22 hours left on the parity check. I do have a spare. Should I shut down and replace the drive now, or let the check finish first?

 

Long version

I had a power failure about a day ago. The array shut down cleanly via a UPS.

After I turned it back on, I decided to run a parity check, as it had been a while since the last one.

During the parity check the server hung. I tried to let it run its course, but in the end I had to do a hard reboot (an unclean shutdown).

After booting I had to run another parity check because of the unclean shutdown.

This is when my drive 14 got disabled.

 

I am running dual parity and, as I mentioned, I already have a spare in the machine ready to go.

Edited by bnevets27

I know this is not going to come across in the right tone in text, so please understand I don't mean it in an ungrateful way.

Why is every question always answered with a request to post a diagnostic log? I do get that it is helpful in sorting out a problem, but 1) before diagnostics existed, questions were answered without them, and 2) simple general questions shouldn't need that much detail.

I'm a bit uneasy about posting my diags because, when I've done so previously, some of my information was taken from them and posted. Nothing really sensitive, but stuff I wouldn't post myself, and I had no control over removing it.

I also find now, when looking for information in the forums, that simple questions seem to go off the rails when other things are discovered in the diag logs, which makes it hard to find the solution to a common problem.

So I guess my simple question is: if a drive gets disabled during a parity check (when parity is already valid, i.e. parity has a green ball), what is the best practice? Do you cancel the check and then replace the drive, or let the check finish and then replace and rebuild?

21 minutes ago, bnevets27 said:

what is the best practice? Do you cancel the check and then replace the drive or let the check finish then replace and rebuild?

It depends on what caused the error, which is why I asked for the diagnostics. Without them I don't have any advice on what to do, since whenever possible I like to avoid guessing and giving bad advice. So good luck.


You mean this happened during the rebuild, and grabbing the diagnostics causes the system to hang?

 

I don't think there will be a miracle by morning. Restart it and grab the diagnostics before rebuilding.

 

Take it one small step at a time. If, after the restart, all disks (including the emulated one) show as mountable and a sample file reads fine, you shouldn't be in big trouble.

 

 

Edited by Benson

Well, it looks like the webUI is hung: "504 Gateway Time-out"

 

I've tried to get diags via telnet, as I regained access to it this morning, but it's stuck at "starting diagnostics collection..."

My dockers are still running and the files are all accessible. Trying to copy the syslog to the flash drive also hangs, but I did get some lines out of the syslog:

Aug 16 23:39:33 Excelsior nginx: 2018/08/16 23:39:33 [error] 10744#10744: *431150 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.106, server: , request: "GET /favicon.ico HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net", referrer: "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net/Main"
Aug 16 23:39:44 Excelsior nginx: 2018/08/16 23:39:44 [error] 10744#10744: *431150 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.106, server: , request: "GET /Main HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net"
Aug 16 23:41:46 Excelsior nginx: 2018/08/16 23:41:46 [error] 10744#10744: *431150 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.106, server: , request: "GET /favicon.ico HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net", referrer: "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net/Main"
Aug 16 23:54:08 Excelsior nginx: 2018/08/16 23:54:08 [error] 10744#10744: *432122 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.106, server: , request: "GET /Main HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net"
Aug 16 23:56:09 Excelsior nginx: 2018/08/16 23:56:09 [error] 10744#10744: *432122 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.106, server: , request: "GET /favicon.ico HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net", referrer: "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net/Main"
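For anyone triaging similar output, here is a minimal Python sketch that tallies nginx "upstream timed out" entries per requested path and reports the first and last timestamps seen. It assumes the lines follow the same syslog layout as the excerpt above; the names `LINE_RE` and `summarize_timeouts` are made up for illustration and are not part of Unraid or nginx.

```python
import re
from collections import Counter

# Assumed layout: "<Mon DD HH:MM:SS> <host> nginx: ... upstream timed out ... request: "GET /path HTTP/x""
LINE_RE = re.compile(
    r'^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+nginx:.*'
    r'upstream timed out.*request: "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"'
)

def summarize_timeouts(lines):
    """Return (timeouts per request path, first timestamp, last timestamp)."""
    counts = Counter()
    first = last = None
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not an nginx upstream-timeout entry
        counts[match.group("path")] += 1
        if first is None:
            first = match.group("stamp")
        last = match.group("stamp")
    return counts, first, last
```

Feeding it the pasted lines (for example as a list of strings) shows how many timeouts hit each page and over what time window. It only summarizes what is already in the log and says nothing about why the upstream stopped responding.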

 

Side note: I was trying to find a generic answer to my initial question. I do like to be self-sufficient, and I do have others with Unraid, so if this situation ever comes up again it would be nice to know the best route to follow. Prime example: if canceling the parity check was an OK/right thing to do, I could be rebuilding my array right now.

 
