
Disabled Drive during Parity Check - How to Proceed?


bnevets27


Short version: I'm 35% into a parity check and a drive suddenly got 745 read errors. It of course got disabled, so now I have a disabled disk and about 22 hours left on the parity check. I do have a spare. Should I shut down and replace the drive, or let the check finish first?

 

Long version

I had a power failure about a day ago. The array shut down cleanly via a UPS.

After I turned it back on, I decided to run a parity check, as it had been a while since the last one.

During the parity check the server hung. I tried to let it run its course, but alas I had to do a hard reboot (unclean shutdown).

After booting I had to do a parity check due to the unclean shutdown. 

This is when my drive 14 got disabled.

 

I am running dual parity. And as I mentioned, I have a spare already in the machine ready to go.


I know this is not going to sound like the right tone in text so please understand I don't mean this in an ungrateful way.

Why is every question always answered with a request to post a diagnostic log? I do get that it is helpful in sorting out a problem, but 1) before diagnostics existed, questions were answered without them, and 2) simple general questions shouldn't need that much detail.

I'm a bit uneasy about posting my diags because when I've done so previously, some of my information was taken from them and posted. Nothing really sensitive, but stuff I wouldn't post myself, and I had no control over removing it.

I also find now, when looking for information in the forums, that simple questions seem to go off the rails when other things are discovered in the diag logs, which makes it hard to find the solution to a common problem.

So I guess my simple question is: if a drive gets disabled during a parity check (when parity is already valid, i.e. parity has a green ball), what is the best practice? Do you cancel the check and then replace the drive, or let the check finish, then replace and rebuild?

21 minutes ago, bnevets27 said:

what is the best practice? Do you cancel the check and then replace the drive or let the check finish then replace and rebuild?

It depends on what caused the error, which is why I asked for the diagnostics. Without them I don't have any advice on what to do, since whenever possible I like to avoid guessing and giving bad advice. Good luck.


Well, I am going to pull and replace the drive anyway... But there could be other problems, I suppose, as I've now lost the webui while trying to grab the diags. Lost telnet also. I'm going to let it sit overnight and hope for a miracle in the morning. Not sure what else to do.
 


You mean that during the rebuild, grabbing the diagnostics caused the system to hang?

 

I don't think there will be a miracle in the morning. Restart it and grab the diagnostics before rebuilding.

 

Take it one small step at a time. If, after the restart, all disks (including the emulated one) show as mountable and a sample file reads fine, you shouldn't be in big trouble.

 

 


Well, it looks like the webui is hung: "504 Gateway Time-out"

 

I've tried to get diags via telnet, as I regained access to it this morning, but it's stuck at "starting diagnostics collection..."

My dockers are still running and the files are all accessible. Trying to copy the syslog to the flash drive also hangs, but I did get some lines out of the syslog:

Aug 16 23:39:33 Excelsior nginx: 2018/08/16 23:39:33 [error] 10744#10744: *431150 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.106, server: , request: "GET /favicon.ico HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net", referrer: "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net/Main"
Aug 16 23:39:44 Excelsior nginx: 2018/08/16 23:39:44 [error] 10744#10744: *431150 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.106, server: , request: "GET /Main HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net"
Aug 16 23:41:46 Excelsior nginx: 2018/08/16 23:41:46 [error] 10744#10744: *431150 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.106, server: , request: "GET /favicon.ico HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net", referrer: "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net/Main"
Aug 16 23:54:08 Excelsior nginx: 2018/08/16 23:54:08 [error] 10744#10744: *432122 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.106, server: , request: "GET /Main HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net"
Aug 16 23:56:09 Excelsior nginx: 2018/08/16 23:56:09 [error] 10744#10744: *432122 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.106, server: , request: "GET /favicon.ico HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net", referrer: "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.unraid.net/Main"
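
For anyone reading along with a similar excerpt: the lines above are all nginx reporting that the PHP backend behind the webui did not answer in time. A minimal Python sketch for tallying such lines per requested page is below. It assumes the syslog excerpt has been saved to a text file first, that the lines follow exactly the format shown above, and the function name `summarize_timeouts` is made up for illustration (it is not part of Unraid or nginx).

```python
import re
from collections import Counter

# Matches nginx "upstream timed out" lines in the format of the excerpt above.
# Assumption: other nginx or syslog layouts may need a different pattern.
TIMEOUT_RE = re.compile(
    r'nginx:.*upstream timed out.*'
    r'request: "(?P<method>\S+) (?P<path>\S+) [^"]*"'
    r'.*upstream: "(?P<upstream>[^"]+)"'
)


def summarize_timeouts(lines):
    """Count upstream timeouts per (request path, upstream socket) pair."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("path"), match.group("upstream"))] += 1
    return counts
```

Calling `summarize_timeouts(open("syslog_excerpt.txt"))` (the file name is a placeholder) returns a `Counter`. If every entry points at the same upstream socket, that suggests the webui backend itself is stuck rather than one particular page. It says nothing about why the disk was disabled.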

 

Side note: I was trying to find a generic answer to my initial question. I do like to be self-sufficient, and I know others who run Unraid, so if this situation ever comes up again it would be nice to know the best route to follow. Prime example: if canceling the parity check was an OK/right thing to do, I could be rebuilding my array right now.

 


Archived

This topic is now archived and is closed to further replies.
