(SOLVED) No GUI, Red X disk, a few problems


stor44


Hi there. I’m having some problems and need some help. It’s complicated, but hopefully it's simpler than it looks. System specs are in my signature.

 

My main problem is I’ve lost the GUI. Any browser eventually fails with “500 Internal Server Error”. It was working yesterday. I can still log in through SSH and access the shares on the network, and my Win10 VM with Blue Iris is still up and running. I don’t have any Dockers running except for the preclear Docker app. I had stopped them all when I was running “unBalance” yesterday to clear off my last 4TB drive in the array so I can remove it eventually.

 

My second problem is “sdh”, my Parity 2 drive, which suddenly had errors 30 minutes after I started unBalance and then got the Red X beside it. I have another 10TB Seagate on hand ready to replace it, but I need the GUI first.

 

My third problem is “sdk”, one of two SSDs in my cache pool, which started showing errors yesterday as well. I have ordered two new WD RED NAS SSDs to replace both.

 

I don’t have a proper diagnostics dump unfortunately, having no GUI. I tried “diagnostics” over SSH, but it just hangs and I can't cancel it. I did manage to grab the syslog using the terminal, attached. I’ve also attached smartctl logs from each drive.
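For anyone who wants to gather the same per-drive SMART logs in one go, here is a minimal Python sketch. It assumes Python 3 and smartmontools are available on the machine (stock Unraid may not include Python), and the function names, output folder, and device list are made up for illustration. It relies only on `smartctl -a`, which prints all SMART information for a device.

```python
import subprocess
from pathlib import Path


def smartctl_command(device):
    """Build the smartctl call for a device name such as 'sdh'."""
    return ["smartctl", "-a", "/dev/" + device]


def save_smart_logs(devices, out_dir="smart-logs", runner=subprocess.run):
    """Write one text file per device and return the list of paths.

    `runner` is a hypothetical hook so the function can be tried
    without real disks; by default it is subprocess.run.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for dev in devices:
        # check=False: smartctl's exit status is a bitmask and can be
        # non-zero for a failing disk even though the report printed fine.
        result = runner(smartctl_command(dev), capture_output=True,
                        text=True, check=False)
        path = out / ("smart-" + dev + ".txt")
        path.write_text(result.stdout)
        written.append(path)
    return written
```

Usage would be something like `save_smart_logs(["sdh", "sdk", "sdm"])`, run as root, with the device names swapped for whatever the server actually shows.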

 

Re: the smart logs, “sdm”, “sdn” and “sdp” are drives I previously used in unRAID and then DrivePool in Windows. They had shown problems, so I was just preclearing them as a test. As far as I can tell, the preclear is still running. I’m using the excellent preclear Docker app, and I’m still receiving emails as it continues working.

 

At 8:44 last night, I got these two emails:

 

“Event: Unraid Parity 2 error

Subject: Alert [TOWER] - Parity 2 in error state (disk dsbl)

Description: ST10000DM0004-1ZC101_ZA25ZJG3 (sdh)

Importance: alert”

 

“Event: Unraid array errors

Subject: Warning [TOWER] - array has errors

Description: Array has 1 disk with read errors

Importance: warning

Parity 2 - ST10000DM0004-1ZC101_ZA25ZJG3 (sdh) (errors 1024)”
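If it helps anyone keep track of these alerts, below is a rough Python sketch that pulls the slot, device, and error count out of the summary line in the second email. The pattern is only inferred from the one example above, so treat the format as an assumption; `parse_alert_line` and the field names are hypothetical, not anything Unraid provides.

```python
import re

# Shape assumed from the single example in this post:
#   "<slot> - <model_serial> (<device>) (errors <count>)"
ALERT_LINE = re.compile(
    r"^(?P<slot>.+?) - (?P<model_serial>\S+) "
    r"\((?P<device>sd[a-z]+)\) \(errors (?P<errors>\d+)\)$"
)


def parse_alert_line(line):
    """Return a dict for a matching summary line, or None otherwise."""
    # Drop surrounding whitespace and any pasted quotation marks.
    cleaned = line.strip().strip("“”\"").strip()
    match = ALERT_LINE.match(cleaned)
    if match is None:
        return None
    info = match.groupdict()
    info["errors"] = int(info["errors"])
    return info
```

Feeding it the last line of the email gives the slot ("Parity 2"), the device ("sdh"), and the error count as an integer, which is enough to compare one alert against the next.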

 

I’ve attached the logs from today, Dec 29th. Also attached the logs from Dec 28th, which I had just happened to grab yesterday morning, in case that helps at all. That was before any drives showed errors. In fact I had just done a Parity check after replacing a 4TB WD Red with a 10TB Seagate earlier this week. No errors during that either.

 

Is there a way to stop the preclear via SSH? Then I could reboot the server ideally with a clean shutdown. I don't mind not finishing the preclears, I'm most concerned about the loss of dual-parity and my cache pool. (Fortunately I have recent copies of my important shares on DrivePool in another Windows 10 PC).

 

Thanks for any ideas, I appreciate your time.

tower-diagnostics-20201228-0738.zip tower-logs-20201229.zip

Edited by stor44
(SOLVED) Check cabling, health of drives, then rebuild to itself

I was finally able to shut down after my preclearing cycles finished. Now that I've done more forum searches, it seems like that was the wrong thing to do, but I couldn't do anything without the GUI.

I swapped in a new 10TB drive to replace the failing Parity 2 drive (another wrong move it seems), and my 2nd SSD cache drive is no longer showing errors, so that's the good news. Also I ran unBalance one more time to completely clear off my last 4TB drive and remove it from the array.

 

After another reboot I started a parity check, and a different drive is showing errors now.

Thanks for any help. I guess I did several steps wrong here.

tower-diagnostics-20210101-1404.zip


Thanks for the reply. Yes, all the array drives are on the same controller, Dell Perc H310. And I have two SATA power cables from the power supply, each with 4 SATA power connectors. So the 2 parity drives and the first 2 array drives are powered off the same cable. Then the 3 remaining array drives and an 8TB unassigned drive are powered off the other cable.

 

Power supply is an interesting thought. Sometimes the computer won't shut down completely...unRAID shuts down cleanly, but then the machine sits there until I hold the power button 5 seconds, or flip the power switch on the power supply itself.

 

Here are new diagnostics, taken after I checked the SATA and power connections and rebooted. Thank you.

tower-diagnostics-20210102-0428.zip


Ok we're back to normal here. Posting this in case it helps others (or myself in the future!). I was able to follow the unRAID wiki "Rebuilding a drive onto itself" after I verified the drive was ok. I also sorted out the red X'd Parity 2 drive by stopping the array, unassigning that drive from its slot, starting the array, stopping the array again, re-assigning the drive, then starting the array and rebuilding parity (those aren't the exact steps, see the wiki for details).

Thanks for all the help, this forum is incredibly helpful. Happy New Year!
