[Solved] Trouble Starting array


Falcowe

So I regrettably decided to upgrade to 6.8.0 today. I should have known better, it being Friday the 13th :) lol. In any case, I think I did something dumb. After the update downloaded and the server needed a reboot, I rebooted it via the web GUI, and a few minutes later (maybe 3-5) came back only to find that nothing had happened. I went and checked on the server in person, and it was still up and running. So I tried shutting down the server directly on the machine via the command line, but again no response. I waited another 3-5 minutes and then tried pressing the power button. Again no response after a couple of minutes, so I figured that since it just needed a reboot, I would force it off and power it back on. That was my mistake, because I have been having issues ever since. There is now a huge delay in getting unRAID loaded, and when I left the array starting and came back 2 hours later, it was still attempting to start. I have tried booting in safe mode and into maintenance mode, and that was successful, but there was still a delay before I could access the webGUI. My guess is that something in unRAID is corrupt, since it was working fine earlier today before I messed with it. Thanks in advance for your help.

 

I should note that I currently have the unRAID webGUI up; however, I haven't started the array, since that has been causing the issues so far. So I should be able to collect any extra diagnostic data you may need. Also, when I try to pull up the logging window in the webGUI, it takes quite some time for the information to populate, although it does eventually.

 

Attached are the diag zips: one from when I first caused the issue, the most recent one that I pulled directly from my flash drive, and the most recent one I pulled while the web GUI was working.

 

tower-diagnostics-20191213-1452 Falcowe.zip

Edited by Falcowe
Link to comment

I also just realized that I haven't posted what my system specs etc are. 

I'm using a 45Drives AV15 with a Supermicro X11SSH-CTF motherboard, a Xeon CPU (1220 v5), and 16GB of memory.

I am running a number of docker containers, however none of them are loaded at the moment since I haven't started the array. 

Plugins: CA, rclone, Unassigned Devices, User Scripts

Link to comment

Did you just change the name of your server from "Tower" to be "Treadstone"?

 

Can you reboot, go to 192.168.140.2 (don't use the server name), start the array normally, wait a couple of minutes, and then post a set of diagnostics?

 

Also, do all your switches etc support Jumbo Frames?

Link to comment

Squid, no I didn't rename the server. The logs were pulled both locally (i.e. Tower) and remotely (i.e. Treadstone).

 

I have started the array again normally, and I will post new diagnostics soon (hopefully).

 

I can't imagine that anything in my network doesn't support Jumbo Frames, but I am not 100% certain. Would this cause the array not to start?

Link to comment

I tried a temporary unRAID install and had a great deal of difficulty building a new array. There was a lot of lag in the GUI, and when I finally got a couple of disks in place and clicked "Start" for the array, nothing happened. So I then did the BIOS update and the firmware update for the Supermicro board, and that hasn't helped either. In fact, it seems that the BIOS update may have made things worse. Now, after booting from the unRAID flash drive into GUI mode (I have yet to try non-GUI mode, but am currently giving GUI mode with no plugins some time to boot up), I am getting a black screen and I can't bring up the web GUI either. Any ideas?

 

I have not updated the LSI HBA, but the fact that it's not even getting to the GUI anymore makes me think that isn't the issue. I will try to figure out which HBA I have and see if I can update it. However, I don't have any PCI cards, so I suspect it is part of the Supermicro board and was taken care of by the BIOS and firmware updates.

 

EDIT: The webGUI came back online. I have tried starting the array again, but I am still waiting 10 minutes later for it to start. So I guess it's good news: I haven't made it worse.

Edited by Falcowe
Link to comment

So the array has been starting overnight and is still mounting disks. Not normal behaviour, I think. I also noticed that, as on previous occasions, the GUI reports high CPU usage, and when I look at the Processes report in the GUI there is one task at 100% CPU usage. Could this be causing the disks not to mount? The server is trying to do this in safe mode at the moment.

 

Treadstone51Troubleshooting Proccesses window.JPG

Edited by Falcowe
Link to comment

What happens if you start the array in maintenance mode (which skips mounting the file systems)? Does it start normally?

 

If you get a positive result, please also run a parity check (no correction); this just tests whether the system is healthy.

 

BTW, during the abnormal disk mounting, does it always get stuck at the same disk?

Edited by Benson
Link to comment
1 hour ago, Falcowe said:

Sorry what do you mean?

 

The Parity-Check (no correction writing) will be running for about 6 hours.

Sorry, I mean the problem is not what I expected; it seems to be a filesystem issue only.

 

Is the parity check speed normal? Are the biggest disks 3TB?

Edited by Benson
Link to comment

Yeah, the parity check speed seems normal compared with past runs when the array was working properly. Estimated time to completion as of this post is 5 hours. The disks are all 6TB disks.

 

So if it is a file-system issue, does that mean unRAID is corrupted, or that the file system on the disks is the issue? If so, will a parity check fix the issue, or will something else likely need to be done? Thanks Benson et al.!

Link to comment

A 6TB parity check needs 12 hours; anyway, this is not important.

 

In your case, a parity check won't help you fix the long array start problem. If the parity check shows no errors and a further filesystem check (in maintenance mode) also passes, then I believe the problem is not with the disks or the disk controller. But I don't have much of an idea beyond that.

Edited by Benson
Link to comment

Please also type 'dmesg' at the console prompt. Do any errors show up?

 

 

I checked and compared your syslog. There is an abnormal "shcmd" that always returns "exit status: 1", and then it gets stuck in the btrfs device scan. Maybe you should check further on this.

 

 

Dec 14 13:35:51 Treadstone51 kernel: md1: running, size: 5860522532 blocks
Dec 14 13:35:51 Treadstone51 kernel: md2: running, size: 5860522532 blocks
Dec 14 13:35:51 Treadstone51 kernel: md3: running, size: 5860522532 blocks
Dec 14 13:35:51 Treadstone51 emhttpd: shcmd (73): udevadm settle
Dec 14 13:37:51 Treadstone51 emhttpd: shcmd (73): exit status: 1

Dec 14 13:37:51 Treadstone51 root: Starting diskload
Dec 14 13:37:51 Treadstone51 emhttpd: Mounting disks...
Dec 14 13:37:51 Treadstone51 emhttpd: shcmd (80): /sbin/btrfs device scan

 

 

Normal example:

Nov 13 11:26:56 Z390 kernel: md8: running, size: 7814026532 blocks
Nov 13 11:26:56 Z390 kernel: md9: running, size: 7814026532 blocks
Nov 13 11:26:56 Z390 kernel: md10: running, size: 7814026532 blocks
Nov 13 11:26:56 Z390 kernel: md11: running, size: 7814026532 blocks
Nov 13 11:26:56 Z390 kernel: md12: running, size: 7814026532 blocks
Nov 13 11:26:56 Z390 emhttpd: shcmd (27): udevadm settle
Nov 13 11:26:56 Z390 emhttpd: Autostart enabled
Nov 13 11:26:56 Z390 root: Starting diskload

Nov 13 11:26:58 Z390 emhttpd: Mounting disks...
Nov 13 11:26:58 Z390 emhttpd: shcmd (40): /sbin/btrfs device scan
Nov 13 11:26:59 Z390 root: Scanning for Btrfs filesystems
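The comparison above can also be done mechanically. The following is a minimal Python sketch, not an Unraid tool: it assumes the syslog format shown in the two excerpts, where an "exit status" line appears only for the failing command. The names `SHCMD_RE`, `parse_ts` and `failed_shcmds` are hypothetical helpers invented for this illustration, and the year 2019 is taken from the diagnostics file name because syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Matches emhttpd lines such as:
#   Dec 14 13:35:51 Treadstone51 emhttpd: shcmd (73): udevadm settle
SHCMD_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+emhttpd:\s+"
    r"shcmd \((?P<id>\d+)\):\s+(?P<rest>.*)$"
)


def parse_ts(ts, year=2019):
    # Syslog timestamps have no year; the year is an assumption and only
    # matters for computing deltas. %b expects English month abbreviations.
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def failed_shcmds(lines):
    """Return (id, command, seconds, exit_status) for shcmds that exited nonzero."""
    started = {}   # shcmd id -> (command text, start time)
    failures = []
    for line in lines:
        m = SHCMD_RE.match(line.strip())
        if not m:
            continue
        cmd_id, rest, ts = m["id"], m["rest"], parse_ts(m["ts"])
        status = re.fullmatch(r"exit status:\s*(\d+)", rest)
        if status is None:
            # First line for this id: the command being launched.
            started[cmd_id] = (rest, ts)
        elif cmd_id in started and int(status.group(1)) != 0:
            command, t0 = started.pop(cmd_id)
            seconds = (ts - t0).total_seconds()
            failures.append((int(cmd_id), command, seconds, int(status.group(1))))
    return failures


# Lines copied from the abnormal excerpt above.
sample = """\
Dec 14 13:35:51 Treadstone51 kernel: md3: running, size: 5860522532 blocks
Dec 14 13:35:51 Treadstone51 emhttpd: shcmd (73): udevadm settle
Dec 14 13:37:51 Treadstone51 emhttpd: shcmd (73): exit status: 1
Dec 14 13:37:51 Treadstone51 emhttpd: shcmd (80): /sbin/btrfs device scan
""".splitlines()

for cmd_id, command, seconds, status in failed_shcmds(sample):
    print(f"shcmd ({cmd_id}) '{command}' exited {status} after {seconds:.0f}s")
```

On the abnormal excerpt this reports that "udevadm settle" ran for 120 seconds before exiting with status 1. That matches the documented default timeout of `udevadm settle` (120 seconds), which suggests the command timed out rather than failing quickly.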

Edited by Benson
Link to comment

If "udevadm settle" is the right clue for troubleshooting, then it seems some hardware device is initializing and causing the hang during boot.

 

I tested "udevadm settle" and "btrfs device scan" at the terminal prompt; both return to the prompt immediately.
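To put numbers on "returns immediately", the two commands can be timed from a script instead of by eye. This is a rough Python sketch under some assumptions: it is run as root on the server console, Python is available there (it is not part of a stock install as far as this thread shows), and `time_command` and `report` are hypothetical helper names, not part of Unraid.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=30.0):
    """Run argv; return (exit_code, seconds). exit_code is None if it timed out."""
    start = time.monotonic()
    try:
        code = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        ).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout. A process blocked inside
        # the kernel may not react to the kill right away.
        code = None
    return code, time.monotonic() - start


# The two commands that appear in the syslog excerpts.
COMMANDS = (
    ["udevadm", "settle"],
    ["/sbin/btrfs", "device", "scan"],
)


def report(commands=COMMANDS, timeout=30.0):
    # Print one line per command with its exit code and wall-clock time.
    for argv in commands:
        code, seconds = time_command(argv, timeout=timeout)
        outcome = "timed out" if code is None else f"exit status {code}"
        print(f"{' '.join(argv)}: {outcome} after {seconds:.1f}s")
```

Calling `report()` before and after unplugging a suspect device gives a quick comparison: on a healthy system both commands should finish in well under a second, while a multi-second run or a timeout points at a device that is still holding things up.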

 

Could you test whether, once "udevadm settle" no longer returns "exit status: 1", the array starts and stops normally? (Please try starting and stopping it several times to confirm.)

 

After that, you can track down which hardware device causes the problem, e.g. by unplugging some hardware devices or disabling onboard devices in the BIOS.

 

 

 

wt.PNG.3dc576233509ac466ca395a8f9cf324e.PNG

 

Edited by Benson
Link to comment

Hey Benson, so neither of those commands returns immediately the way they seem to in your screenshot. I think the next step for me is to try moving slots; however, I can't see why that would change anything, because unRAID is detecting the disks, it just doesn't seem to be able to start the array. And it went from a working array to a non-functioning array without me physically touching the machine.

Link to comment

Solved: It was the optical drive. More particularly the optical disk in the drive. So strange. I pulled the disk out from the drive and the array immediately started. Same when I unplugged the optical drive. But the array also starts with the optical drive plugged in, with no disk in it. Also my install of 6.8.0 was not the issue even with the ungraceful shutdown. I was able to recopy everything back onto the boot flash and now have everything precisely the way it was before this weirdness.

 

Please mark as SOLVED; I can't seem to change the post header.

Edited by Falcowe
SOLVED
Link to comment
