Starting array stuck



Hi all,

 

Out of the blue I got a notification that there was something wrong with my cache (stupidly, I forgot to take a screenshot).

I looked at the shares and there were no failures. All my dockers had stopped! On Main I saw that the log was at 100%.

A forum post said to just reboot your system. I did a reboot, but my NAS did not power off. I went in over SSH with PuTTY and got it turned off.

I rebooted and tried to start my array, but it got stuck and nothing helped. Again I pressed shut down, and nothing happened (well, Unraid stopped, but not my NAS).

I rebooted it again via PuTTY and tried starting the array in maintenance mode, which worked. Now I am running a parity check, but it takes 8 hours.

Maybe somebody can already help me with my diagnostics.zip which I was able to run during the parity check (luckily).

 

For some reason there were more diagnostics files on the flash drive from this hour of problems!

 

Can somebody check what is going on? The problem started with a notification about my cache SSD (which is hardly a month old).

 

Thanks!!

tower-diagnostics-20191121-1510.zip tower-diagnostics-20191121-1525.zip tower-diagnostics-20191121-1539.zip tower-diagnostics-20191121-1553.zip


NVMe device dropped offline:

 

Nov 21 15:03:30 Tower kernel: nvme nvme0: I/O 399 QID 4 timeout, aborting
Nov 21 15:03:30 Tower kernel: nvme nvme0: Abort status: 0x0
Nov 21 15:03:40 Tower kernel: nvme nvme0: I/O 821 QID 2 timeout, aborting
Nov 21 15:03:40 Tower kernel: nvme nvme0: Abort status: 0x0
Nov 21 15:03:47 Tower kernel: nvme nvme0: I/O 822 QID 2 timeout, aborting
Nov 21 15:03:47 Tower kernel: nvme nvme0: Abort status: 0x0
Nov 21 15:04:00 Tower kernel: nvme nvme0: I/O 399 QID 4 timeout, reset controller
Nov 21 15:04:53 Tower kernel: nvme nvme0: I/O 10 QID 0 timeout, reset controller
Nov 21 15:05:22 Tower kernel: nvme nvme0: Device not ready; aborting reset
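For anyone who wants to check a longer syslog for the same pattern, here is a minimal sketch that counts the NVMe timeout messages per device and flags a failed controller reset. It only assumes the message wording seen in the excerpt above; the function name `summarize_nvme_events` and the `SAMPLE` list are made up for illustration, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns based only on the kernel messages quoted in this thread, e.g.
# "Nov 21 15:03:30 Tower kernel: nvme nvme0: I/O 399 QID 4 timeout, aborting"
TIMEOUT_RE = re.compile(
    r"kernel: nvme (?P<dev>nvme\d+): I/O \d+ QID \d+ timeout, "
    r"(?P<action>aborting|reset controller)"
)
RESET_FAILED_RE = re.compile(
    r"kernel: nvme (?P<dev>nvme\d+): Device not ready; aborting reset"
)


def summarize_nvme_events(lines):
    """Return (timeouts, reset_failed).

    timeouts is a Counter keyed by (device, action);
    reset_failed is the set of devices whose controller reset was aborted.
    """
    timeouts = Counter()
    reset_failed = set()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[(match.group("dev"), match.group("action"))] += 1
            continue
        match = RESET_FAILED_RE.search(line)
        if match:
            reset_failed.add(match.group("dev"))
    return timeouts, reset_failed


# Hypothetical sample: a few of the lines from the excerpt above.
SAMPLE = [
    "Nov 21 15:03:30 Tower kernel: nvme nvme0: I/O 399 QID 4 timeout, aborting",
    "Nov 21 15:03:30 Tower kernel: nvme nvme0: Abort status: 0x0",
    "Nov 21 15:04:00 Tower kernel: nvme nvme0: I/O 399 QID 4 timeout, reset controller",
    "Nov 21 15:05:22 Tower kernel: nvme nvme0: Device not ready; aborting reset",
]

if __name__ == "__main__":
    counts, failed = summarize_nvme_events(SAMPLE)
    for (dev, action), n in sorted(counts.items()):
        print(f"{dev}: {n} timeout(s) followed by '{action}'")
    for dev in sorted(failed):
        print(f"{dev}: controller reset failed, device likely dropped offline")
```

To run it against a real log, pass the lines of the syslog file from the diagnostics zip instead of `SAMPLE`.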

 


I can imagine that is to be expected, since the system was not shut down completely (for some reason it was not able to do so)!

But for some reason, starting the array will not complete! Some extra information:

- plugins loaded

- shares loaded

- dockers kept loading but showed nothing

 

A few hours ago (I guess) I updated some dockers.

 

Should I terminate the parity check?

Or should I uncheck the SSD and try to restart the array?

 


I ran diagnostics again, and the end of the log says something about dockerd:
 

Nov 21 17:30:52 Tower dhcpcd[1675]: br0: failed to renew DHCP, rebinding
Nov 21 17:34:23 Tower avahi-daemon[6330]: Joining mDNS multicast group on interface br-8c356d0b587c.IPv4 with address 172.18.0.1.
Nov 21 17:34:23 Tower avahi-daemon[6330]: New relevant interface br-8c356d0b587c.IPv4 for mDNS.
Nov 21 17:34:23 Tower avahi-daemon[6330]: Registering new address record for 172.18.0.1 on br-8c356d0b587c.IPv4.
Nov 21 17:34:23 Tower kernel: IPv6: ADDRCONF(NETDEV_UP): br-8c356d0b587c: link is not ready
Nov 21 17:34:23 Tower avahi-daemon[6330]: Joining mDNS multicast group on interface docker0.IPv4 with address 172.17.0.1.
Nov 21 17:34:23 Tower avahi-daemon[6330]: New relevant interface docker0.IPv4 for mDNS.
Nov 21 17:34:23 Tower avahi-daemon[6330]: Registering new address record for 172.17.0.1 on docker0.IPv4.
Nov 21 17:34:23 Tower kernel: IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready

Is it something to do with the dockers not being able to connect?


OK, the strangest thing happened... I ended the parity check because I wanted to do a clean reboot (despite the two times it did not work properly).

Eventually after 10-15 minutes, the array started. All my dockers are back, but (not yet) started.

It started a parity check itself. I think it is better to let it run now?

 

But still, I don't know where the problems came from! (most recent syslog).

 

The only thing I can think of is that the "usable size; log" was completely full and that my docker image is quite big (21 GB).
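A quick way to check how full the log filesystem is from a script, as a minimal sketch. It assumes the log figure on Main corresponds to the filesystem mounted at `/var/log` (I have not confirmed that), and that Python 3 is available on the server, which may require installing it separately. The helper names `percent_used` and `is_nearly_full` and the 90% threshold are made up for illustration.

```python
import shutil


def percent_used(path):
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold=90.0):
    """True if the filesystem containing `path` is at or above `threshold` percent."""
    return percent_used(path) >= threshold


if __name__ == "__main__":
    # Assumption: /var/log is where the log space shown on Main lives.
    log_path = "/var/log"
    print(f"{log_path}: {percent_used(log_path):.1f}% used")
    if is_nearly_full(log_path):
        print("Log filesystem is nearly full; check which files are growing.")
```

This only reports usage; it does not say which file filled the log, so the syslog in the diagnostics zip is still the place to look for the cause.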

Also, would it be better to let Deluge and SABnzbd write incomplete downloads straight to one of my array drives instead of to the cache drive first?

 

tower-diagnostics-20191121-1659.zip


There is the error again:

 

Unraid Cache disk message: 21-11-2019 18:21

Warning [TOWER] - Cache pool BTRFS missing device(s)
ADATA_SX6000LNP_2J2820125962 (nvme0n1)

 

Is this the part where the cache drive is dead!? Or loose? ;-)

 

If so, how can I get it up and running again without losing all my settings?

tower-diagnostics-20191121-1725.zip

Edited by mvanhooff
Diagnostics, offline again

Did a normal shutdown (worked!). Took out the PCIe M.2 SSD and put it back in. Restarted the NAS and started the array; again it took pretty long!

After a while the shares came up and eventually the dockers, but I get an error when starting them:

image.png.7474b55b7ffe4976a55e49a6aa306c9c.png

 

Then I tried installing a random docker:

image.thumb.png.d4b2d00a41ecf22eea3b7b5d222e7a96.png

 

The hell??

Then I checked my dashboard:

image.png.304b9b33dfae831a7ca1031d8bd8db05.png

 

Active, but SMART failure

image.png.12c32a7af9dad604a61b141496ead34f.png

7 unsafe shutdowns (probably from the drive going offline and online)

and extra information:

image.png.900eb5ca05eba086ae8b52df35ae83db.png

 

And also the diagnostics.sys

 

What are the best options for now!?

A clean install of Unraid? But I suppose that won't fix the offline/online problem. Could it be that the drive was taken offline by software?

I made a backup of the appdata, flash drive, VMs, etc.

Can somebody advise whether I should buy a new SSD? It must be an M.2 PCIe drive so that I don't lose a SATA port (that's what the motherboard manual says). The SSD goes into an M.2 slot.

tower-diagnostics-20191121-1857.zip

 

image.png

Edited by mvanhooff
double image
