Troubleshooting crash maybe related to a bad docker container


JohanSF


This is a continuation of:

and

with diagnostics as per Squid's instructions; however, I did have to reboot in order to start the Docker service again.
hal9000-diagnostics-20181108-1715.zip

 

In response to

I do have my appdata on the cache drive. I also think it did move last night; the 260 GB here makes sense, as I downloaded large content just after the first crash. But it does indeed seem to have something to do with the cache drive and/or a container.

Edited by JohanSF
32 minutes ago, johnnie.black said:

Alright, I don't really know what I am doing, but you are asking me to run

btrfs balance start -dusage=75 /mnt/cache

in the console, right?

Edited by JohanSF

Yes, that's fine, and it shouldn't happen again. This only happens with older kernels, or with users who came from older kernels and never ran a balance, which I assume is your case.

 

Before:

                  Data      Metadata System              
Id Path           single    single   single   Unallocated
-- -------------- --------- -------- -------- -----------
 1 /dev/nvme0n1p1 474.93GiB  2.01GiB  4.00MiB    56.00KiB
-- -------------- --------- -------- -------- -----------
   Total          474.93GiB  2.01GiB  4.00MiB    56.00KiB
   Used           240.03GiB  1.27GiB 80.00KiB 

 

 

After:

                  Data      Metadata System              
Id Path           single    single   single   Unallocated
-- -------------- --------- -------- -------- -----------
 1 /dev/nvme0n1p1 253.01GiB  3.01GiB  4.00MiB   220.92GiB
-- -------------- --------- -------- -------- -----------
   Total          253.01GiB  3.01GiB  4.00MiB   220.92GiB
   Used           240.02GiB  1.24GiB 64.00KiB  

The problem was the unallocated space: you didn't have any.
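
For anyone who wants to check this themselves, here is a minimal sketch that reads a table like the ones above (it looks like the output of `btrfs filesystem usage -T /mnt/cache`) and flags a pool with almost no unallocated space. The helper names `to_kib` and `check_unallocated` and the 1 GiB default threshold are my own assumptions for illustration, not anything Unraid or btrfs ships.

```shell
#!/bin/sh
# Hypothetical helper: convert a size such as "56.00KiB" or "220.92GiB"
# to whole KiB. An unrecognised unit yields 0, so it gets flagged as low.
to_kib() {
    printf '%s\n' "$1" | awk '{
        n = $1 + 0                  # leading numeric part of the field
        u = $1
        sub(/^[0-9.]+/, "", u)      # what remains is the unit suffix
        m = 0
        if (u == "B")   m = 1 / 1024
        if (u == "KiB") m = 1
        if (u == "MiB") m = 1024
        if (u == "GiB") m = 1024 * 1024
        if (u == "TiB") m = 1024 * 1024 * 1024
        printf "%.0f\n", n * m
    }'
}

# Hypothetical helper: read the tabular usage output on stdin, take the
# last column of the "Total" row (Unallocated) and compare it with a
# threshold in KiB. The default of 1 GiB is an assumed value.
check_unallocated() {
    min_kib=${1:-1048576}
    total=$(awk '$1 == "Total" { print $NF; exit }')
    if [ -z "$total" ]; then
        echo "no Total row found" >&2
        return 2
    fi
    kib=$(to_kib "$total")
    if [ "$kib" -lt "$min_kib" ]; then
        echo "LOW: only $total unallocated"
        return 1
    fi
    echo "OK: $total unallocated"
}

# Intended use on the server (not run here):
#   btrfs filesystem usage -T /mnt/cache | check_unallocated
```

Fed the "Before" table it should report the pool as low (56.00KiB unallocated), and fed the "After" table it should report it as OK (220.92GiB unallocated).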


I celebrated too early. The whole unRaid server crashed again during the night. It must have happened before 3:40 am, as the mover has not run.

Here is the syslog and diagnostics:

syslog.txt (I know that I [used the] server to watch something on Plex up until about 11 pm)

hal9000-diagnostics-20181109-0622.zip

 

It should also not be caused by my Ryzen 1700 processor as I have the zenstates script applied to disable C6 states:

image.png.c2b3eb91307b624563a902054cfc753b.png

image.thumb.png.fc2a53800eb7d471c1bb63b602942b16.png

Edited by JohanSF
4 minutes ago, bonienl said:

Either a balance or scrub operation is being performed and array can not be stopped until this operation is completed.

Ok. It is unresponsive in the sense that, on the Main page, everything below the disk status boxes is now missing. I am using my phone with TeamViewer to see it.

 

I can also see that the log has red errors. I can post that when I get home.

Edited by JohanSF
1 hour ago, bonienl said:

It might be a corrupted cache file system. Can you post diagnostics? If the GUI doesn't work, then use terminal/telnet and type 'diagnostics'; the zip file will be saved on your flash device in the /logs folder.

 

Probably need the help of the true expert @johnnie.black

I can click Download diagnostics but it is collecting diagnosis information forever and the download never happens.

Trying with the terminal method I get "Starting diagnostics collection..." and nothing happens.

Update: It seems I cannot restart it remotely, so I will have to do a hard reset when I get home. I really hope the cache drive is not corrupted :(

Edited by JohanSF

Got home to this log:

chrome_2018-11-09_15-35-52.thumb.png.d927a3672795dc8607e05797e40dde63.png

Restarted the machine with the hardware button. Here are the diagnostics before starting the array:

hal9000-diagnostics-20181109-1539.zip

It started, parity check runs and dockers started too. I am looking at this now:

image.thumb.png.baa615fe15a50089d87bb3926de856eb.png

 

Should I start the Troubleshooting Mode in "Fix Common Problems"?

Edit: Not sure I can do that though, the "Scanning" when I enter the page seems to stay there forever. This is in the log:

image.png.1c37414509a0354f6ee7110e0ea19ccd.png

Edited by JohanSF

I'd like to know exactly what the nginx errors you're seeing are about, as I've seen them myself on occasion but I've never seen an explanation for them. The Web GUI pages are really quite complicated and for nginx to serve one up it has to retrieve the sources from multiple locations, most of which are dynamic and dependent on scripts completing and returning the necessary code. That looks as though it's failing here and causing the unresponsiveness.
