Multiple Kernel and btrfs errors I don't understand


Hi

 

Yesterday I had some issues with one of my Unraid servers. I first noticed when Pulseway reported that CPU usage had been over 90% for more than 5 minutes. I logged in to the WebUI, which was unresponsive. My containers no longer seemed to work: I could not connect to them and couldn't open the Docker tab in the UI. Strangely enough, htop over SSH didn't show much CPU activity, even though the system load was shown as over 190.

Over SSH I tried a few things and then decided to reboot the system, which did nothing. It printed "system is going down", but it never went down. I was able to "rescue" the diagnostics log, so I'll attach it here. I only found time today to skim through the logs, and honestly it is a little frightening to see so many errors. Some were critical btrfs errors (from the cache array), and I think the other errors are related. I don't know for sure, but I assume the CPU load came from btrfs; either way, the system was completely locked up.

I could only restart it (thankfully a soft reset) by briefly pressing the power button. After that it shut down gracefully. After the restart everything seems to be in order again.

Two files are attached. One is the snippet from the syslog where the errors began (note that there is a gap of almost 8 hours in the syslog before the errors began), and the other is the full diagnostics I rescued.

Thanks to everyone for any attempt to shed light on this mess.

log snippet.txt azeroth-diagnostics-20211003-1310.zip


Thanks for the advice. The file system check (at least the scrub) was successful. I'll try to do the other steps.

Do you think the system lockup has anything to do with that then?

Any advice on how to best do this? Just start in maintenance mode, copy everything over, format, and copy back, or is there a tool to help achieve that?

2 hours ago, RedXon said:

Do you think the system lockup has anything to do with that then?

It's possible.

 

2 hours ago, RedXon said:

Any advice on how to best do this? Just start in maintenance mode, copy everything over, format, and copy back, or is there a tool to help achieve that?

You can't copy data in maintenance mode. Start the array in normal mode and use your favorite tool to copy: you can use CLI tools like cp or rsync, or tools with a GUI like Midnight Commander or the Krusader docker.
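
If you would rather script the copy than do it by hand, something along these lines might work as a starting point. It is only a sketch using Python's standard library, not a replacement for cp or rsync: `copy_tree` is a made-up helper name, the `/mnt/cache` and `/mnt/disk1` paths in the comment are assumptions about a typical Unraid layout, and `shutil.copy2` keeps timestamps and permission bits but not file ownership.

```python
import shutil
from pathlib import Path


def copy_tree(src: str, dst: str) -> int:
    """Copy everything under src into dst and return the number of files now in dst.

    Hypothetical helper for illustration only.
    """
    src_path, dst_path = Path(src), Path(dst)
    # copy2 preserves timestamps and permission bits (not ownership).
    # dirs_exist_ok (Python 3.8+) lets an interrupted run be repeated
    # into the same destination; existing files are overwritten.
    shutil.copytree(src_path, dst_path, copy_function=shutil.copy2, dirs_exist_ok=True)
    return sum(1 for p in dst_path.rglob("*") if p.is_file())


# Example call (paths are assumptions, adjust to your own pool and disk):
# copy_tree("/mnt/cache/appdata", "/mnt/disk1/cache_backup/appdata")
```

For appdata, where ownership usually matters, rsync run as root is probably the safer choice; the sketch is mainly useful if you want to wrap the copy in your own logging or retries.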


Something seems really broken... The server went offline again while I was moving the files with MC. I tried to disable all VMs and Docker, disabled the VM and Docker services, and set all shares to cache: yes. Mover didn't really want to work, so I cancelled it and just moved things with MC. All went fine until it reached the appdata folder. Then it first got really slow until it eventually just broke: I couldn't get in via the WebUI, then SSH broke, and now I can't even ping it anymore.

Bad thing: this is my offsite server, so I will postpone this until I can be on site for at least a day. It does seem to be only the FS though, as extended SMART checks on all 4 of my SSDs came back fine.

1 hour ago, RedXon said:

Something seems really broken... The server went offline again while I was moving the files with MC. I tried to disable all VMs and Docker, disabled the VM and Docker services, and set all shares to cache: yes. Mover didn't really want to work, so I cancelled it and just moved things with MC. All went fine until it reached the appdata folder. Then it first got really slow until it eventually just broke: I couldn't get in via the WebUI, then SSH broke, and now I can't even ping it anymore.

Was the Docker service disabled in Settings?

  • 2 weeks later...

Copying did take a while and I had some freezes while doing it; I believe the filesystem was truly corrupted or something. In the end, though, I managed to move everything from the cache to the array, and I double-checked that everything was copied correctly.
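
For anyone who wants to do a similar double check, here is a rough sketch of one way to compare a source tree against its copy by SHA-256 checksum. It is not necessarily how the check above was done: `file_digest` and `find_mismatches` are hypothetical helper names, and the paths in the comment are assumptions about a typical Unraid layout.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(src: str, dst: str) -> List[str]:
    """List files under src that are missing from dst or differ in content."""
    src_root, dst_root = Path(src), Path(dst)
    problems = []
    for src_file in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src_root)
        dst_file = dst_root / rel
        if not dst_file.is_file():
            problems.append(f"missing: {rel}")
        elif file_digest(src_file) != file_digest(dst_file):
            problems.append(f"differs: {rel}")
    return problems


# Example call (paths are assumptions):
# for line in find_mismatches("/mnt/cache/appdata", "/mnt/disk1/cache_backup/appdata"):
#     print(line)
```

An empty result means every file under the source exists in the copy with identical content. It does not report extra files that exist only in the destination.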

 

Then I checked the SSDs thoroughly with SMART and nothing was unusual... so far so good. I have now formatted the cache array (it was old, sure; it was created in 2016) and am hoping that everything goes well from here. Now the mover is slowly moving files back... this could take a while. Hopefully you won't hear from me again about this issue.

