[6.9.2] Server unusable because of very frequent crashes

ZekerPixels · June 29, 2021

Hi all,

The server has a problem: it crashes every time, shortly after running mover. I have been using this system with 6.9.2 since release and it worked fine before. I have already done the following:

- parity check

- docker safe permissions

- fix common problems

- disabled VMs

- disabled Dockers

- tried moving files with mover, unbalance, krusader

- memtest86, no issues on a couple of passes

With VMs and Dockers disabled it still crashed every time within a minute of invoking mover.

I hope you guys have an idea what the issue could be.

Anyways thanks for all the help

ZPx

Updated: https://forums.unraid.net/topic/110753-692-mover-crashes-server/?tab=comments#comment-1010818

Edited July 1, 2021 by ZekerPixels
removed old files

trurl · June 30, 2021

Start array and post new diagnostics

ZekerPixels · June 30, 2021

Yes, that would have been a great idea. Updated, this time with the array running.

trurl · June 30, 2021

Your eris pool is using different sized disks. Is this raid1 (default)?

ZekerPixels · June 30, 2021

The array is 8TB and two 4TB disks; cache is two 1TB disks. Eris is two different-sized SSDs, 120GB and 240GB, mirrored, so effectively 120GB, and yes, it is using the default btrfs raid1. Appdata, domains and system are all on this pool.
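
As a rough sanity check on the 120GB figure: btrfs raid1 stores every block on two different devices, so usable space is capped both by half the total and by what the remaining devices can mirror. A minimal Python sketch of that arithmetic follows. The function name is made up for illustration, and it ignores metadata overhead and chunk allocation granularity, so treat the result as an estimate.

```python
def btrfs_raid1_usable(sizes_gb):
    """Estimate usable space of a btrfs raid1 pool (two copies of each block).

    Simplified model (an assumption, not a btrfs tool): every block needs a
    second copy on a different device, so usable space cannot exceed half the
    total, nor the space available outside the largest device.
    """
    total = sum(sizes_gb)
    largest = max(sizes_gb)
    return min(total / 2, total - largest)


# The eris pool from this thread: 120GB + 240GB
print(btrfs_raid1_usable([120, 240]))  # 120
```

So the extra 120GB on the larger SSD cannot be used in a two-device raid1 pool, which matches the "effectively 120GB" above.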

trurl · June 30, 2021

Set up Syslog Server so we can get syslog after a crash:

https://wiki.unraid.net/Manual/Troubleshooting#Persistent_Logs_.28Syslog_server.29

Have you done memtest?

ZekerPixels · June 30, 2021

I also thought it could be the RAM, so yes, I have run memtest, with single sticks and with both together, resulting in no errors after 8 passes in each configuration. Also, the server can complete a parity check without any issues. If it were the memory, it probably shouldn't be able to do that, yet with mover (or another method of moving from cache to array) it crashes every time within a minute.

The only weird line in the syslog is line 169, which is also close to the crash. But it doesn't tell us anything, because it is also there when it doesn't crash.

"ntpd[1758]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized"

idk but /Settings/DateTime shows the correct time

Edited June 30, 2021 by ZekerPixels
removed old files

ZekerPixels · June 30, 2021

I had no solution or any clue on what the issue could be, so I made a fresh 6.9.2 USB.

Quickly set up my configuration, shares, etc., and it crashes.

So I have a fresh Unraid install and am having the same issue as before. To me, that points to a hardware issue; what could do it?

I removed the other files, these are the new diagnostics and syslog.

I'm not sure of the time of the first crash; the second one was at 02:20.

Edited July 2, 2021 by ZekerPixels
removed old files

trurl · July 1, 2021

2 hours ago, ZekerPixels said:

hardware issue; what could do it?

Power? CPU Cooling?

trurl · July 1, 2021

That syslog is the same as the syslog in those diagnostics, in other words, it only includes the syslog information from the time of the last boot up until you took the syslog / diagnostics.

We need syslog that shows what happened before booting after crash. After it crashes and you reboot, get the syslog saved by syslog server, it should include timestamps from before the reboot.
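
To illustrate what to look for in the saved syslog, here is a minimal Python sketch that collects the last few lines logged before each boot. It assumes the kernel's "Linux version" banner marks the start of a boot. The function name and the sample lines are hypothetical, not taken from the actual diagnostics.

```python
from collections import deque

# Assumption: the kernel logs a "Linux version ..." banner at the start of each boot.
BOOT_MARKER = "Linux version"


def lines_before_each_boot(lines, context=5, marker=BOOT_MARKER):
    """Return one list per reboot, holding the `context` lines logged just before it.

    A boot marker on the very first line is skipped, since nothing precedes it.
    """
    recent = deque(maxlen=context)
    results = []
    for line in lines:
        if marker in line and recent:
            results.append(list(recent))
            recent.clear()
        recent.append(line)
    return results


# Hypothetical sample lines, only to show the shape of the input.
sample = [
    "Jul  1 12:41:02 Tower emhttpd: array started (hypothetical)",
    "Jul  1 12:44:58 Tower kernel: last message before crash (hypothetical)",
    "Jul  1 12:46:10 Tower kernel: Linux version 5.10.28-Unraid",
    "Jul  1 12:48:00 Tower emhttpd: array started (hypothetical)",
]

for block in lines_before_each_boot(sample, context=2):
    print("\n".join(block))
    print("--- reboot ---")
```

If the lines just before a reboot show nothing unusual, the crash most likely happened too abruptly for anything to be logged.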

ZekerPixels · July 1, 2021

As for what the issue could be: it can complete a parity sync without any issues. I would think temperature and power are both fine, because during a parity check there is more CPU utilization and all disks are active, which of course requires more power. I don't have an extra PSU or any spares, so I can't really swap out parts to test.

The syslog that I posted should contain two crashes. Anyway, I will make a new one, this time writing down the time of events. Give me about an hour.

ZekerPixels · July 1, 2021

I have removed the parity disks from the array; otherwise I need to cancel the parity check every time. That also lets us rule out that it has anything to do with generating parity when moving to the array.

12:38 turn on syslog and reboot

12:41 start array

12:43 download something to cache only folder using a docker

12:45 Crashed and automatic reboot

12:48 start array

12:51 start mover

12:51 Crashed and automatic reboot

12:55 generate "diagnostics1", disable docker and reboot

12:58 start array (docker and vms are disabled)

13:00 start mover

13:02 Crashed and automatic reboot

13:05 generate "diagnostics2"

turn off syslog and get the syslog file

OK, so the syslog contains 3 crashes:

- At the time of the first crash, there is nothing in the syslog.

- At the second crash, also nothing.

- At the third crash, a bunch of BTRFS errors. There is at least something going on with the cache, but it could have been caused by the very frequent crashes.

Edited July 2, 2021 by ZekerPixels
removed old files

trurl · July 1, 2021

2 hours ago, ZekerPixels said:

There is at least something going on with the cache, but it could have been caused by the very frequent crashes.

I don't see anything else. What controller is that disk on?

ZekerPixels · July 1, 2021

I thought both cache drives were on the motherboard, but I just checked:

One cache drive is using the motherboard SATA and the other one is connected to the LSI9211.

The disk reported is just the disk it tries to write to; the only constant is the cache.

I'm sure the cache is messed up. It now reports 2TB (it is 1TB).

Anyway, I need to figure out how I can copy everything from the cache to an external drive or something.

edit: OK, the cache drive ending in 208 is definitely fucked, but I think I can save most of the data from the other drive. Unfortunately it takes quite some time because it is about 500GB.

EDIT

UPDATE

So far the issue is solved. Here is what I did: after discovering that the cache was the problem, making it crash every time something was written to or read from it, I made a new USB to start fresh. I added one of the original cache disks as an array disk (btrfs) and tried to read the data off it. The first disk immediately crashed the server again, but I could pull all the files from the second disk.

So basically I reinstalled everything the way it was before. I had backups of the dockers and a document with all the changes I made in the past. It took about 2 hours to set everything back to how it was. I checked the latest files I copied from the cache and all files seem to be unharmed by this situation.

Conclusion: I don't think it was necessary to start from a fresh install, but it didn't take too much time and everything works as it is supposed to.

Edited July 7, 2021 by ZekerPixels
