[6.9.2] Server unusable because of very frequent crashes



Hi all,

 

The server has a problem: it crashes every time, shortly after mover starts running. I have been using this system with 6.9.2 since release and it worked fine before. I have already tried the following:

- parity check

- docker safe permissions

- fix common problems

- disabled VMs

- disabled Dockers

- moving files with mover, unbalance and krusader

- memtest86, no issues on a couple of passes

 

With VMs and Docker disabled, it still crashed every time within a minute of invoking mover.

 

I hope you guys have an idea what the issue could be.

Anyway, thanks for all the help.

ZPx

 

 

Updated: https://forums.unraid.net/topic/110753-692-mover-crashes-server/?tab=comments#comment-1010818

 

 

 


I also thought it could be the RAM, so yes, I have run memtest, with single sticks and with both together, resulting in no errors after 8 passes in each configuration. The server can also complete a parity check without any issues. If the memory were the problem, it probably shouldn't be able to do that, whereas with mover (or any other method of moving from cache to array) it crashes every time within a minute.

 

The only weird line in the syslog is line 169, which is also close to the crash. But it doesn't tell us anything, because it's also there when the server doesn't crash.

"ntpd[1758]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized"

I don't know why, but /Settings/DateTime shows the correct time.

 

 


I had no solution or any clue what the issue could be, so I made a fresh 6.9.2 USB stick.

I quickly set up my configuration, shares, etc., and it crashes.

 

So I have a fresh Unraid install and I'm having the same issue as before. To me, that points to a hardware issue. What could cause it?

I removed the other files; these are the new diagnostics and syslog.

 

I'm not sure of the time of the first crash; the second one was at 02:20.

 

 

 

  • ZekerPixels changed the title to [6.9.2] Server unusable because of very frequent crashes

That syslog is the same as the syslog in those diagnostics. In other words, it only includes the syslog information from the time of the last boot up until you took the syslog / diagnostics.

 

We need a syslog that shows what happened before the reboot that follows a crash. After it crashes and you reboot, get the syslog saved by the syslog server; it should include timestamps from before the reboot.
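As a rough illustration of what to look for in such a persisted syslog, here is a minimal Python sketch that pulls out the last few lines logged before each boot. It assumes every boot starts with a kernel line containing "Linux version"; the marker string, the function name and the sample lines in the usage note are assumptions for illustration, not taken from the attached logs.

```python
from collections import deque
from typing import Deque, Iterable, List

# Assumption: the kernel logs a "Linux version ..." line at the start of each boot.
BOOT_MARKER = "kernel: Linux version"


def lines_before_each_boot(lines: Iterable[str], context: int = 20,
                           marker: str = BOOT_MARKER) -> List[List[str]]:
    """Return, for every boot marker, the `context` lines logged just before it.

    A boot marker on the very first line has nothing before it and is skipped.
    """
    recent: Deque[str] = deque(maxlen=context)  # rolling window of the latest lines
    result: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if marker in line and recent:
            # Everything in the window was written before this (re)boot.
            result.append(list(recent))
        recent.append(line)
    return result
```

To use it, open the syslog file saved by the syslog server and pass the file object to `lines_before_each_boot`. If a crash leaves no trace at all, the returned window for that boot will simply show normal activity right up to the reboot.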


As for what the issue could be: it can complete a parity sync without any issues. I would think temperature and power are both fine, because during a parity check there is more CPU utilization and all disks are active, which of course requires more power. I don't have an extra PSU or any spare parts, so I can't really swap out parts to try something.

 

The syslog that I posted should contain two crashes. Anyway, I will make a new one, this time writing down the time of each event. Give me about an hour.


I have removed the parity disks from the array; otherwise I need to cancel the parity check every time. This also rules out that it has anything to do with generating parity when moving to the array.

 

12:38 turn on syslog and reboot

12:41 start array

12:43 download something to cache only folder using a docker

12:45 Crashed and automatic reboot

12:48 start array

12:51 start mover

12:51 Crashed and automatic reboot

12:55 generate "diagnostics1", disable docker and reboot

12:58 start array (docker and vms are disabled)

13:00 start mover

13:02 Crashed and automatic reboot

13:05 generate "diagnostics2"

turn off syslog and get the syslog file

 

OK, so the syslog contains 3 crashes:

- At the time of the first crash, there is nothing in the syslog.

- At the second crash, also nothing.

- At the third crash, a bunch of BTRFS errors. There is at least something going on with the cache, but that could have been caused by the very frequent crashes.
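To see which device the BTRFS messages point at, the syslog can be tallied per device. The following is only a sketch in Python: it assumes the kernel messages have the usual `BTRFS error (device sdX1): ...` shape, and the function name and the device names in any sample are hypothetical rather than taken from the attached syslog.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: kernel btrfs messages look like "BTRFS error (device sdb1): ...".
# The device name stops at ')' or ':' so a trailing state suffix is ignored.
BTRFS_RE = re.compile(r"BTRFS (error|warning|critical) \(device ([^):\s]+)")


def count_btrfs_issues(lines: Iterable[str]) -> Counter:
    """Count BTRFS error/warning/critical lines, keyed by (device, level)."""
    counts: Counter = Counter()
    for line in lines:
        match = BTRFS_RE.search(line)
        if match:
            level, device = match.groups()
            counts[(device, level)] += 1
    return counts
```

If all of the counts land on a single cache device, that is a hint about where to look first, although errors logged right after repeated hard resets can also be a consequence of the crashes rather than their cause.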

 


I thought both cache drives were on the motherboard, but I just checked:

One cache drive uses the motherboard SATA and the other one is connected to the LSI9211.

 

The disk reported is just the disk it tries to write to; the only constant is the cache.

I'm sure the cache is messed up: it now reports 2 TB (it is 1 TB).

 

Anyway, I need to figure out how I can copy everything from the cache to an external drive or something.

 

Edit: OK, the cache drive ending in 208 is definitely fucked, but I think I can save most of the data from the other drive. Unfortunately it takes quite some time because it's about 500 GB.

 

 

 

 

 

UPDATE

 

So far the issue is solved. Here is what I did: after discovering that the cache was the problem, causing a crash every time something was written to or read from it, I made a new USB stick to start fresh. I added one of the original cache disks as an array disk (btrfs) and tried to read the data off it. The first disk immediately crashed the server again, but I could pull all the files from the second disk.

So basically I reinstalled everything the way it was before. I had backups of the Docker containers and a document with all the changes I made in the past. It took about 2 hours to set everything back to how it was before. I checked the latest files I copied from the cache, and all files seem to be unharmed by this situation.
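One way such a check could be done, where an older backup of the same files exists, is to compare checksums of the rescued copy against that backup. This is a minimal Python sketch under that assumption; the function names and the directory layout are hypothetical, and it is not necessarily how the files were checked here.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(rescued: Path, reference: Path) -> Dict[str, List[str]]:
    """List files from `reference` that are missing or differ in `rescued`."""
    report: Dict[str, List[str]] = {"missing": [], "different": []}
    for ref_file in sorted(p for p in reference.rglob("*") if p.is_file()):
        rel = ref_file.relative_to(reference)
        candidate = rescued / rel
        if not candidate.is_file():
            report["missing"].append(rel.as_posix())
        elif sha256_of(candidate) != sha256_of(ref_file):
            report["different"].append(rel.as_posix())
    return report
```

Files that legitimately changed after the backup was taken will also show up as "different", so the report is a starting point for a manual look rather than proof of corruption.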

 

Conclusion: I don't think it was necessary to start from a fresh install, but it didn't take too much time and everything works as it is supposed to.

 

 

