ZekerPixels posted June 29, 2021 (edited)

Hi all,

The server has a problem: it crashes every time, shortly after mover runs. I have been running this system on 6.9.2 since its release and it worked fine before. I have already done the following:
- parity check
- Docker safe permissions
- Fix Common Problems
- disabled VMs
- disabled Dockers
- tried moving files with mover, unbalance and Krusader
- memtest86, no issues on a couple of passes

With VMs and Dockers disabled, it still crashed every time within a minute of invoking mover. I hope you have an idea what the issue could be.

Anyway, thanks for all the help,
ZPx

Updated: https://forums.unraid.net/topic/110753-692-mover-crashes-server/?tab=comments#comment-1010818

Edited July 1, 2021 by ZekerPixels: removed old files
trurl posted June 30, 2021

Start the array and post new diagnostics.
ZekerPixels (author) posted June 30, 2021

Yes, that would have been a great idea. Updated, this time with the array running.
trurl posted June 30, 2021

Your eris pool is using different-sized disks. Is this raid1 (the default)?
ZekerPixels (author) posted June 30, 2021

The array is one 8TB and two 4TB disks, and the cache is two 1TB disks. Eris is two different-sized SSDs, 120GB and 240GB, mirrored, so effectively 120GB, and yes, it is using the default btrfs raid1. Appdata, domains and system are all on this pool.
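As a rough check on the pool sizes mentioned here: btrfs raid1 stores two copies of every chunk on two different devices, so usable space is approximately the smaller of half the raw total and the raw total minus the largest device. The sketch below is a simplified approximation that ignores metadata and allocation overhead; the function name `btrfs_raid1_usable` is hypothetical.

```python
def btrfs_raid1_usable(sizes_gb):
    """Approximate usable capacity (GB) of a btrfs raid1 pool.

    Every chunk is mirrored on two different devices, so usable space is
    capped by half the raw total and by how much the other devices can
    mirror the largest one. Metadata and allocation overhead are ignored.
    """
    if len(sizes_gb) < 2:
        raise ValueError("btrfs raid1 needs at least two devices")
    total = sum(sizes_gb)
    largest = max(sizes_gb)
    return min(total / 2, float(total - largest))


print(btrfs_raid1_usable([120, 240]))    # eris pool: 120.0
print(btrfs_raid1_usable([1000, 1000]))  # cache pool: 1000.0
```

With a 120GB and a 240GB SSD, half the raw total would be 180GB, but only 120GB of the larger disk can be mirrored, which matches the "effectively 120GB" above. The same approximation gives about 1TB usable for two 1TB disks in raid1.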
trurl posted June 30, 2021

Set up Syslog Server so we can get the syslog after a crash: https://wiki.unraid.net/Manual/Troubleshooting#Persistent_Logs_.28Syslog_server.29

Have you done memtest?
ZekerPixels (author) posted June 30, 2021 (edited)

I also thought it could be the RAM, so yes, I have run memtest, with single sticks and with both together, and got no errors after 8 passes in each configuration. Also, the server can complete a parity check without any issues. If it were the memory, it probably shouldn't be able to do that, because with mover (or any other method of moving from cache to array) it crashes every time within a minute.

The only odd line in the syslog is line 169, which is close to the crash. But it doesn't tell us anything, because it is also there when the server doesn't crash:

"ntpd[1758]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized"

I don't know what it means, but /Settings/DateTime shows the correct time.

Edited June 30, 2021 by ZekerPixels: removed old files
ZekerPixels (author) posted June 30, 2021 (edited)

I had no solution or any clue what the issue could be, so I made a fresh 6.9.2 USB. I quickly set up my configuration, shares, etc., and it still crashes. So I have a fresh Unraid install and the same issue as before. To me that points to a hardware issue; what could it be?

I removed the other files; these are the new diagnostics and syslog. I'm not sure of the time of the first crash; the second one was at 02:20.

Edited July 2, 2021 by ZekerPixels: removed old files
trurl posted July 1, 2021

2 hours ago, ZekerPixels said:
"hardware issue, what could it be"

Power? CPU cooling?
trurl posted July 1, 2021

That syslog is the same as the syslog in those diagnostics. In other words, it only includes syslog information from the last boot up until you took the syslog/diagnostics. We need a syslog that shows what happened before the reboot that followed the crash. After it crashes and you reboot, get the syslog saved by Syslog Server; it should include timestamps from before the reboot.
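With a long persistent syslog, one way to find "what happened before booting after crash" is to pull out the last few lines logged before each reboot. The sketch below assumes every boot can be recognised by the kernel's "Linux version" banner logged with a `kernel:` tag (check your own log for the exact text); the function names and the default of 20 lines are arbitrary choices for illustration, and the path you pass in is whatever file your syslog server wrote.

```python
from typing import Iterable, List

# Assumption: each boot logs a kernel line containing this text.
BOOT_MARKER = "kernel: Linux version"


def tails_before_reboots(lines: Iterable[str], n: int = 20,
                         marker: str = BOOT_MARKER) -> List[List[str]]:
    """Return the last n lines logged before each boot marker.

    Lines before the first marker count as an earlier boot, so a log
    that starts mid-session still yields that session's tail.
    """
    tails: List[List[str]] = []
    segment: List[str] = []
    for line in lines:
        if marker in line and segment:
            # A new boot starts here: keep the tail of the previous one.
            tails.append(segment[-n:])
            segment = []
        segment.append(line)
    return tails


def print_tails(path: str, n: int = 20) -> None:
    """Print the end of every boot in a syslog file that was followed by a reboot."""
    with open(path, errors="replace") as f:
        lines = f.read().splitlines()
    for i, tail in enumerate(tails_before_reboots(lines, n), start=1):
        print(f"--- last {len(tail)} lines before reboot {i} ---")
        print("\n".join(tail))
```

The current boot is not reported because it has not ended in a reboot yet. A hard crash may also leave nothing useful in the log before the reboot, so an empty or uneventful tail is possible.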
ZekerPixels (author) posted July 1, 2021

As for what the issue could be: it can complete a parity sync without any issues. I would think temperature and power are fine, because during the parity check there is more CPU utilization and all disks are active, which of course requires more power. I don't have an extra PSU or any spares, so I can't really swap out parts to test.

The syslog that I posted should contain two crashes. Anyway, I will make a new one and this time write down the time of each event; give me about an hour.
ZekerPixels (author) posted July 1, 2021 (edited)

I have removed the parity disks from the array; otherwise I would need to cancel the parity check every time. It also rules out the crashes having anything to do with generating parity when moving to the array.

- 12:38 turn on syslog and reboot
- 12:41 start array
- 12:43 download something to a cache-only folder using a docker
- 12:45 crashed and automatic reboot
- 12:48 start array
- 12:51 start mover
- 12:51 crashed and automatic reboot
- 12:55 generate "diagnostics1", disable Docker and reboot
- 12:58 start array (Docker and VMs are disabled)
- 13:00 start mover
- 13:02 crashed and automatic reboot
- 13:05 generate "diagnostics2", turn off syslog and get the syslog file

OK, so the syslog contains 3 crashes:
- At the time of the first crash, there is nothing in the syslog.
- At the second crash, also nothing.
- At the third crash, a bunch of BTRFS errors. There is at least something going on with the cache, but it could have been caused by the very frequent crashes.

Edited July 2, 2021 by ZekerPixels: removed old files
trurl posted July 1, 2021

2 hours ago, ZekerPixels said:
"There is at least something going on with the cache, but it could have been caused by the very frequent crashes."

I don't see anything else. What controller is that disk on?
ZekerPixels (author) posted July 1, 2021 (edited)

I thought both cache drives were on the motherboard, but I just checked: one cache drive is on motherboard SATA and the other one is connected to the LSI9211. The disk reported is just the disk it tries to write to; the only consistent factor is the cache. I'm sure the cache is messed up: it now reports 2TB (it is 1TB). Anyway, I need to figure out how I can copy everything from the cache to an external drive or something.

Edit: OK, the cache drive ending in 208 is definitely broken, but I think I can save most of the data from the other drive. Unfortunately it takes quite some time because it is about 500GB.

EDIT UPDATE

So far the issue is solved. This is what I did. After discovering that the cache was the problem, making the server crash every time something was written to or read from it, I made a new USB to start fresh. I put one of the original cache disks in as an array disk (btrfs) and tried to read the data off it. The first disk immediately crashed the server again, but I could pull all the files from the second disk. So basically I reinstalled everything the way it was before. I had backups of the dockers and a document with all the changes I made in the past, and it took about 2 hours to set everything back to how it was. I checked the latest files I copied from the cache and all of them seem to be unharmed.

Conclusion: I don't think it was necessary to start from a fresh install, but it didn't take too much time and everything works as it is supposed to.

Edited July 7, 2021 by ZekerPixels