Unraid-server dies randomly



Hello!
My Unraid server dies randomly and I can't figure out why.
I have enabled 'Mirror syslog to flash', but the log still doesn't show what's wrong. I have run a full memtest without any errors.

 

Specs:
Unraid version: 6.10.0-rc2 (same problem with 6.9.2)
CPU: i5-11600K
MB: ASUS PRIME Z590-P (latest BIOS, C-states disabled)
RAM: PNY XLR8 EPIC RGB 32GB (2x16GB) DDR4 3200 MHz CL16
Storage:
2x Kingston A2000 500GB M.2 NVMe (cache in RAID1 btrfs)
1x Samsung SSD 850 EVO 250GB (XFS)
3x Seagate IronWolf ST4000VN008 64MB 4TB (XFS)


Syslog.
 


Nov 15 02:00:16 kapten crond[1118]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Nov 15 03:00:01 kapten Recycle Bin: Scheduled: Files older than 7 days have been removed
Nov 15 03:00:16 kapten crond[1118]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Nov 15 03:02:32 kapten root: /var/lib/docker: 43.4 GiB (46559395840 bytes) trimmed on /dev/loop2
Nov 15 03:02:32 kapten root: /mnt/raid: 814 GiB (874059063296 bytes) trimmed on /dev/nvme0n1p1
Nov 15 03:02:32 kapten root: /mnt/cache: 232.8 GiB (249922076672 bytes) trimmed on /dev/sdc1 (Last log before server dies)

Nov 15 08:12:49 kapten kernel: microcode: microcode updated early to revision 0x40, date = 2021-04-11 (I start the server again)
Nov 15 08:12:49 kapten kernel: Linux version 5.14.15-Unraid (root@Develop) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP Thu Oct 28 09:56:33 PDT 2021
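
With the syslog mirrored to flash, the interesting line is usually the last one written before each boot banner. A minimal sketch for pulling those lines out of a long log, assuming the mirrored file ends up at /boot/logs/syslog (an assumption, check the syslog settings) and that every boot starts with a "kernel: Linux version" line as in the excerpt above. last_before_boot is a hypothetical helper name, not an Unraid tool:

```shell
# Print the last line logged before each boot banner in a syslog file.
# Assumption: a boot is marked by a "kernel: Linux version" line; the early
# "kernel: microcode:" line that can precede it is ignored.
last_before_boot() {
  awk '
    NF == 0                    { next }                  # skip blank lines
    / kernel: microcode: /     { next }                  # early-boot line, not a pre-crash line
    / kernel: Linux version /  { if (prev != "") print prev; prev = ""; next }
                               { prev = $0 }             # remember the most recent ordinary line
  ' "$1"
}
```

Usage would be something like last_before_boot /boot/logs/syslog. If every crash is preceded by the same job (for example the nightly trim), that is a lead; if the last lines are unrelated each time, the log is probably just not getting a chance to record the failure.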

 

 

I have a screen and keyboard connected to the server, but when it dies I get no picture and no response from the keyboard.

kapten-diagnostics-20211115-0843.zip

Edited by capt.shitface
1 hour ago, Tristankin said:

Sorry you did not get a reply to this. Are you using the intel iGPU for transcoding?

Thanks for noticing! 😇

Yes, I do use the Intel iGPU for transcoding, but I wasn't using Plex at the time it died.
It hasn't crashed while I was using Plex so far; it's completely random. I only notice it because my phone can't sync to my Nextcloud.

I have now reset my BIOS and turned off the XMP profile (even though memtest said everything was OK).

Is there any good software to test other hardware? I suspect a hardware failure, since the server has no time to write any errors to the syslog.
I also have problems shutting it down and stopping the array: unmounting fails with "target busy", even with a manual umount -f.
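
For the "target busy" part, it helps to see which processes still have a working directory or open file under the mount point before forcing anything. fuser -vm on the mount point is the usual tool for this; the sketch below does a similar lookup by reading /proc directly, so it does not depend on extra tools being installed. mount_holders is a hypothetical helper name, and it should be run as root to see every process:

```shell
# List "PID path" pairs for processes whose working directory or open files
# live under the given mount point. Reads /proc, so Linux only.
mount_holders() {
  target=${1%/}                       # drop a trailing slash, if any
  for proc in /proc/[0-9]*; do
    pid=${proc#/proc/}
    for link in "$proc"/cwd "$proc"/fd/*; do
      # readlink fails for processes that have exited or that we may not inspect
      dest=$(readlink "$link" 2>/dev/null) || continue
      case $dest in
        "$target" | "$target"/*) printf '%s %s\n' "$pid" "$dest" ;;
      esac
    done
  done | sort -u
}
```

For example, mount_holders /mnt/cache. An empty result does not prove the mount is free: an image file on the share that is attached to a loop device (such as the Docker image on /dev/loop2 in the log above) keeps it busy without showing up here, and losetup -a lists those.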

Edited by capt.shitface
42 minutes ago, Tristankin said:

I think there may be an issue with the intel iGPU module causing the system to hang (even just being loaded). 6.8.3 fixes this for me but I am talking with the moderators about a solution that allows a more modern version of the OS. 

 

Okay! Is there a way to see if that's the problem I have?

Is there any way to trigger a crash, or something to look for in the logs?

16 minutes ago, Tristankin said:

Absolutely nothing for me. It's the reason it has been generally blamed on hardware faults. For me it would happen within 2 days but generally at random. No load required.

If you don't have a lot of people transcoding you could try unloading the i915 module?

 

Yes, I can try. Since yesterday I have been running the newest beta release of Plex, which is finally able to transcode with Quick Sync on 11th-gen CPUs.
Could it be Plex that's been causing the problem?

If it crashes again, I'm going to try to disable/unload the iGPU.

By the way, what's the correct way to unload the iGPU? Something like this?
#vi /boot/config/modprobe.d/blacklist.conf
options i915 force_probe=4c8a ???
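
A note on the line above, with a hedged sketch: as far as I know, force_probe does the opposite of unloading. It tells i915 to bind to device ID 4c8a even though that kernel does not enable it by default. Keeping the module from loading at boot is normally done with a blacklist line in a modprobe.d file. The sketch assumes Unraid reads /boot/config/modprobe.d/i915.conf at boot (the file name is an assumption), and blacklist_i915 is a hypothetical helper name:

```shell
# Write a modprobe.d fragment that keeps the i915 module from loading at boot.
# The directory defaults to the Unraid flash location (an assumption); pass
# another directory as the first argument to try it somewhere harmless.
# Note: this overwrites any existing i915.conf, including a force_probe line.
blacklist_i915() {
  conf_dir=${1:-/boot/config/modprobe.d}
  mkdir -p "$conf_dir" || return 1
  printf 'blacklist i915\n' > "$conf_dir/i915.conf"
}

# To unload the module without rebooting, stop anything using /dev/dri first
# (e.g. the Plex container), then run as root:
#   modprobe -r i915
# modprobe -r refuses if the module is still in use.
```

After a reboot, lsmod | grep i915 printing nothing would confirm the module stayed out. Hardware transcoding will not work while it is unloaded, so back up the existing file first if it is needed again later.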
