Problem when using immich in Unraid

March 22, 2025

Hello,

So I have an UNRAID server, and several containers installed on it using Dockge.

I recently installed Immich so I can view my photos, but I'm having issues with it. I currently have it configured to read an external library (a shared folder in UNRAID called photos). The problem is that when the image sync starts, after 30-60 minutes, UNRAID restarts. I've never had this problem with any other container before.

The server is:

Intel 14500

64GB DDR5

Asrock z790 tb4 itx PG

Seasonic SPX650

Everything is up to date, and the container I'm using is the following:

#
# WARNING: To install Immich, follow our guide: https://immich.app/docs/install/docker-compose
#
# Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.

name: immich
services:
  immich-server:
    container_name: immich-server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.transcoding.yml
    #   service: cpu # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      - /mnt/user/photos:/mnt/media/photos:ro
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    labels:
      net.unraid.docker.icon: /mnt/user/system/icons/Immich.png
      net.unraid.docker.managed: dockerman
    env_file:
      - .env
    depends_on:
      - immich-redis
      - immich-database
    restart: always
    healthcheck:
      disable: false
  immich-machine-learning:
    container_name: immich-machine-learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    labels:
      net.unraid.docker.icon: /mnt/user/system/icons/Immich.png
      net.unraid.docker.managed: dockerman
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false
  immich-redis:
    container_name: immich-redis
    image: docker.io/redis:6.2-alpine@sha256:148bb5411c184abd288d9aaed139c98123eeb8824c5d3fce03cf721db58066d8
    command: redis-server --bind 0.0.0.0 --port 6381
    healthcheck:
      test: redis-cli -p 6381 ping || exit 1
    restart: always
    labels:
      net.unraid.docker.icon: /mnt/user/system/icons/Immich.png
      net.unraid.docker.managed: dockerman
  immich-database:
    container_name: immich-database
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:739cdd626151ff1f796dc95a6591b55a714f341c737e27f045019ceabf8e8c52
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      PGPORT: 5433
      POSTGRES_INITDB_ARGS: --data-checksums
    volumes:
      # Do not edit the next line. If you want to change the database storage location on your system, edit the value of DB_DATA_LOCATION in the .env file
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    labels:
      net.unraid.docker.icon: /mnt/user/system/icons/Immich.png
      net.unraid.docker.managed: dockerman
    healthcheck:
      test: pg_isready --dbname="$${POSTGRES_DB}" --username="$${POSTGRES_USER}" ||
        exit 1; Chksum="$$(psql --dbname="$${POSTGRES_DB}"
        --username="$${POSTGRES_USER}" --tuples-only --no-align
        --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM
        pg_stat_database')"; echo "checksum failure count is $$Chksum"; [
        "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command: postgres -c shared_preload_libraries=vectors.so -c
      'search_path="$$user", public, vectors' -c logging_collector=on -c
      max_wal_size=2GB -c shared_buffers=512MB -c wal_compression=on
    restart: always
volumes:
  model-cache: null
networks:
  default:
    external: true
    name: npm_network

I have done memtest, CPU stress tests and everything has been satisfactory, without errors or problems.

I've enabled persistent syslog logging, but there are absolutely no significant errors in it. The last time the problem occurred was at 12:47.

When this happened last time, I was standing next to the server and saw it completely shut down and then turn back on. I don't understand why.

I hope you can help me, thanks in advance.

syslog-previous

March 22, 2025

Community Expert

...

March 22, 2025

Author

13 minutes ago, bmartino1 said:

...

Very good guide, but I don't see the problem related to mine, do you have any thoughts?

March 22, 2025

Community Expert

1 minute ago, Nozle said:

Very good guide, but I don't see the problem related to mine, do you have any thoughts?

Not really, none that would help with software or configuration. What you're describing sounds more like a power supply or thermal overload problem.

It's a really strange issue, especially since it's a full system power cycle rather than just a container crash or kernel panic. Given that you've done memory and CPU stress testing with no issues, and the syslog isn't capturing anything meaningful, here are some things you can try to dig deeper:

Power Supply Check

This sounds suspiciously like a hardware-level failure, particularly the PSU (Power Supply Unit). If Immich kicks off heavy disk or CPU I/O (which it does during image processing and machine learning tasks), it may be drawing just enough power to trigger a shutdown if the PSU is borderline.

Try checking:

Do the fans dip or lights flicker before shutdown?

Do you have another PSU you can test with?

Any surge protector/UPS involved that might be triggering protection?

Enable Temperature Monitoring

Even if CPU passed stress tests, real-world I/O + processing (especially with Immich doing ML stuff like face detection) might spike temps.

Tools like sensors (via NerdPack or via container) can help you log CPU/GPU temps.
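One way to keep temperature readings right up to the moment of a power cut is a small logging loop. This is only a sketch: it assumes the sensors command from lm-sensors is installed, and the helper name log_temps, the log path and the interval are illustrative choices, not something Unraid provides.

```shell
# Append timestamped `sensors` output to a log file at a fixed interval.
# Assumes lm-sensors is installed; log_temps and its arguments are illustrative names.
log_temps() {
    lt_file="$1"      # log file, ideally on storage that survives a reboot
    lt_interval="$2"  # seconds between readings
    lt_count="$3"     # number of readings to take
    lt_i=0
    while [ "$lt_i" -lt "$lt_count" ]; do
        {
            date '+%Y-%m-%d %H:%M:%S'
            sensors
        } >> "$lt_file"
        sync   # flush to disk so the last reading is not lost if power drops
        sleep "$lt_interval"
        lt_i=$((lt_i + 1))
    done
}
```

For example, log_temps /mnt/cache/temps.log 5 1000 (the path is an assumption) would take a reading every 5 seconds while the Immich scan runs; after a reboot, the last entries show how hot things were just before the shutdown.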

Deeper Logging / Diagnostics

Enable syslog. I will also need a diagnostics file; your syslog-previous doesn't help.

Since syslog didn’t show much, try enabling IPMI event logging or BMC-level logging if your motherboard supports it (many server-grade boards do).

Enable "Local syslog mirror" under UNRAID's Settings > Syslog Server to write logs to USB or cache drive so logs persist across reboots.

Also check /var/log/libvirt/qemu and /var/log/docker.log (if available).

You may also need to apply resource limits in your compose file:

deploy:
  resources:
    limits:
      cpus: '2.0'
      memory: 4096M

Add this to each service to cap how many CPUs and how much RAM it can use.
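To confirm the limits were actually applied after recreating the stack, the values Docker stores for a container can be read back with docker inspect. A sketch, assuming a container named immich-server as in the compose file earlier in the thread; show_limits is just an illustrative helper name.

```shell
# Print the CPU and memory limits Docker has recorded for a container.
# HostConfig.NanoCpus is in billionths of a CPU, HostConfig.Memory is in bytes;
# a value of 0 in either field means no limit is set.
show_limits() {
    sl_name="$1"
    # Split the two numbers printed by docker inspect into $1 and $2.
    set -- $(docker inspect --format '{{.HostConfig.NanoCpus}} {{.HostConfig.Memory}}' "$sl_name")
    sl_nano="$1"
    sl_mem="$2"
    sl_whole=$((sl_nano / 1000000000))
    sl_tenth=$(( (sl_nano % 1000000000) / 100000000 ))
    echo "$sl_name cpus=${sl_whole}.${sl_tenth} memory=$((sl_mem / 1048576))M"
}
```

With the limits shown above in place, show_limits immich-server should report cpus=2.0 memory=4096M; zeros would suggest the deploy section was not picked up.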
Also, ensure the Immich container isn’t using GPU acceleration if you haven’t allocated a GPU properly.

This is why I would need the diagnostics: it could be disk related.
Filesystem Access Issues

If Immich is reading from a mounted share or array that's spinning up dozens of disks, and there are any issues in the SATA/RAID controller, it might trip a system restart.

Check for:

Parity errors or SMART warnings in the Array Devices tab.

Try scanning your photos directory with something like du -sh or a find command and see if that triggers any issues.
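Note that du and find mostly touch file metadata, so a clean run does not prove that every file can be read in full. A rough sketch that reads each file's contents and counts failures; read_all is an illustrative helper name, and file names containing newlines are not handled.

```shell
# Read every file under a directory completely and report how many reads failed.
read_all() {
    ra_dir="$1"
    ra_list=$(mktemp)
    find "$ra_dir" -type f > "$ra_list"
    ra_ok=0
    ra_bad=0
    while IFS= read -r ra_file; do
        if cat "$ra_file" > /dev/null 2>&1; then
            ra_ok=$((ra_ok + 1))
        else
            ra_bad=$((ra_bad + 1))
            echo "read failed: $ra_file"
        fi
    done < "$ra_list"
    rm -f "$ra_list"
    echo "read ok: $ra_ok, failed: $ra_bad"
    [ "$ra_bad" -eq 0 ]   # non-zero exit status if any file could not be read
}
```

Running read_all /mnt/user/photos puts sustained read load on the share without Immich involved, which helps separate a disk or controller problem from a CPU-side one.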

Strip Down and Test

Temporarily configure Immich with a local dummy folder (not the full photos share), and see if the crash still occurs.

If not, then the issue is tied to that share—maybe a specific file, bad sector, or permission weirdness.

You could even try syncing a smaller subset of photos to test in isolation.
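A smaller test library can be built by copying the first few hundred files into a separate folder and pointing the external library at that instead. A minimal sketch; make_subset and the destination path are hypothetical, and file names containing newlines are not handled.

```shell
# Copy the first N files (in sorted order) from a source tree into a test folder,
# keeping their relative paths.
make_subset() {
    ms_src="${1%/}"   # source directory, trailing slash removed
    ms_dest="$2"      # destination directory for the subset
    ms_count="$3"     # how many files to copy
    find "$ms_src" -type f | sort | head -n "$ms_count" | while IFS= read -r ms_file; do
        ms_rel="${ms_file#"$ms_src"/}"
        mkdir -p "$ms_dest/$(dirname "$ms_rel")"
        cp -p "$ms_file" "$ms_dest/$ms_rel"
    done
}
```

For example, make_subset /mnt/user/photos /mnt/user/photos-test 500 (photos-test being an assumed share name). If the small set imports cleanly, the count can be raised step by step to see where the restarts begin.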

I posted only the guide at first because this looked like a case of not getting compose working at all, and the guide I made will get you off the ground.

Please post a diag file.

March 22, 2025

Author

Thank you very much for so much information and suggestions.

I've run some new tests without success. I removed the UPS in case it had something to do with it; the problem persists. I connected the array disks directly to the motherboard in case the ASM1106 was the problem; the problem persists.

I'd like to test the power supply, but I don't know if there's any test that would allow me to test it 100%, that would be great.

I have another test I could try, although it involves bringing down the server, disassembling and changing... a few hours of work. I'd prefer to do a test before the change, but it wouldn't be a problem.

Temperature monitoring is enabled and everything looks fine: the CPU peaks at 40º and the motherboard at about 50º.

I don't think the problem has anything to do with the disks, because a parity check that runs for about 12-14 hours causes no problems, but I could be wrong.

I need to look at the diagnostics section you mentioned to see if it gives any more clues, but first I need to work out how best to set it up so the logs persist.

I don't know how to check whether it is using GPU acceleration (I don't have a dedicated GPU, only the Intel 14500).

find and du -sh show no errors or failures; everything runs fine.

No SMART errors.

If I stop Immich from reading this external share, the server does not restart and there are no problems, so it is not overloaded. The problem only appears once Immich starts loading the share (the external library), after 20-50 minutes.

Thanks for your time!

du -sh /mnt/disk1/photos/
194G /mnt/disk1/photos/

Edited March 22, 2025 by Nozle

March 22, 2025

Author

After adding:

deploy:
  resources:
    limits:
      cpus: '2.0'
      memory: 4096M

It's been running for over 4 hours.

What could be the real problem?

March 22, 2025

Community Expert

1 hour ago, Nozle said:

After adding:

deploy:
  resources:
    limits:
      cpus: '2.0'
      memory: 4096M

It's been running for over 4 hours.

What could be the real problem?

Resource management. You may be running more than the system can handle, and/or the first ML pass for face detection needed a lot of compute and it has finally finished that task.

Do you run Plex/Jellyfin, DNS servers, Nextcloud, etc.? You have to budget 2-4 CPU threads and 2-4 GB of RAM for each application.
Immich takes 3 just to keep its services idling. It seems like it was maxing out whatever it could grab to finish its tasks.
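That rule of thumb turns into quick arithmetic. A sketch only: the 2-4 threads and 2-4 GB per application figures are the estimate from this post, not a Docker requirement, and budget is an illustrative helper name.

```shell
# Estimate the CPU thread and RAM range needed for a given number of applications,
# using the 2-4 threads and 2-4 GB per application rule of thumb.
budget() {
    bd_apps="$1"
    echo "threads: $((bd_apps * 2))-$((bd_apps * 4)), RAM: $((bd_apps * 2))-$((bd_apps * 4)) GB"
}
```

Compare the result of budget with the number of applications you run against what nproc and free -g report on the host.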

March 22, 2025

Author

5 minutes ago, bmartino1 said:

Resource management. You may be running more than the system can handle, and/or the first ML pass for face detection needed a lot of compute and it has finally finished that task.

Do you run Plex/Jellyfin, DNS servers, Nextcloud, etc.? You have to budget 2-4 CPU threads and 2-4 GB of RAM for each application.
Immich takes 3 just to keep its services idling. It seems like it was maxing out whatever it could grab to finish its tasks.

So you think it's not a hardware defect?

I was thinking of creating a Windows 11 live USB and trying some CPU stress programs and other things. I'd also run memtest again for at least 8-12 hours to make sure everything is okay. I don't have anything I can use to test the PSU.

My unraid server is:

Intel 14500

64GB DDR5

Asrock z790 tb4 itx PG

Seasonic SPX650

Array: 3 Toshiba N300 8TB HDD
Cache: 2x WD850X 1TB

ASM1106 x 6 SATA M2 updated

There are a few virtual machines, one for Windows 11 and one for HomeAssistant.

There are over 20 Docker containers:

Authentik
Docker Controller Bot
Watchtower
Dockge
Grafana
Homarr
Immich
Node-RED
npm
Paperless
Unifi
Vaultwarden
Wallos
etc.

I also have a problem when I run PowerTop Autotune: the system crashes after a few hours. I still have to figure out what the culprit is.

I plan to use plex/jellyfin later.

March 23, 2025

Community Expert

Yep, the instant I saw 20 Docker containers plus VMs: you are over-provisioning the CPU.

And they are all running at once?

Now that I think of it, I remember there could be CPU issues due to Intel firmware.

https://community.intel.com/t5/Blogs/Tech-Innovation/Client/Intel-Core-13th-and-14th-Gen-Desktop-Instability-Root-Cause/post/1633239

https://community.intel.com/t5/Processors/July-2024-Update-on-Instability-Reports-on-Intel-Core-13th-and/m-p/1617113

I would live-boot a WinPE to make sure you're running the latest BIOS and have the latest Intel microcode.

Edited March 23, 2025 by bmartino1

March 23, 2025

Author

9 hours ago, bmartino1 said:

Yep, the instant I saw 20 Docker containers plus VMs: you are over-provisioning the CPU.

And they are all running at once?

Now that I think of it, I remember there could be CPU issues due to Intel firmware.

https://community.intel.com/t5/Blogs/Tech-Innovation/Client/Intel-Core-13th-and-14th-Gen-Desktop-Instability-Root-Cause/post/1633239

https://community.intel.com/t5/Processors/July-2024-Update-on-Instability-Reports-on-Intel-Core-13th-and/m-p/1617113

I would live-boot a WinPE to make sure you're running the latest BIOS and have the latest Intel microcode.

So, yesterday and today were days of testing.

I ran 10 runs (11 hours) of memtest without failure.

I booted a Windows 10 live USB drive with y-cruncher, and after a few minutes I got a blue screen error: CLOCK_WATCHDOG_TIMEOUT

I cleared the CMOS, installed y-cruncher, and the same error occurred.

I tested two power supplies with y-cruncher, and the same error occurred.

The motherboard has been running the latest ASRock BIOS for the four months since I built this server.

I just installed a 13500 CPU and will run y-cruncher to see if the problem goes away.

If it does, the CPU is defective.

Damn, this processor is only 2 months old.

I hope this ordeal ends soon, but everything points to this CPU being defective.

I hope Intel doesn't waste time and replaces it quickly, otherwise my Unraid will be down for a long time. Now I'm also facing another problem: Unraid doesn't work at all. I get the following error and it won't start:

rcu task blocked on level rcu node unraid
