xer01ne Posted September 17, 2024

I am trying to get the Local.ai container to work, but I can't figure out why none of the models will work. First, when I try to send a query to Local.ai's chat UI page, none of my video cards are tasked. After reviewing the logs, it appears that the models are failing to load. Has anyone had this issue, or know how to fix it? For the record, the Ollama server and Open WebUI containers work just fine, so I know the drivers are good.

Here is the log when the container starts...

@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
model name : Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz
flags      : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
CPU:    AVX    found OK
CPU: no AVX2   found
CPU: no AVX512 found
@@@@@

Here are the logs when I send a query to chat...

3:08PM INF Trying to load the model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with the backend '[llama-cpp llama-ggml llama-cpp-fallback piper rwkv stablediffusion whisper huggingface bert-embeddings /build/backend/python/sentencetransformers/run.sh /build/backend/python/parler-tts/run.sh /build/backend/python/coqui/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/exllama2/run.sh /build/backend/python/transformers/run.sh /build/backend/python/rerankers/run.sh /build/backend/python/openvoice/run.sh /build/backend/python/bark/run.sh /build/backend/python/mamba/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/vall-e-x/run.sh /build/backend/python/vllm/run.sh /build/backend/python/transformers-musicgen/run.sh]'
3:08PM INF [llama-cpp] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend llama-cpp
3:08PM INF [llama-cpp] Fails: backend not found: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
3:08PM INF [llama-cpp] Autodetection failed, trying the fallback
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend
3:08PM INF [llama-cpp] Fails: fork/exec grpc: permission denied
3:08PM INF [llama-ggml] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend llama-ggml
3:08PM INF [llama-ggml] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
3:08PM INF [llama-cpp-fallback] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend llama-cpp-fallback
3:08PM INF [llama-cpp-fallback] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: read tcp 127.0.0.1:59348->127.0.0.1:33383: read: connection reset by peer
3:08PM INF [piper] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend piper
3:08PM INF [piper] Fails: could not load model: rpc error: code = Unknown desc = unsupported model type /build/models/Bunny-Llama-3-8B-Q4_K_M.gguf (should end with .onnx)
3:08PM INF [rwkv] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend rwkv
3:08PM INF [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
3:08PM INF [stablediffusion] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend stablediffusion
3:08PM INF [stablediffusion] Loads OK

And here is my nvidia-smi output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A4000               Off |   00000000:05:00.0 Off |                  Off |
| 45%   64C    P0             59W / 140W  |       4MiB / 16376MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX A2000               Off |   00000000:42:00.0 Off |                  Off |
| 35%   65C    P0             41W /  70W  |       4MiB /  6138MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

The container I installed is localai/localai:master-cublas-cuda12 (I also tried the latest CUDA tag that Unraid offers; both give the same results), running with Bridge networking, a Bash console, port 8080, and debug=false.

Again, any help would be greatly appreciated.
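The startup banner above is the key hint: the E5-2643 v2 is an Ivy Bridge part, so it exposes AVX and F16C but no AVX2/AVX-512/FMA, and the pre-built llama-cpp backend apparently assumes more than that. A quick way to confirm which SIMD extensions a CPU actually reports (a minimal sketch, assuming a Linux shell on the Unraid host) is to pull the flags straight from /proc/cpuinfo:

    # Count how many cores report each SIMD extension; on this Xeon you should
    # see avx and f16c listed, but no avx2, fma or avx512f.
    grep -o -w -E 'avx|avx2|fma|f16c|avx512f' /proc/cpuinfo | sort | uniq -c

If avx2 is missing from that output, the failures in the load log (EOF / connection reset while attempting the llama-cpp backends) are consistent with the backend process dying as soon as it starts, which is what the rebuild suggested in the banner works around.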
disco4000 Posted September 18, 2024 (edited)

I've got the same problem. It seems to be caused by the missing instruction-set extensions (AVX2, etc.) of our Ivy Bridge Xeons. I followed the build guide and rebuilt the CPU-only version with the CMAKE arguments from the log, and after that the chat option in the WebUI immediately worked with several different models. Connecting other apps to LocalAI via the API worked as well.

If I understand mudler's changelogs correctly, LocalAI should normally ship all the backends precompiled for non-AVX, AVX, and AVX2 and automatically decide which one to use. But this autodetection seems to fail, especially when running LocalAI in Docker.

At the moment I'm rebuilding the CUDA12 image without all the extensions; this will take another 2-3 hours. Let's see if that works as well.

Edited September 18, 2024 by disco4000
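For anyone wanting to try the same thing, the rebuild boils down to starting the CUDA12 image with REBUILD=true and the CMAKE_ARGS string from the startup banner. This is only a sketch of the equivalent plain docker run (the host model path and the --gpus flag are assumptions; on Unraid you would normally set these as variables and extra parameters in the container template instead):

    # Rebuild LocalAI's backends at first start, with AVX2/AVX512/FMA/F16C disabled.
    # The CMAKE_ARGS value is the one suggested by LocalAI's own startup banner;
    # F16C could arguably stay on, since this CPU reports the f16c flag.
    docker run -d --name localai \
      -p 8080:8080 \
      --gpus all \
      -e REBUILD=true \
      -e CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF" \
      -v /mnt/user/appdata/localai/models:/build/models \
      localai/localai:master-cublas-cuda12

The first start is slow because the backends are compiled inside the container, which is also where the extra disk usage mentioned further down comes from.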
xer01ne (Author) Posted September 18, 2024

Awesome! Thanks for confirming this... I am interested in your results!
disco4000 Posted September 18, 2024

OK, I can confirm that really did the trick. After rebuilding the image, the chat function works, and the Unraid UI also shows that the P4000 is fully involved. 👍

Unfortunately the image is now bloated up to 60GB on my Docker volume. I will open an issue on the LocalAI GitHub and send the logs to mudler.
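Two quick ways to verify both points above — that the GPU is really being used and where the extra disk space went — assuming shell access on the Unraid host (these are standard NVIDIA/Docker tools, nothing LocalAI-specific):

    # Watch GPU utilisation while a chat request runs; a LocalAI backend process
    # should appear under "Processes" once a model is loaded onto the GPU.
    watch -n 1 nvidia-smi

    # Break down disk usage by image, container and volume to see where the
    # rebuilt backends landed.
    docker system df -v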
xer01ne (Author) Posted September 19, 2024

60GB!?! Ouch!! That’s insane! … … … I’m gonna do it!! 🤣😂

Thanks for sharing, I’m glad to see I wasn’t going crazy! I’ll let you know how it goes!
lqpnote Posted December 16, 2024

On 9/19/2024 at 5:42 AM, xer01ne said:

    60GB!?! Ouch!! That’s insane! … … … I’m gonna do it!! 🤣😂 Thanks for sharing, I’m glad to see I wasn’t going crazy! I’ll let you know how it goes!

Hey man, I've got an Intel ARC A310 GPU. Do you reckon rebuilding the CPU Docker image will get it to work with the GPU? If so, do you mind making a quick tutorial of the steps? I'm new and feel kinda lost. I've tried CPU-only and it fries the CPU lol.