xer01ne Posted September 17, 2024

I am trying to get the Local.ai container to work, but I can't figure out why none of the models will work. First, when I try to send a query to Local.ai's chat UI page, none of my video cards are tasked. After reviewing the logs, it appears that the models are failing to load. Has anyone had this issue, or know how to fix it? For the record, the Ollama server and Open WebUI containers work just fine, so I know the drivers are good.

Here is the log when the container starts...

@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
model name : Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz
flags      : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
CPU:    AVX    found OK
CPU: no AVX2   found
CPU: no AVX512 found
@@@@@

Here are the logs when I send a query to chat...

3:08PM INF Trying to load the model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with the backend '[llama-cpp llama-ggml llama-cpp-fallback piper rwkv stablediffusion whisper huggingface bert-embeddings /build/backend/python/sentencetransformers/run.sh /build/backend/python/parler-tts/run.sh /build/backend/python/coqui/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/exllama2/run.sh /build/backend/python/transformers/run.sh /build/backend/python/rerankers/run.sh /build/backend/python/openvoice/run.sh /build/backend/python/bark/run.sh /build/backend/python/mamba/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/vall-e-x/run.sh /build/backend/python/vllm/run.sh /build/backend/python/transformers-musicgen/run.sh]'
3:08PM INF [llama-cpp] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend llama-cpp
3:08PM INF [llama-cpp] Fails: backend not found: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
3:08PM INF [llama-cpp] Autodetection failed, trying the fallback
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend
3:08PM INF [llama-cpp] Fails: fork/exec grpc: permission denied
3:08PM INF [llama-ggml] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend llama-ggml
3:08PM INF [llama-ggml] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
3:08PM INF [llama-cpp-fallback] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend llama-cpp-fallback
3:08PM INF [llama-cpp-fallback] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: read tcp 127.0.0.1:59348->127.0.0.1:33383: read: connection reset by peer
3:08PM INF [piper] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend piper
3:08PM INF [piper] Fails: could not load model: rpc error: code = Unknown desc = unsupported model type /build/models/Bunny-Llama-3-8B-Q4_K_M.gguf (should end with .onnx)
3:08PM INF [rwkv] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend rwkv
3:08PM INF [rwkv] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
3:08PM INF [stablediffusion] Attempting to load
3:08PM INF Loading model 'Bunny-Llama-3-8B-Q4_K_M.gguf' with backend stablediffusion
3:08PM INF [stablediffusion] Loads OK

And here is my nvidia-smi output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A4000               Off |   00000000:05:00.0 Off |                  Off |
| 45%   64C    P0             59W / 140W  |       4MiB / 16376MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX A2000               Off |   00000000:42:00.0 Off |                  Off |
| 35%   65C    P0             41W /  70W  |       4MiB /  6138MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

The container I installed is localai/localai:master-cublas-cuda12 (I also tried the latest CUDA tag that Unraid offers; both give the same results), running with Bridge networking, a Bash console, port 8080, and debug=false.

Again, any help would be greatly appreciated.
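The startup banner above is the key hint: the E5-2643 v2 is an Ivy Bridge part, so it exposes AVX and F16C but no AVX2/AVX-512/FMA, and the pre-built llama-cpp backend apparently assumes more than that. A quick way to confirm which SIMD extensions a CPU actually reports (a minimal sketch, assuming a Linux shell on the Unraid host) is to pull the flags straight from /proc/cpuinfo:

    # Count how many cores report each SIMD extension; on this Xeon you should
    # see avx and f16c listed, but no avx2, fma or avx512f.
    grep -o -w -E 'avx|avx2|fma|f16c|avx512f' /proc/cpuinfo | sort | uniq -c

If avx2 is missing from that output, the failures in the load log (EOF / connection reset while attempting the llama-cpp backends) are consistent with the backend process dying as soon as it starts, which is what the rebuild suggested in the banner works around.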
disco4000 Posted September 18, 2024 (edited)

I've got the same problem. It seems to be caused by the missing instruction-set extensions (AVX2, etc.) of our Ivy Bridge Xeons. I followed the build guide and rebuilt the CPU-only version with the CMAKE arguments from the log, and after that the chat option in the WebUI immediately worked with several different models. Connecting other apps to LocalAI via the API worked as well.

If I understand mudler's changelogs correctly, LocalAI should normally ship all the backends precompiled for non-AVX, AVX, and AVX2 and automatically decide which one to use. But this autodetection seems to fail, especially when running LocalAI in Docker.

At the moment I'm rebuilding the CUDA12 image without all the extensions; this will take another 2-3 hours. Let's see if that works as well.

Edited September 18, 2024 by disco4000
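For anyone wanting to try the same thing, the rebuild boils down to starting the CUDA12 image with REBUILD=true and the CMAKE_ARGS string from the startup banner. This is only a sketch of the equivalent plain docker run (the host model path and the --gpus flag are assumptions; on Unraid you would normally set these as variables and extra parameters in the container template instead):

    # Rebuild LocalAI's backends at first start, with AVX2/AVX512/FMA/F16C disabled.
    # The CMAKE_ARGS value is the one suggested by LocalAI's own startup banner;
    # F16C could arguably stay on, since this CPU reports the f16c flag.
    docker run -d --name localai \
      -p 8080:8080 \
      --gpus all \
      -e REBUILD=true \
      -e CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF" \
      -v /mnt/user/appdata/localai/models:/build/models \
      localai/localai:master-cublas-cuda12

The first start is slow because the backends are compiled inside the container, which is also where the extra disk usage mentioned further down comes from.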
xer01ne (Author) Posted September 18, 2024

Awesome! Thanks for confirming this... I am interested in your results!
disco4000 Posted September 18, 2024

OK, I can confirm that really did the trick. After rebuilding the image, the chat function works, and the Unraid UI also shows that the P4000 is fully involved. 👍

Unfortunately the image is now bloated up to 60GB on my Docker volume. I will open an issue on the LocalAI GitHub and send the logs to mudler.
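Two quick ways to verify both points above — that the GPU is really being used and where the extra disk space went — assuming shell access on the Unraid host (these are standard NVIDIA/Docker tools, nothing LocalAI-specific):

    # Watch GPU utilisation while a chat request runs; a LocalAI backend process
    # should appear under "Processes" once a model is loaded onto the GPU.
    watch -n 1 nvidia-smi

    # Break down disk usage by image, container and volume to see where the
    # rebuilt backends landed.
    docker system df -v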
xer01ne (Author) Posted September 19, 2024

60GB!?! Ouch!! That’s insane! … … … I’m gonna do it!! 🤣😂

Thanks for sharing, I’m glad to see I wasn’t going crazy! I’ll let you know how it goes!
lqpnote Posted December 16, 2024

On 9/19/2024 at 5:42 AM, xer01ne said:

    60GB!?! Ouch!! That’s insane! … … … I’m gonna do it!! 🤣😂 Thanks for sharing, I’m glad to see I wasn’t going crazy! I’ll let you know how it goes!

Hey man, I've got an Intel ARC A310 GPU. Do you reckon rebuilding the CPU Docker image will get it to work with the GPU? If so, do you mind making a quick tutorial of the steps? I'm new and feel kinda lost. I've tried CPU-only and it fries the CPU lol.