March 30, 2023
[Template only, I am not the container author/maintainer]
Template: https://github.com/kilrah/unraid-docker-templates/raw/main/templates/serge.xml
Source container: https://github.com/nsarrazin/serge

Serge - LLaMa made easy
A chat interface based on llama.cpp for running Alpaca models. Entirely self-hosted, no API keys needed. Fits on 4GB of RAM and runs on the CPU.

A note on memory usage
llama will just crash if you don't have enough available memory for your model.
7B requires about 4.5GB of free RAM
13B requires about 12GB free
30B requires about 20GB free

Edited March 31, 2023 by Kilrah
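Since llama crashes rather than warning when memory runs short, it can help to compare available RAM against the figures above before loading a model. The following is only a rough sketch: the three thresholds come from the post, while the function names and the idea of reading `MemAvailable` from `/proc/meminfo` are illustrative assumptions and not part of Serge.

```python
# Free-RAM figures quoted in the post (in GB). Everything else here is
# a hypothetical helper for illustration, not part of Serge.
REQUIRED_GB = {"7B": 4.5, "13B": 12.0, "30B": 20.0}


def available_gb(meminfo_text: str) -> float:
    """Return the MemAvailable value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this field in kB (KiB)
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found")


def models_that_fit(avail_gb: float) -> list[str]:
    """List the model sizes whose quoted requirement is met by avail_gb."""
    return [name for name, need in REQUIRED_GB.items() if avail_gb >= need]
```

On the Unraid host this could be called as `models_that_fit(available_gb(open("/proc/meminfo").read()))`. The quoted requirements are approximate, so treat a borderline result as "probably too tight".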
March 30, 2023
Thanks for making this. Since LLaMA got leaked I have been wanting someone to package this up like the diffusion package. This works great. It is slow on the CPU, but it does work.
March 31, 2023
Just did a default install, getting errors:

ggml.c: In function 'quantize_row_q4_0':
ggml.c:524:15: warning: unused variable 'nb' [-Wunused-variable]
  524 | const int nb = k / QK;
      | ^~
ggml.c: In function 'ggml_vec_dot_q4_0':
ggml.c:1924:18: warning: implicit conversion from 'float' to 'ggml_float' {aka 'double'} to match other operand of binary expression [-Wdouble-promotion]
 1924 | sumf += f0*f2 + f1*f3;
      | ^~
llama.cpp: In function 'bool llama_model_quantize_internal(const string&, const string&, int)':
llama.cpp:1455:24: warning: cast from type 'const char*' to type 'char*' casts away qualifiers [-Wcast-qual]
 1455 | finp.read ((char *) word.data(), len);
      | ^~~~~~~~~~~~~~~~~~~~
llama.cpp:1456:24: warning: cast from type 'const char*' to type 'char*' casts away qualifiers [-Wcast-qual]
 1456 | fout.write((char *) word.data(), len);
      | ^~~~~~~~~~~~~~~~~~~~
./deploy.sh: line 10: 79 Illegal instruction mongod
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread
I LDFLAGS:
I CC: cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX: g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -msse3 -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread -c examples/common.cpp -o common.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/main/main.cpp ggml.o llama.o common.o -o main
==== Run ./main -h for help. ====
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/quantize/quantize.cpp ggml.o llama.o -o quantize
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/embedding/embedding.cpp ggml.o llama.o common.o -o embedding
March 31, 2023
Not sure if any of those are fatal errors - but the webUI never loads, and eventually the container just stops.
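The compiler warnings in that log are harmless. The line that stands out is `Illegal instruction mongod`. One possible cause, not confirmed anywhere in this thread, is a CPU without AVX: MongoDB 5.0 and later require AVX on x86_64, and the build flags in the log list only `-msse3`. A minimal sketch for checking this on the host, assuming Linux and the usual `flags` line in `/proc/cpuinfo` (the function names are made up for illustration):

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line looks like "flags\t\t: fpu vme ... avx avx2 ..."
            return set(line.split(":", 1)[1].split())
    return set()


def missing_flags(cpuinfo_text: str, wanted=("avx", "avx2")) -> list[str]:
    """Return the wanted flags that the CPU does not advertise."""
    have = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in have]
```

Calling `missing_flags(open("/proc/cpuinfo").read())` on the server returns an empty list if both flags are present. If `avx` is reported missing, that would be consistent with the crash, though it does not prove it.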
March 31, 2023
Author
I see the same on a fresh install, but after a minute or so it's up...
Note that the devs are making major updates pretty much every day at the moment, so I wouldn't be surprised if it breaks a few times and gets fixed.
Edited March 31, 2023 by Kilrah
April 1, 2023
For me, the default config never brings up a webUI, and the container stops after a few minutes without any new errors in the log.
June 12, 2023
Extra models can be downloaded here: https://huggingface.co/TheBloke and copied to appdata/serge/weights/
August 17, 2023
Author
That would be something to ask the actual container developers, but apparently it's in progress... https://github.com/serge-chat/serge/issues/43
October 15, 2023
Author
Use more powerful hardware, or wait for GPU support and have an appropriate GPU.
October 15, 2023
Thanks for that, this is just to test it out. I thought I was doing something wrong. I'm actually going to use my desktop, which is a Threadripper with 128GB of RAM and an NVIDIA 3090 with 24GB of VRAM, plus a spare 2TB NVMe drive.
October 23, 2023
Hey, is GPU support working? Has anyone been able to pass a GPU through to the container? If so, what configuration did you add to the app?

On a separate note, when I run the large models I don't actually see my RAM usage go up to the supposed 20GB+. Has anyone else experienced this? It would be good to hear your experiences.

Anyway, really cool project, much appreciated!
October 23, 2023
Author
Serge doesn't support GPU, see the issue on their GitHub linked a few posts above about adding it.
Edited October 23, 2023 by Kilrah
November 13, 2023
On 10/23/2023 at 3:46 AM, Kilrah said: "Serge doesn't support GPU, see the issue on their github linked a few posts above about adding it."

Hey, I just looked at the issue and it appears that someone made a few basic changes and got it working with the GPU: https://github.com/serge-chat/serge/issues/43#issuecomment-1792070396
Any idea if it would be possible for you to add that to your template to make it work?
November 14, 2023
Author
They currently do it by modifying the container. Just wait until they get this incorporated.
February 25, 2024
On 3/31/2023 at 3:50 AM, CrimsonTide said: "Not sure if any of those are fatal errors - but the webUI never loads, and eventually the container just stops."

Same issue here. Was there ever a fix for this?
February 25, 2024
There's an AI category in the Unraid 'app store' so this can now be added to the template.