Kilrah — March 30, 2023 (edited March 31, 2023)

[Template only, I am not the container author/maintainer]

Template: https://github.com/kilrah/unraid-docker-templates/raw/main/templates/serge.xml
Source container: https://github.com/nsarrazin/serge

Serge - LLaMa made easy

A chat interface based on llama.cpp for running Alpaca models. Entirely self-hosted, no API keys needed. Fits in 4GB of RAM and runs on the CPU.

A note on memory usage: llama.cpp will simply crash if there isn't enough available memory for your model.
7B requires about 4.5GB of free RAM
13B requires about 12GB free
30B requires about 20GB free
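A quick way to check which size will fit on the host before downloading weights (a sketch reading /proc/meminfo; the thresholds are the approximate figures quoted above, converted to MiB):

```shell
# Rough check of available RAM before picking a model size; thresholds
# follow the post above (7B ~4.5GB, 13B ~12GB, 30B ~20GB free).
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
avail_mb=$((avail_kb / 1024))
echo "Available RAM: ${avail_mb} MiB"
if   [ "$avail_mb" -ge 20480 ]; then echo "should fit up to a 30B model"
elif [ "$avail_mb" -ge 12288 ]; then echo "should fit up to a 13B model"
elif [ "$avail_mb" -ge 4608 ];  then echo "should fit a 7B model"
else echo "probably not enough free RAM for any of the models"
fi
```

Note this measures free RAM at one moment; other containers starting later can still push you under the threshold.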
gwazh — March 30, 2023

Thanks for making this. Since LLaMA got leaked I have been wanting someone to package this up like the diffusion package. This works great; it is slow on the CPU, but it does work.
CrimsonTide — March 31, 2023

Just did a default install, getting errors:

ggml.c: In function 'quantize_row_q4_0':
ggml.c:524:15: warning: unused variable 'nb' [-Wunused-variable]
  524 |     const int nb = k / QK;
      |               ^~
ggml.c: In function 'ggml_vec_dot_q4_0':
ggml.c:1924:18: warning: implicit conversion from 'float' to 'ggml_float' {aka 'double'} to match other operand of binary expression [-Wdouble-promotion]
 1924 |     sumf += f0*f2 + f1*f3;
      |          ^~
llama.cpp: In function 'bool llama_model_quantize_internal(const string&, const string&, int)':
llama.cpp:1455:24: warning: cast from type 'const char*' to type 'char*' casts away qualifiers [-Wcast-qual]
 1455 |     finp.read ((char *) word.data(), len);
      |                ^~~~~~~~~~~~~~~~~~~~
llama.cpp:1456:24: warning: cast from type 'const char*' to type 'char*' casts away qualifiers [-Wcast-qual]
 1456 |     fout.write((char *) word.data(), len);
      |                ^~~~~~~~~~~~~~~~~~~~

./deploy.sh: line 10:    79 Illegal instruction    mongod

I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread
I LDFLAGS:
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -msse3 -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread -c examples/common.cpp -o common.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/main/main.cpp ggml.o llama.o common.o -o main

====  Run ./main -h for help.  ====

g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/quantize/quantize.cpp ggml.o llama.o -o quantize
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/embedding/embedding.cpp ggml.o llama.o common.o -o embedding
CrimsonTide — March 31, 2023

Not sure if any of those are fatal errors, but the webUI never loads, and eventually the container just stops.
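For what it's worth, the "Illegal instruction" on mongod in the log above is the kind of failure you get when a prebuilt binary uses CPU instructions the host doesn't support (recent MongoDB releases are commonly reported to require AVX, for instance). A sketch for checking the host's flags; the flag list is an assumption based on what llama.cpp builds and MongoDB typically want, not something confirmed from this log:

```shell
# List SIMD-related CPU flags on the host; a MISSING one can explain
# "Illegal instruction" crashes in binaries built for newer CPUs.
for flag in sse3 ssse3 avx avx2 f16c fma; do
    if grep -qw "$flag" /proc/cpuinfo; then
        echo "$flag: present"
    else
        echo "$flag: MISSING"
    fi
done
```

If avx comes back MISSING on an older CPU, the mongod crash is the likely culprit rather than anything in the llama.cpp build output.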
Kilrah — March 31, 2023 (edited)

I see the same on a fresh install, but after a minute or so it's up. Note that the devs are making major updates pretty much every day at the moment; I wouldn't be surprised if it breaks a few times and gets fixed.
CrimsonTide — April 1, 2023

The default config for me never brings up a webUI, and the container stops after a few minutes without any new errors in the log.
dopeytree — June 12, 2023

Extra models can be downloaded here: https://huggingface.co/TheBloke and copied to appdata/serge/weights/
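To script that download, something like the following works. The repo and file names here are hypothetical examples; pick the actual quantized file from the model page, and make sure its format matches what the llama.cpp version bundled with Serge supports:

```shell
# Download a quantized model file into Serge's weights folder.
# REPO and FILE are illustrative placeholders, not verified names.
WEIGHTS_DIR="/mnt/user/appdata/serge/weights"
REPO="TheBloke/Llama-2-7B-GGML"        # hypothetical example repo
FILE="llama-2-7b.ggmlv3.q4_0.bin"      # hypothetical quantized weights file
URL="https://huggingface.co/${REPO}/resolve/main/${FILE}"
echo "Would download: $URL -> $WEIGHTS_DIR/$FILE"
# Uncomment to actually download (files are several GB):
# wget -O "$WEIGHTS_DIR/$FILE" "$URL"
```

The resolve/main/ path is Hugging Face's direct-download URL scheme; browsing to the model's "Files" tab gives you the same links.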
Trylo — August 17, 2023

Any plans to enable NVIDIA GPUs?
Kilrah — August 17, 2023

That would be something to ask the actual container developers, but apparently it's in progress: https://github.com/serge-chat/serge/issues/43
gyrene2083 — October 15, 2023

Thanks for this. Curious though, is there any way to speed up responses?
Kilrah — October 15, 2023

More powerful hardware, or wait for GPU support and have an appropriate GPU.
gyrene2083 — October 15, 2023

Thanks for that; this is just to test it out. I thought I was doing something wrong. I'm actually going to use my desktop, which is a Threadripper with 128GB RAM and an NVIDIA 3090 with 24GB VRAM, plus a spare 2TB NVMe.
lozenge — October 23, 2023

Hey, is GPU support working? Has anyone been able to pass a GPU through to the container? If so, what configurations did you add to the app?

On a separate note, when I run the large models I don't actually see my RAM usage go up to the supposed 20GB+; has anyone else experienced this? Would be good to know your experiences.

Anyway, really cool project, much appreciated!
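On the RAM observation: newer llama.cpp builds memory-map the weights file, so much of the model is counted as page cache rather than the process's own resident memory, which can make reported usage look far below the sizes quoted in the first post. That is an inference from llama.cpp's mmap support, not something confirmed for the Serge build here. A sketch for seeing the split on the host, where the process name "main" is an assumption about llama.cpp's example binary:

```shell
# Compare a llama.cpp process's resident set size with the system page
# cache; mmap'd weights mostly show up in the latter, not the former.
pid=$(pgrep -o main || true)   # "main" is an assumed process name
if [ -n "$pid" ]; then
    rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
    echo "llama.cpp RSS: ${rss_kb} kB"
fi
cache_kb=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
echo "Page cache: ${cache_kb} kB"
```

If the page cache balloons by roughly the model size while the process RSS stays small, the model is being mapped rather than copied into process memory.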
Kilrah — October 23, 2023 (edited)

Serge doesn't support GPU, see the issue on their GitHub linked a few posts above about adding it.
anethema — November 13, 2023

On 10/23/2023 at 3:46 AM, Kilrah said: "Serge doesn't support GPU, see the issue on their github linked a few posts above about adding it."

Hey, just looked at the issue, and it appears that someone did make a few basic changes and got it working with the GPU: https://github.com/serge-chat/serge/issues/43#issuecomment-1792070396

Any idea if it would be possible for you to add that into your template to make it work?
Kilrah — November 14, 2023

They currently do it by modifying the container. Just wait until they get this incorporated.
Abhi — February 25

On 3/31/2023 at 3:50 AM, CrimsonTide said: "Not sure if any of those are fatal errors - but the webUI never loads, and eventually the container just stops."

Same issue here. Was there ever a fix for this?
dopeytree — February 25

There's an AI category in the Unraid 'app store' now, so this can be added to the template.