[Support] kilrah/serge


Recommended Posts

[Template only, I am not the container author/maintainer]

 

Template: https://github.com/kilrah/unraid-docker-templates/raw/main/templates/serge.xml

Source container: https://github.com/nsarrazin/serge

 

Serge - LLaMa made easy

 

A chat interface based on llama.cpp for running Alpaca models. Entirely self-hosted, no API keys needed. Fits on 4GB of RAM and runs on the CPU.

 

A note on memory usage

llama.cpp will simply crash if there isn't enough free memory for your model.

7B requires about 4.5GB of free RAM

13B requires about 12GB free

30B requires about 20GB free
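If you want to check before launching, you can compare available memory against those numbers; a minimal sketch (thresholds taken from the figures above, 4608 MB ≈ 4.5GB):

```shell
# Rough pre-flight check: compare available RAM against the model's needs.
# Thresholds come from the note above (7B ~4.5GB, 13B ~12GB, 30B ~20GB free).
free_mb=$(awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo)
echo "Available RAM: ${free_mb} MB"
if [ "$free_mb" -lt 4608 ]; then
    echo "WARNING: not enough free RAM even for a 7B model"
fi
```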

Edited by Kilrah
  • Kilrah changed the title to [Support] kilrah/serge

Just did a default install and I'm getting errors:

 

ggml.c: In function 'quantize_row_q4_0':
ggml.c:524:15: warning: unused variable 'nb' [-Wunused-variable]
  524 |     const int nb = k / QK;
      |               ^~
ggml.c: In function 'ggml_vec_dot_q4_0':
ggml.c:1924:18: warning: implicit conversion from 'float' to 'ggml_float' {aka 'double'} to match other operand of binary expression [-Wdouble-promotion]
 1924 |             sumf += f0*f2 + f1*f3;
      |                  ^~
llama.cpp: In function 'bool llama_model_quantize_internal(const string&, const string&, int)':
llama.cpp:1455:24: warning: cast from type 'const char*' to type 'char*' casts away qualifiers [-Wcast-qual]
 1455 |             finp.read ((char *) word.data(), len);
      |                        ^~~~~~~~~~~~~~~~~~~~
llama.cpp:1456:24: warning: cast from type 'const char*' to type 'char*' casts away qualifiers [-Wcast-qual]
 1456 |             fout.write((char *) word.data(), len);
      |                        ^~~~~~~~~~~~~~~~~~~~
./deploy.sh: line 10:    79 Illegal instruction     mongod
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread
I LDFLAGS:  
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -msse3   -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread -c examples/common.cpp -o common.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/main/main.cpp ggml.o llama.o common.o -o main 

====  Run ./main -h for help.  ====

g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/quantize/quantize.cpp ggml.o llama.o -o quantize 
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity 
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/embedding/embedding.cpp ggml.o llama.o common.o -o embedding 
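For what it's worth, the `Illegal instruction` from mongod in that log is often a sign of a CPU without AVX support: MongoDB 5.0+ is built to require AVX and crashes this way on older processors. You can check your CPU with something like:

```shell
# "Illegal instruction" from mongod is commonly a missing-AVX problem:
# MongoDB 5.0+ requires a CPU with AVX and crashes this way on older CPUs.
has_avx=$(grep -cw avx /proc/cpuinfo)
if [ "$has_avx" -gt 0 ]; then
    echo "CPU reports AVX support"
else
    echo "No AVX flag - recent mongod builds will crash with Illegal instruction"
fi
```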

 


I see the same on a fresh install, but after a minute or so it's up...

 

Note that the devs are making major updates pretty much every day at the moment, so I wouldn't be surprised if it breaks a few times and gets fixed.

  • 2 months later...
  • 1 month later...

Hey, is GPU support working? Has anyone been able to pass a GPU through to the container? If so, what configuration did you add to the app?
On a separate note, when I run the larger models I don't actually see my RAM usage go up to the supposed 20GB+. Has anyone else experienced this? It would be good to hear your experiences.
Anyway, really cool project, much appreciated!
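For anyone wanting to experiment: on Unraid, GPU passthrough to a container is normally done via the Nvidia-Driver plugin plus extra Docker parameters. A hypothetical sketch of the equivalent `docker run` (the flags follow the usual Unraid/NVIDIA convention; the image itself would still need a CUDA-enabled llama.cpp build, which is the open question in this thread):

```shell
# Hypothetical sketch: standard NVIDIA passthrough flags on Unraid.
# In the template UI these go in "Extra Parameters" plus an added variable.
docker run -d \
  --name=serge \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
  -p 8008:8008 \
  ghcr.io/serge-chat/serge:latest
```

This only makes the GPU visible inside the container; whether serge actually uses it depends on how llama.cpp was compiled in the image.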

  • 3 weeks later...
On 10/23/2023 at 3:46 AM, Kilrah said:

Serge doesn't support GPU, see the issue on their github linked a few posts above about adding it.

 

Hey, I just looked at the issue, and it appears that someone did make a few basic changes and got it working with the GPU.

 

https://github.com/serge-chat/serge/issues/43#issuecomment-1792070396

 

Any idea if it would be possible for you to add that into your template etc. to make it work?
