[Support] kilrah/serge


Recommended Posts

[Template only, I am not the container author/maintainer]

 

Template: https://github.com/kilrah/unraid-docker-templates/raw/main/templates/serge.xml

Source container: https://github.com/nsarrazin/serge

 

Serge - LLaMa made easy

 

A chat interface based on llama.cpp for running Alpaca models. Entirely self-hosted, no API keys needed. Fits on 4GB of RAM and runs on the CPU.

 

A note on memory usage

llama.cpp will simply crash if there isn't enough free memory for your model.

7B requires about 4.5GB of free RAM

13B requires about 12GB free

30B requires about 20GB free
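If you want to check before launching, you can compare available memory against those numbers; a minimal sketch (thresholds taken from the figures above, 4608 MB ≈ 4.5GB):

```shell
# Rough pre-flight check: compare available RAM against the model's needs.
# Thresholds come from the note above (7B ~4.5GB, 13B ~12GB, 30B ~20GB free).
free_mb=$(awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo)
echo "Available RAM: ${free_mb} MB"
if [ "$free_mb" -lt 4608 ]; then
    echo "WARNING: not enough free RAM even for a 7B model"
fi
```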

Edited by Kilrah
  • Kilrah changed the title to [Support] kilrah/serge

Just did a default install and I'm getting errors:

 

ggml.c: In function 'quantize_row_q4_0':
ggml.c:524:15: warning: unused variable 'nb' [-Wunused-variable]
  524 |     const int nb = k / QK;
      |               ^~
ggml.c: In function 'ggml_vec_dot_q4_0':
ggml.c:1924:18: warning: implicit conversion from 'float' to 'ggml_float' {aka 'double'} to match other operand of binary expression [-Wdouble-promotion]
 1924 |             sumf += f0*f2 + f1*f3;
      |                  ^~
llama.cpp: In function 'bool llama_model_quantize_internal(const string&, const string&, int)':
llama.cpp:1455:24: warning: cast from type 'const char*' to type 'char*' casts away qualifiers [-Wcast-qual]
 1455 |             finp.read ((char *) word.data(), len);
      |                        ^~~~~~~~~~~~~~~~~~~~
llama.cpp:1456:24: warning: cast from type 'const char*' to type 'char*' casts away qualifiers [-Wcast-qual]
 1456 |             fout.write((char *) word.data(), len);
      |                        ^~~~~~~~~~~~~~~~~~~~
./deploy.sh: line 10:    79 Illegal instruction     mongod
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread
I LDFLAGS:  
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -msse3   -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread -c examples/common.cpp -o common.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/main/main.cpp ggml.o llama.o common.o -o main 

====  Run ./main -h for help.  ====

g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/quantize/quantize.cpp ggml.o llama.o -o quantize 
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity 
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/embedding/embedding.cpp ggml.o llama.o common.o -o embedding 
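For what it's worth, the `Illegal instruction` from mongod in that log is often a sign of a CPU without AVX support: MongoDB 5.0+ is built to require AVX and crashes this way on older processors. You can check your CPU with something like:

```shell
# "Illegal instruction" from mongod is commonly a missing-AVX problem:
# MongoDB 5.0+ requires a CPU with AVX and crashes this way on older CPUs.
has_avx=$(grep -cw avx /proc/cpuinfo)
if [ "$has_avx" -gt 0 ]; then
    echo "CPU reports AVX support"
else
    echo "No AVX flag - recent mongod builds will crash with Illegal instruction"
fi
```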

 


I see the same on a fresh install, but after a minute or so it's up...

 

Note that the devs are making major updates pretty much every day at the moment, so I wouldn't be surprised if it breaks a few times and gets fixed.

  • 2 months later...
  • 1 month later...

Hey, is GPU support working? Has anyone been able to pass a GPU through to the container? If so, what configuration did you add to the app?
On a separate note, when I run the larger models I don't actually see my RAM usage go up to the supposed 20GB+. Has anyone else experienced this? It would be good to hear your experiences.
Anyway, really cool project, much appreciated!
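For anyone wanting to experiment: on Unraid, GPU passthrough to a container is normally done via the Nvidia-Driver plugin plus extra Docker parameters. A hypothetical sketch of the equivalent `docker run` (the flags follow the usual Unraid/NVIDIA convention; the image itself would still need a CUDA-enabled llama.cpp build, which is the open question in this thread):

```shell
# Hypothetical sketch: standard NVIDIA passthrough flags on Unraid.
# In the template UI these go in "Extra Parameters" plus an added variable.
docker run -d \
  --name=serge \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
  -p 8008:8008 \
  ghcr.io/serge-chat/serge:latest
```

This only makes the GPU visible inside the container; whether serge actually uses it depends on how llama.cpp was compiled in the image.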

  • 3 weeks later...
On 10/23/2023 at 3:46 AM, Kilrah said:

Serge doesn't support GPU, see the issue on their github linked a few posts above about adding it.

 

Hey, I just looked at the issue, and it appears that someone did make a few basic changes and got it working with the GPU.

 

https://github.com/serge-chat/serge/issues/43#issuecomment-1792070396

 

Any idea if it would be possible for you to add that into your template etc. to make it work?
