
[Support] kilrah/serge


[Template only, I am not the container author/maintainer]

 

Template: https://github.com/kilrah/unraid-docker-templates/raw/main/templates/serge.xml

Source container: https://github.com/nsarrazin/serge

 

Serge - LLaMa made easy

 

A chat interface based on llama.cpp for running Alpaca models. Entirely self-hosted, no API keys needed. Fits on 4GB of RAM and runs on the CPU.

 

A note on memory usage

llama will just crash if you don't have enough available memory for your model.

7B requires about 4.5GB of free RAM

13B requires about 12GB free

30B requires about 20GB free
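To check a host against these figures before loading a model, the following is a minimal Python sketch. It assumes a Linux host with /proc/meminfo. It treats the kB values reported there as KiB and converts them to GiB, which it compares loosely against the approximate GB numbers above. The thresholds are copied from this post. The helper names (mem_available_gb, models_that_fit) are illustrative only and are not part of Serge.

```python
# Hypothetical helper, not part of Serge: compares MemAvailable from
# /proc/meminfo against the approximate free-RAM figures quoted above.
REQUIRED_GB = {"7B": 4.5, "13B": 12.0, "30B": 20.0}  # from the memory note


def mem_available_gb(meminfo_text: str) -> float:
    """Return MemAvailable (reported in kB, i.e. KiB) converted to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemAvailable not found in meminfo text")


def models_that_fit(available_gb: float) -> list[str]:
    """List model sizes whose quoted requirement is within available memory."""
    return [name for name, need in REQUIRED_GB.items() if available_gb >= need]


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        avail = mem_available_gb(f.read())
    print(f"Available: {avail:.1f} GB")
    print("Should fit:", ", ".join(models_that_fit(avail)) or "none")
```

Run it inside the container or on the Unraid host. Other containers and VMs reduce MemAvailable, so it is worth running while they are up.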

Edited by Kilrah

Thanks for making this. Since LLaMA got leaked I have been wanting someone to package this up like the diffusion package. This works great; it is slow on the CPU, but it does work.

  • Kilrah changed the title to [Support] kilrah/serge

Just did a default install, getting errors -

 

ggml.c: In function 'quantize_row_q4_0':
ggml.c:524:15: warning: unused variable 'nb' [-Wunused-variable]
  524 |     const int nb = k / QK;
      |               ^~
ggml.c: In function 'ggml_vec_dot_q4_0':
ggml.c:1924:18: warning: implicit conversion from 'float' to 'ggml_float' {aka 'double'} to match other operand of binary expression [-Wdouble-promotion]
 1924 |             sumf += f0*f2 + f1*f3;
      |                  ^~
llama.cpp: In function 'bool llama_model_quantize_internal(const string&, const string&, int)':
llama.cpp:1455:24: warning: cast from type 'const char*' to type 'char*' casts away qualifiers [-Wcast-qual]
 1455 |             finp.read ((char *) word.data(), len);
      |                        ^~~~~~~~~~~~~~~~~~~~
llama.cpp:1456:24: warning: cast from type 'const char*' to type 'char*' casts away qualifiers [-Wcast-qual]
 1456 |             fout.write((char *) word.data(), len);
      |                        ^~~~~~~~~~~~~~~~~~~~
./deploy.sh: line 10:    79 Illegal instruction     mongod
I llama.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread
I LDFLAGS:  
I CC:       cc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
I CXX:      g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -msse3   -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread -c examples/common.cpp -o common.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/main/main.cpp ggml.o llama.o common.o -o main 

====  Run ./main -h for help.  ====

g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/quantize/quantize.cpp ggml.o llama.o -o quantize 
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity 
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -pthread examples/embedding/embedding.cpp ggml.o llama.o common.o -o embedding 

 

Not sure if any of those are fatal errors - but the webUI never loads, and eventually the container just stops.
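The compiler warnings above are not fatal. The line that stands out is "./deploy.sh: line 10: 79 Illegal instruction mongod". One possible cause, which is an assumption and not confirmed anywhere in this thread, is a CPU without AVX: MongoDB 5.0 and later require AVX on x86_64. The CFLAGS line in the log shows only -msse3 and no -mavx, which hints at the same thing but does not prove it. The following is a minimal Python sketch for checking which of those flags the CPU advertises, assuming a Linux host with /proc/cpuinfo. The function names are hypothetical.

```python
# Hypothetical diagnostic, not part of Serge: report which SIMD flags the CPU
# advertises. Assumption: a missing "avx" flag would explain mongod dying with
# "Illegal instruction", since MongoDB 5.0+ requires AVX on x86_64.
FLAGS_OF_INTEREST = ("avx", "avx2", "fma", "f16c")


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def report(flags: set[str]) -> dict[str, bool]:
    """Map each flag of interest to whether the CPU advertises it."""
    return {name: name in flags for name in FLAGS_OF_INTEREST}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        flags = cpu_flags(f.read())
    for name, present in report(flags).items():
        print(f"{name:5} {'yes' if present else 'NO'}")
    if "avx" not in flags:
        print("No AVX: MongoDB 5.0+ would be expected to fail with 'Illegal instruction'.")
```

Note that Linux lists SSE3 as "pni" in /proc/cpuinfo, so the script does not look for an "sse3" flag.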

  • Author

I see the same on a fresh install but after a minute or so it's up...

 

Note that the devs are making major updates pretty much every day at the moment, I wouldn't be surprised if it breaks a few times and gets fixed.

Edited by Kilrah

The default config for me never brings up a webUI and the container stops after a few minutes without any new errors in the log.

  • 2 months later...

Any plans to enable NVIDIA GPUs?

  • 1 month later...

Thanks for this. Curious, though: is there any way to speed up responses?

  • Author

More powerful hardware / wait for GPU support and have an appropriate one

Thanks for that; this is just to test it out. I thought I was doing something wrong. I'm actually going to use my desktop, which is a Threadripper with 128GB RAM and an NVIDIA 3090 with 24GB VRAM, plus a spare 2TB NVMe.

Hey, is GPU support working? Has anyone been able to pass a GPU through to the container? If so, what configuration did you add to the app?
On a separate note, when I run the large models I don't actually see my RAM usage go up to the supposed 20GB+. Has anyone else experienced this? It would be good to hear your experiences.
Anyway, really cool project, much appreciated!

  • Author

Serge doesn't support GPU, see the issue on their github linked a few posts above about adding it.

Edited by Kilrah

  • 3 weeks later...
On 10/23/2023 at 3:46 AM, Kilrah said:

Serge doesn't support GPU, see the issue on their github linked a few posts above about adding it.

 

Hey, I just looked at the issue, and it appears someone made a few basic changes and got it working with the GPU.

 

https://github.com/serge-chat/serge/issues/43#issuecomment-1792070396

 

Any idea if it would be possible for you to add that to your template to make it work?

  • Author

They currently do it by modifying the container. Just wait until they get this incorporated.

  • 3 months later...
On 3/31/2023 at 3:50 AM, CrimsonTide said:

Not sure if any of those are fatal errors - but the webUI never loads, and eventually the container just stops.

Same issue here. Was there ever a fix for this?

There's an AI category in the Unraid 'app store', so this can now be added to the template.
