DeepStack Docker GPU Version



Has anyone recently had any luck using DeepStack with a GPU within Unraid? I've been using @ndetar's CPU version, which has been working wonderfully, but I have been unable to get either his container (which has great instructions for converting it to GPU) or the officially documented DeepStack Docker GPU image working correctly.

 

It does appear that DeepStack released a new GPU version three days ago, but I have still not had luck with either the latest version or the second-most-recent revision. I have the Nvidia drivers up and running with a recommended device, but am still getting timeouts for some reason despite being able to confirm DeepStack's activation.

 

Any help is much appreciated!


I have been using it with a GPU for a while now and it's been working great. Could you provide some additional information, such as the log output from the container, maybe a screenshot of your config, etc.? It's hard to troubleshoot without more context.

1 hour ago, ndetar said:

I have been using it with a GPU for a while now and it's been working great. Could you provide some additional information, such as the log output from the container, maybe a screenshot of your config, etc.? It's hard to troubleshoot without more context.

 

Hey @ndetar!

 

Thanks for responding! I have loved your container and hope we can figure this out. I really appreciate your time and assistance. 

 

Screenshots:

http://imgur.com/a/nkxBZcz

 

Logs:

 

root@UNRAID:~# sudo docker exec -it DeepstackGPUOfficial /bin/bash
root@ed10552468a7:/app/server# cat ../logs/stderr.txt
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/face.py", line 73, in face
    cuda=SharedOptions.CUDA_MODE,
  File "/app/intelligencelayer/shared/./recognition/process.py", line 31, in __init__
    self.model = self.model.cuda()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/scene.py", line 65, in scenerecognition
    SharedOptions.CUDA_MODE,
  File "/app/intelligencelayer/shared/scene.py", line 38, in __init__
    self.model = self.model.cuda()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory

DeepStack: Version 2021.09.01
v1/vision/custom/dark
v1/vision/custom/poolcam
v1/vision/custom/unagi
/v1/vision/face
/v1/vision/face/recognize
/v1/vision/face/register
/v1/vision/face/match
/v1/vision/face/list
/v1/vision/face/delete
/v1/vision/detection
/v1/vision/scene
v1/backup
v1/restore
Timeout Log:

[GIN] 2021/10/01 - 19:46:30 | 500 | 1m0s | 54.86.50.139 | POST "/v1/vision/detection"
[GIN] 2021/10/01 - 19:46:30 | 500 | 1m0s | 54.86.50.139 | POST "/v1/vision/detection"
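
Side note for anyone reproducing this: the endpoint can also be exercised directly, which takes the calling application out of the picture. A minimal sketch, assuming the default 5000 port mapping and any local JPEG as a test image:

# Send one image to the detection endpoint and print the JSON response.
# A healthy instance answers in well under a second; a broken GPU setup
# hangs until the 1m0s timeout seen in the GIN log above.
curl -X POST -F image=@test.jpg http://localhost:5000/v1/vision/detection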


So if I understand the logs correctly, I think your GPU doesn't have enough VRAM for all the recognition processes you are trying to run. For me, object detection alone takes half a gig of VRAM. Try running DeepStack in GPU mode with fewer recognition APIs enabled; maybe try object detection by itself first and see if it runs, something like the sketch below.
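
A minimal sketch, assuming the stock deepquestai/deepstack:gpu image and the usual Unraid appdata path (adjust the port and paths to your setup):

# Object detection only: VISION-FACE and VISION-SCENE stay disabled when
# unset, so only one model has to fit in VRAM. MODE=Medium also lowers
# memory use compared to High.
docker run -d --runtime=nvidia \
  -e VISION-DETECTION=True \
  -e MODE=Medium \
  -v /mnt/user/appdata/deepstack:/datastore \
  -p 5000:5000 \
  deepquestai/deepstack:gpu

If that runs cleanly, add face and scene back one at a time and watch nvidia-smi to see where the VRAM runs out.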


With only one API enabled, did the logs output the same errors?

In Extra Parameters, did you add --runtime=nvidia?

 

Try removing the container image completely and pulling it again, and try the template I made in the app store, following the instructions for GPU use.

I'm curious what the logs show when you use the template.
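
One quick sanity check (using the container name from your log, so adjust as needed) is to run nvidia-smi inside the container; with --runtime=nvidia the Nvidia runtime exposes the tool there:

# If this fails or lists no devices, the container never got the GPU and
# the --runtime=nvidia / NVIDIA_VISIBLE_DEVICES settings need another look.
docker exec -it DeepstackGPUOfficial nvidia-smi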



So that's interesting.

https://imgur.com/a/1uCqaWG

 

I stripped the image and reinstalled, and it appears the GPU is now being taxed per nvidia-smi. What's funny, however, is that as soon as I try to load any custom models, it fails entirely. I expect I have headroom based on the VRAM utilization of 625 MB / 2000 MB for the base models on high, but I cannot actually recall how I pulled the more detailed log that suggested a CUDA memory issue.

 

Correction: I found the command. Here it is:

 


sudo docker exec -it container-name /bin/bash
once in the container, run
cat ../logs/stderr.txt

And the log:

Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory

 

Same CUDA error. I'll fiddle with this to see if I can get any custom models to run. It'll be really disappointing if 2 GB isn't enough for any.

 


Edit:


The issue was confirmed as insufficient CUDA memory.

 

https://imgur.com/a/tvN804n

 

So it appears each custom model essentially runs as an independent process. I did not realize this, and am going to have to do some testing with YOLOv5s models to see if I can get decent models to fit within my GPU's headroom, consider changing the GPU in the server, or offload DeepStack to my main PC with a far better GPU.
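
For reference, that matches how the custom models get wired up. A rough sketch based on the documented DeepStack custom-model setup (paths are examples):

# Every .pt file in the mounted folder (here dark.pt, poolcam.pt, unagi.pt)
# gets its own worker process and its own endpoint under /v1/vision/custom/,
# so VRAM usage grows with each model you add.
docker run -d --runtime=nvidia \
  -e VISION-DETECTION=True \
  -v /mnt/user/appdata/deepstack/models:/modelstore/detection \
  -p 5000:5000 \
  deepquestai/deepstack:gpu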

@ndetar, you are a rockstar for helping me figure this out. 


Hi @ndetar, thanks for creating this Unraid template! I've been trying for a few hours to get the DeepStack GPU Docker container running on my system, but it seems that attempting to use a 3060 Ti leads to quite a few issues. With the default installation directly from the deepquestai/deepstack:gpu repository, nothing works, and I was able to determine that the included version of PyTorch does not support the 3060 Ti. I ran cat ../logs/stderr.txt and found the following message:

 

"NVIDIA GeForce RTX 3060 ti with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 ti GPU with PyTorch."

 

I did some reading on the PyTorch site (https://pytorch.org/get-started/locally/), and I found the pip install command to get their LTS (1.8.2) version, which supports CUDA 11.1 (reproduced at the end of this post). As a side note, I'm on Unraid 6.10.0-rc2 and the Nvidia driver is CUDA version 11.5. I updated both torch and torchvision in the container, and that allows object detection to run on the 3060 Ti (some progress, yay!). Unfortunately, I'm now getting a new error with the face detection that I haven't been able to solve despite my best googling. The output of cat ../logs/stderr.txt is below:

 

Traceback (most recent call last):
  File "/app/intelligencelayer/shared/face.py", line 307, in face
    det = detector.predict(img, 0.55)
  File "/app/intelligencelayer/shared/./process.py", line 61, in predict
    pred = self.model(img, augment=False)[0]
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/app/intelligencelayer/shared/./models/yolo.py", line 149, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "/app/intelligencelayer/shared/./models/yolo.py", line 176, in forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/app/intelligencelayer/shared/./models/common.py", line 109, in forward
    1,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/app/intelligencelayer/shared/./models/common.py", line 32, in forward
    return self.act(self.bn(self.conv(x)))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/activation.py", line 461, in forward
    return F.hardswish(input, self.inplace)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1178, in __getattr__
    type(self).__name__, name))
AttributeError: 'Hardswish' object has no attribute 'inplace'

 

Do you have any recommendation on how to proceed here? Many thanks!
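
P.S. For reference, the LTS install command from the PyTorch selector was along these lines (run inside the container; I'm reconstructing the exact wheel tags from memory, so double-check against pytorch.org):

# PyTorch LTS 1.8.2 built against CUDA 11.1, which supports sm_86 cards.
pip3 install torch==1.8.2+cu111 torchvision==0.9.2+cu111 torchaudio==0.8.2 \
  -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html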

31 minutes ago, MarbleComa said:

Hi @ndetar, thanks for creating this Unraid template! [...] Do you have any recommendation on how to proceed here? Many thanks!

Further modifications are beyond what I am able to help with; you may have better luck posting or searching on the DeepStack forum. Sorry I can't be of more help. From what I can see, it seems the face recognition code in DeepStack doesn't support the newer version of PyTorch.

