DeepStack Docker GPU Version



Has anyone recently had any luck using DeepStack with a GPU within Unraid? I've been using @ndetar's CPU version, which has been working wonderfully, but I have been unable to get either his container (which has great instructions for converting it to GPU) or the officially documented DeepStack Docker GPU image working correctly.

 

It does appear that DeepStack released a new GPU version three days ago, but I have still not had luck with either the latest version or the second-most-recent revision. I have the Nvidia drivers up and running with a recommended device, but am still getting timeouts for some reason despite being able to confirm DeepStack's activation.

 

Any help is much appreciated!


I have been using it with a GPU for a while now and it's been working great. Could you provide some additional information, such as the log output from the container, maybe a screenshot of your config, etc.? It's hard to troubleshoot without more context.

1 hour ago, ndetar said:

I have been using it with a GPU for a while now and it's been working great. Could you provide some additional information, such as the log output from the container, maybe a screenshot of your config, etc.? It's hard to troubleshoot without more context.

 

Hey @ndetar!

 

Thanks for responding! I have loved your container and hope we can figure this out. I really appreciate your time and assistance. 

 

Screenshots:

http://imgur.com/a/nkxBZcz

 

Logs:

 

root@UNRAID:~# sudo docker exec -it DeepstackGPUOfficial /bin/bash
root@ed10552468a7:/app/server# cat ../logs/stderr.txt
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/face.py", line 73, in face
    cuda=SharedOptions.CUDA_MODE,
  File "/app/intelligencelayer/shared/./recognition/process.py", line 31, in __init__
    self.model = self.model.cuda()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/scene.py", line 65, in scenerecognition
    SharedOptions.CUDA_MODE,
  File "/app/intelligencelayer/shared/scene.py", line 38, in __init__
    self.model = self.model.cuda()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory

DeepStack: Version 2021.09.01
v1/vision/custom/dark
v1/vision/custom/poolcam
v1/vision/custom/unagi
/v1/vision/face
/v1/vision/face/recognize
/v1/vision/face/register
/v1/vision/face/match
/v1/vision/face/list
/v1/vision/face/delete
/v1/vision/detection
/v1/vision/scene
v1/backup
v1/restore
Timeout Log:

[GIN] 2021/10/01 - 19:46:30 | 500 | 1m0s | 54.86.50.139 | POST "/v1/vision/detection"
[GIN] 2021/10/01 - 19:46:30 | 500 | 1m0s | 54.86.50.139 | POST "/v1/vision/detection"
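
Side note for anyone reproducing this: the endpoint can also be exercised directly, which takes the calling application out of the picture. A minimal sketch, assuming the default 5000 port mapping and any local JPEG as a test image:

# Send one image to the detection endpoint and print the JSON response.
# A healthy instance answers in well under a second; a broken GPU setup
# hangs until the 1m0s timeout seen in the GIN log above.
curl -X POST -F image=@test.jpg http://localhost:5000/v1/vision/detection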


So if I understand the logs correctly, I think your GPU doesn't have enough VRAM for all the recognition processes you are trying to run. For me, object detection alone takes half a gig of VRAM. Try running DeepStack in GPU mode with fewer recognition APIs enabled; maybe try object detection by itself first and see if it runs, something like the sketch below.
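
A minimal sketch, assuming the stock deepquestai/deepstack:gpu image and the usual Unraid appdata path (adjust the port and paths to your setup):

# Object detection only: VISION-FACE and VISION-SCENE stay disabled when
# unset, so only one model has to fit in VRAM. MODE=Medium also lowers
# memory use compared to High.
docker run -d --runtime=nvidia \
  -e VISION-DETECTION=True \
  -e MODE=Medium \
  -v /mnt/user/appdata/deepstack:/datastore \
  -p 5000:5000 \
  deepquestai/deepstack:gpu

If that runs cleanly, add face and scene back one at a time and watch nvidia-smi to see where the VRAM runs out.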


With only one API enabled, did the logs output the same errors?

In Extra Parameters, did you add --runtime=nvidia?

 

Try removing the container image completely and pulling it again, and try the template I made in the app store, following the instructions for GPU use.

I'm curious what the logs show when you use the template.
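
One quick sanity check (using the container name from your log, so adjust as needed) is to run nvidia-smi inside the container; with --runtime=nvidia the Nvidia runtime exposes the tool there:

# If this fails or lists no devices, the container never got the GPU and
# the --runtime=nvidia / NVIDIA_VISIBLE_DEVICES settings need another look.
docker exec -it DeepstackGPUOfficial nvidia-smi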



So that's interesting.

https://imgur.com/a/1uCqaWG

 

I stripped the image and reinstalled, and it appears the GPU is now being taxed per nvidia-smi. What's funny, however, is that as soon as I try to load any custom models, it fails entirely. I expect I have headroom based on the VRAM utilization of 625 MB / 2000 MB for the base models on high, but I cannot actually recall how I pulled the more detailed log that suggested a CUDA memory issue.

 

Correction: I found the command. Here it is:

 


sudo docker exec -it container-name /bin/bash
once in the container, run
cat ../logs/stderr.txt

And the log:

Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory

 

Same CUDA error. I'll fiddle with this to see if I can get any custom models to run. It'll be really disappointing if 2 GB isn't enough for any.

 


Edit:


The issue was confirmed as insufficient CUDA memory.

 

https://imgur.com/a/tvN804n

 

So it appears each custom model essentially runs as an independent process. I did not realize this, and am going to have to do some testing with YOLOv5s models to see if I can get decent models to fit within my GPU's headroom, consider changing the GPU in the server, or offload DeepStack to my main PC with a far better GPU.
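
For reference, that matches how the custom models get wired up. A rough sketch based on the documented DeepStack custom-model setup (paths are examples):

# Every .pt file in the mounted folder (here dark.pt, poolcam.pt, unagi.pt)
# gets its own worker process and its own endpoint under /v1/vision/custom/,
# so VRAM usage grows with each model you add.
docker run -d --runtime=nvidia \
  -e VISION-DETECTION=True \
  -v /mnt/user/appdata/deepstack/models:/modelstore/detection \
  -p 5000:5000 \
  deepquestai/deepstack:gpu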

@ndetar, you are a rockstar for helping me figure this out. 


Hi @ndetar, thanks for creating this Unraid template! I've been trying for a few hours to get the DeepStack GPU Docker container running on my system, but it seems that attempting to use a 3060 Ti leads to quite a few issues. With the default installation directly from the deepquestai/deepstack:gpu repository, nothing works, and I was able to determine that the included version of PyTorch does not support the 3060 Ti. I ran cat ../logs/stderr.txt and found the following message:

 

"NVIDIA GeForce RTX 3060 ti with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 ti GPU with PyTorch."

 

I did some reading on the PyTorch site (https://pytorch.org/get-started/locally/), and I found the pip install command to get their LTS (1.8.2) version, which supports CUDA 11.1 (reproduced at the end of this post). As a side note, I'm on Unraid 6.10.0-rc2 and the Nvidia driver is CUDA version 11.5. I updated both torch and torchvision in the container, and that allows object detection to run on the 3060 Ti (some progress, yay!). Unfortunately, I'm now getting a new error with the face detection that I haven't been able to solve despite my best googling. The output of cat ../logs/stderr.txt is below:

 

Traceback (most recent call last):
  File "/app/intelligencelayer/shared/face.py", line 307, in face
    det = detector.predict(img, 0.55)
  File "/app/intelligencelayer/shared/./process.py", line 61, in predict
    pred = self.model(img, augment=False)[0]
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/app/intelligencelayer/shared/./models/yolo.py", line 149, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "/app/intelligencelayer/shared/./models/yolo.py", line 176, in forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/app/intelligencelayer/shared/./models/common.py", line 109, in forward
    1,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/app/intelligencelayer/shared/./models/common.py", line 32, in forward
    return self.act(self.bn(self.conv(x)))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/activation.py", line 461, in forward
    return F.hardswish(input, self.inplace)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1178, in __getattr__
    type(self).__name__, name))
AttributeError: 'Hardswish' object has no attribute 'inplace'

 

Do you have any recommendation on how to proceed here? Many thanks!
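
P.S. For reference, the LTS install command from the PyTorch selector was along these lines (run inside the container; I'm reconstructing the exact wheel tags from memory, so double-check against pytorch.org):

# PyTorch LTS 1.8.2 built against CUDA 11.1, which supports sm_86 cards.
pip3 install torch==1.8.2+cu111 torchvision==0.9.2+cu111 torchaudio==0.8.2 \
  -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html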

31 minutes ago, MarbleComa said:

Hi @ndetar, thanks for creating this Unraid template! [...] Do you have any recommendation on how to proceed here? Many thanks!

Further modifications are beyond what I am able to help with; you may have better luck posting or searching on the DeepStack forum. Sorry I can't be of more help. From what I can see, it seems the face recognition code in DeepStack doesn't support the newer version of PyTorch.

