Wicked_Chicken Posted October 1, 2021

Has anyone recently had any luck using DeepStack with a GPU within Unraid? I've been using the CPU version of @ndetar's container, which has been working wonderfully, but I have been unable to get either his container (which has great instructions for converting it to GPU) or the officially documented DeepStack GPU Docker image working correctly. It does appear that DeepStack released a new GPU version three days ago, but I have still not had luck with either the latest version or the second-most-recent revision. I have nvidia-drivers up and running with a recommended device, but am still getting timeouts for some reason, despite being able to confirm DeepStack's activation. Any help is much appreciated!
ndetar Posted October 2, 2021

I have been using it with a GPU for a while now and it's been working great. Could you provide some additional information, such as the log output from the container, maybe a screenshot of your config, etc.? It's hard to troubleshoot without more context.
Wicked_Chicken Posted October 2, 2021

1 hour ago, ndetar said: Could you provide some additional information such as the log output from the container, maybe a screenshot of your config, etc.

Hey @ndetar! Thanks for responding! I have loved your container and hope we can figure this out. I really appreciate your time and assistance.

Screenshots: http://imgur.com/a/nkxBZcz

Logs:

```
root@UNRAID:~# sudo docker exec -it DeepstackGPUOfficial /bin/bash
root@ed10552468a7:/app/server# cat ../logs/stderr.txt
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory

[the same object-detection traceback repeats twice more]

Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/face.py", line 73, in face
    cuda=SharedOptions.CUDA_MODE,
  File "/app/intelligencelayer/shared/./recognition/process.py", line 31, in __init__
    self.model = self.model.cuda()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/scene.py", line 65, in scenerecognition
    SharedOptions.CUDA_MODE,
  File "/app/intelligencelayer/shared/scene.py", line 38, in __init__
    self.model = self.model.cuda()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 458, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory

[the object-detection traceback then repeats once more]

DeepStack: Version 2021.09.01
/v1/vision/custom/dark
/v1/vision/custom/poolcam
/v1/vision/custom/unagi
/v1/vision/face
/v1/vision/face/recognize
/v1/vision/face/register
/v1/vision/face/match
/v1/vision/face/list
/v1/vision/face/delete
/v1/vision/detection
/v1/vision/scene
/v1/restore
```

Timeout log:

```
[GIN] 2021/10/01 - 19:46:30 | 500 | 1m0s | 54.86.50.139 | POST "/v1/vision/detection"
[GIN] 2021/10/01 - 19:46:30 | 500 | 1m0s | 54.86.50.139 | POST "/v1/vision/detection"
```
ndetar Posted October 2, 2021

So if I understand the logs correctly, I think your GPU doesn't have enough vRAM for all the recognition processes you are trying to run. For me, object detection alone takes half a gig of vRAM. Try running DeepStack in GPU mode with fewer recognition APIs enabled. Maybe try object detection by itself first and see if it runs.
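Something like this minimal run is what I mean; a rough sketch only, with the container name, port mapping, and datastore path as placeholders (on Unraid you would set the equivalent fields in the container template):

```bash
# Start the official GPU image with only the object-detection API enabled,
# so just one model has to fit in vRAM. Face and scene stay off unless
# their variables are set to True.
docker run -d --name deepstack-gpu \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e VISION-DETECTION=True \
  -v /mnt/user/appdata/deepstack:/datastore \
  -p 5000:5000 \
  deepquestai/deepstack:gpu
```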
Wicked_Chicken Posted October 2, 2021

That was a good idea. I tried loading object detection only, but when checking nvidia-smi I'm still not seeing any GPU use. I'm wondering if the GPU isn't visible to the container, which would explain why it's reporting no available memory.
ndetar Posted October 3, 2021 Share Posted October 3, 2021 (edited) With only one did the logs output the same errors? In extra parameters did you add: --runtime=nvidia Try removing the container image completely and pulling it again, and try the template I made in the app store fallowing the instructions for GPU use. I'm curious what the logs show when you use the template. Edited October 3, 2021 by ndetar 1 Quote Link to comment
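If you would rather do the removal from the command line, it is roughly this (using the container name from your log above; the Unraid Docker tab can do the same thing):

```bash
# Stop and remove the container, delete the cached image, then pull fresh.
docker stop DeepstackGPUOfficial
docker rm DeepstackGPUOfficial
docker rmi deepquestai/deepstack:gpu
docker pull deepquestai/deepstack:gpu
```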
Wicked_Chicken Posted October 4, 2021

So that's interesting: https://imgur.com/a/1uCqaWG

I stripped the image and reinstalled, and it appears the GPU is now being taxed per nvidia-smi. What's funny, however, is that as soon as I try to load any custom models, it fails entirely. I expect I have headroom, based on the RAM utilization of 625MB/2000MB for the base models on High, but I could not recall how I pulled the more detailed log that suggested a CUDA memory issue.

Correction, I found the command:

```
sudo docker exec -it container-name /bin/bash
```

Once in the container, run:

```
cat ../logs/stderr.txt
```

And the log:

```
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 156, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 77, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
```

Same CUDA error. I'll fiddle with this to see if I can get any custom models to run. It'll be really disappointing if 2GB isn't enough for any.
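As a quick sanity check while fiddling, you can hit the endpoints directly; a sketch, assuming the container is mapped to port 5000 and test.jpg is any local image:

```bash
# Built-in object-detection endpoint.
curl -X POST -F image=@test.jpg http://localhost:5000/v1/vision/detection

# One of the custom models from the startup banner above.
curl -X POST -F image=@test.jpg http://localhost:5000/v1/vision/custom/poolcam
```

A 200 response with JSON predictions means the model loaded; the 500 after 1m0s in the GIN log above is what a model process that failed to start looks like.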
Wicked_Chicken Posted October 4, 2021

Edit: Issue was confirmed as insufficient CUDA memory. https://imgur.com/a/tvN804n

So it appears each custom model essentially runs as an independent process. I did not realize this, and am going to have to do some testing with YOLOv5s models to see if I can get decent models with a smaller GPU footprint, consider changing the GPU in the server, or offload DeepStack to my main PC with a far better GPU. @ndetar, you are a rockstar for helping me figure this out.
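For anyone who lands here later: the custom models are the .pt files in whatever host folder is mounted to /modelstore/detection, and each file is served at its own endpoint by its own process. A sketch, with the host path as a placeholder:

```bash
# dark.pt, poolcam.pt, and unagi.pt in the mounted folder become
# /v1/vision/custom/dark, /v1/vision/custom/poolcam, and
# /v1/vision/custom/unagi, and each loads separately into vRAM.
docker run -d --runtime=nvidia \
  -e VISION-DETECTION=True \
  -v /mnt/user/appdata/deepstack/models:/modelstore/detection \
  -p 5000:5000 \
  deepquestai/deepstack:gpu
```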
ndetar Posted October 4, 2021

Glad you were able to figure it out. You might be able to run two instances of DeepStack, one GPU and one CPU, and run as much as you can on the GPU one. It would require more configuration in the end, but it should lighten the load on the CPU.
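A rough sketch of that split, with the ports, names, and the choice of which APIs go where purely illustrative:

```bash
# GPU instance: object detection and whatever custom models fit in vRAM.
docker run -d --name deepstack-gpu --runtime=nvidia \
  -e VISION-DETECTION=True \
  -p 5000:5000 deepquestai/deepstack:gpu

# CPU instance on another host port: the APIs that didn't fit.
docker run -d --name deepstack-cpu \
  -e VISION-FACE=True -e VISION-SCENE=True \
  -p 5001:5000 deepquestai/deepstack:latest
```

Whatever is calling DeepStack then just points at the port of the instance that hosts the API it needs.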
MarbleComa Posted November 11, 2021

Hi @ndetar, thanks for creating this Unraid template! I've been trying for a few hours to get the DeepStack GPU Docker running on my system, but it seems that attempting to use a 3060 Ti leads to quite a few issues. With the default installation directly from the deepquestai/deepstack:gpu repository, nothing works, and I was able to determine that the included version of PyTorch does not support the 3060 Ti. I ran cat ../logs/stderr.txt and found the following message:

"NVIDIA GeForce RTX 3060 ti with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. If you want to use the NVIDIA GeForce RTX 3060 ti GPU with PyTorch."

I did some reading on the PyTorch site (https://pytorch.org/get-started/locally/), and I found the pip install command to get their LTS (1.8.2) version, which supports CUDA 11.1. As a side note, I'm on Unraid 6.10.0-rc2 and the Nvidia driver is CUDA version 11.5. I updated both torch and torchvision in the container, and that allows the object detection to run on the 3060 Ti (some progress, yay!). Unfortunately, now I'm getting a new error with the face detection that I haven't been able to solve despite my best googling. The output of cat ../logs/stderr.txt is below:

```
Traceback (most recent call last):
  File "/app/intelligencelayer/shared/face.py", line 307, in face
    det = detector.predict(img, 0.55)
  File "/app/intelligencelayer/shared/./process.py", line 61, in predict
    pred = self.model(img, augment=False)[0]
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/app/intelligencelayer/shared/./models/yolo.py", line 149, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "/app/intelligencelayer/shared/./models/yolo.py", line 176, in forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/app/intelligencelayer/shared/./models/common.py", line 109, in forward
    1,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/app/intelligencelayer/shared/./models/common.py", line 32, in forward
    return self.act(self.bn(self.conv(x)))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/activation.py", line 461, in forward
    return F.hardswish(input, self.inplace)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1178, in __getattr__
    type(self).__name__, name))
AttributeError: 'Hardswish' object has no attribute 'inplace'
```

Do you have any recommendation on how to proceed here? Many thanks!
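For anyone else who hits the sm_86 wall: the LTS install command was along these lines (taken from the PyTorch get-started page for LTS 1.8.2 with CUDA 11.1; verify the current version pins there before running it inside the container):

```bash
# Replace the image's bundled wheels with the CUDA 11.1 LTS builds,
# which include sm_86 kernels for Ampere cards like the 3060 Ti.
pip3 install torch==1.8.2+cu111 torchvision==0.9.2+cu111 \
  -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
```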
ndetar Posted November 11, 2021

31 minutes ago, MarbleComa said: Hi @ndetar, thanks for creating this Unraid template! I've been trying for a few hours to get the DeepStack GPU Docker running on my system […]

Further modifications are beyond what I am able to help with. You may have better luck posting or searching on the DeepStack forum. Sorry I can't be of much help. From what I can see, it seems like the face recognition code in DeepStack doesn't support the newer version of PyTorch.