Hello, here is an update on the following error.
Symptoms: CUDA errors during processing
Two immediate temporary solutions:
- Physically remove all GPU cards except one (single-card configuration).
or
- Install an older GPU driver. Version 419.67 works for us.
Reason:
The issue is with the CUDA memory de-allocation function, which has stopped working properly with the latest NVIDIA GPU drivers.
More specifically, the function cudaFreeHost() returned a success code, but the memory was not de-allocated. After some time the pinned memory filled up, and the software stopped with the message “CUDA error : 2 : Out of memory”.
It seems that this bug is also causing problems for other CUDA developers:
https://devtalk.nvidia.com/…/cudafreehost-not-clearing-all…/
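
For developers who want to check whether their own setup is affected, the failure described above can be sketched as a simple allocate/free loop. This is only a rough illustration, not our actual code: the names run_alloc_free_loop, AllocFn, FreeFn and PinnedLoopResult are made up for this example, and the allocator pair is passed in as callbacks so that the sketch compiles without the CUDA toolkit. The comment at the end shows how cudaMallocHost() and cudaFreeHost() could be plugged in.

```cpp
#include <cstddef>
#include <functional>

// Result of the loop: how many alloc/free cycles completed, and the first
// non-zero status code seen (0 means every call reported success).
struct PinnedLoopResult {
    std::size_t completed_cycles;
    int first_error;
};

// Hypothetical allocator pair. Both return an int status where 0 means
// success, mirroring cudaError_t (cudaSuccess == 0).
using AllocFn = std::function<int(void** ptr, std::size_t bytes)>;
using FreeFn  = std::function<int(void* ptr)>;

// Repeatedly allocate and free one buffer of the given size.
// If the free call really releases memory, the loop reaches max_cycles.
// If the free call reports success but leaks, the allocator eventually
// fails and the loop stops early with that error code.
inline PinnedLoopResult run_alloc_free_loop(const AllocFn& alloc_fn,
                                            const FreeFn& free_fn,
                                            std::size_t bytes,
                                            std::size_t max_cycles) {
    for (std::size_t i = 0; i < max_cycles; ++i) {
        void* p = nullptr;
        int status = alloc_fn(&p, bytes);
        if (status != 0) return {i, status};
        status = free_fn(p);
        if (status != 0) return {i, status};
    }
    return {max_cycles, 0};
}

// With the CUDA runtime (#include <cuda_runtime.h>) the pair could be:
//   AllocFn a = [](void** p, std::size_t n) { return (int)cudaMallocHost(p, n); };
//   FreeFn  f = [](void* p) { return (int)cudaFreeHost(p); };
```

On an affected driver we would expect such a loop to stop early with error code 2 (cudaErrorMemoryAllocation) even though every free call reported success. How many cycles that takes depends on the buffer size and on how much memory the system can pin.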
We have contacted NVIDIA and we hope that there will be a new driver with a fix very soon.
Meanwhile, we are also trying to work around the problem in our code, and we will try to release a hotfix ASAP.
Thanks to everyone who provided details of the error. They helped us to reproduce the bug successfully.