Hello Hector,
This is our official statement on this issue.
Two immediate temporary solutions:
- Physically remove all GPU cards except one (single-card configuration).
or
- Install an older GPU driver. Version 419.67 works for us.
Reason:
The issue is with a CUDA memory deallocation function, which has stopped working properly with the latest NVIDIA GPU drivers.
More specifically, the function cudaFreeHost() returned a success code, but the memory was not deallocated. After some time the pinned (page-locked) host memory filled up, and the software stopped with the message “CUDA error : 2 : Out of memory”.
This bug appears to be causing problems for other CUDA developers as well:
https://devtalk.nvidia.com/…/cudafreehost-not-clearing-all…/
We have contacted NVIDIA and hope that a new driver with a fix will be available very soon.
Meanwhile, we are also trying to work around the problem in our code, and we will try to release a hotfix as soon as possible.
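
As an illustration only (this is not the actual hotfix), one common way to avoid depending on a deallocation call is to reuse pinned buffers instead of freeing and reallocating them. Below is a minimal C++ sketch of that idea. The class HostBufferPool and its methods are hypothetical names, and the two allocator callbacks are assumed to be thin wrappers around cudaMallocHost() and cudaFreeHost() in real use:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: not part of CUDA and not taken from any product.
// It keeps released buffers and hands them out again, so the underlying
// free function is only called when the pool itself is destroyed.
class HostBufferPool {
public:
    using AllocFn = std::function<void*(std::size_t)>;
    using FreeFn = std::function<void(void*)>;

    HostBufferPool(AllocFn alloc, FreeFn free_fn)
        : alloc_(std::move(alloc)), free_(std::move(free_fn)) {}

    HostBufferPool(const HostBufferPool&) = delete;
    HostBufferPool& operator=(const HostBufferPool&) = delete;

    // Frees only the idle buffers; buffers still held by callers must be
    // released back to the pool before it is destroyed.
    ~HostBufferPool() {
        for (auto& entry : idle_) {
            for (void* p : entry.second) {
                free_(p);
            }
        }
    }

    // Returns an idle buffer of exactly this size if one exists,
    // otherwise allocates a new one through the supplied callback.
    void* acquire(std::size_t bytes) {
        auto it = idle_.find(bytes);
        if (it != idle_.end() && !it->second.empty()) {
            void* p = it->second.back();
            it->second.pop_back();
            return p;
        }
        ++allocations_;
        return alloc_(bytes);
    }

    // Puts a buffer back into the pool instead of freeing it.
    void release(void* p, std::size_t bytes) {
        assert(p != nullptr);
        idle_[bytes].push_back(p);
    }

    // Number of times the underlying allocator was actually called.
    std::size_t allocations() const { return allocations_; }

private:
    AllocFn alloc_;
    FreeFn free_;
    std::map<std::size_t, std::vector<void*>> idle_;
    std::size_t allocations_ = 0;
};
```

With such a pool, an application that repeatedly needs a staging buffer of the same size would allocate it once and then recycle it, so pinned memory usage stays bounded even if the free call has no effect. The trade-off is that the pool holds on to memory for its whole lifetime.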