I have been working with Andres to train a MONAI Label spine segmentation model on a Windows 10 Pro workstation (Lenovo P610, 64-core AMD Threadripper Pro CPU, NVIDIA Tesla V100, Quadro 1000 GPU, 256 GB RAM). We have tried several times to run the model; although it runs and trains, it appears to run on the CPU only. Running nvidia-smi recognizes both GPUs. I have installed the server driver version for the Tesla card (V100). Has anyone had the same issue or found a solution?
Thanks, Ron
In my experience, this is almost always due to an incorrect CUDA installation and/or missing environment variable settings. nvidia-smi can only confirm that the NVIDIA driver is installed correctly and that the GPU is operational; the CUDA setup is a separate matter.
First you have to confirm that:
- You are using a CUDA-enabled build of torch
- Torch is actually using the GPU (an incorrect environment setup can block this).
See this answer, follow the steps, and check whether your V100 is being reported.
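The two checks above can be run in a few lines of Python. This is a minimal sketch, not MONAI Label-specific; it only uses the standard torch CUDA API to confirm that the installed build can see the cards:

```python
import torch

# 1) Confirm the installed torch build is CUDA-enabled.
#    A CPU-only build typically reports a version string ending in "+cpu".
print(torch.__version__)
print(torch.cuda.is_available())

# 2) Confirm torch can actually see and use the GPUs.
if torch.cuda.is_available():
    print(torch.cuda.device_count())      # both cards should be listed
    print(torch.cuda.get_device_name(0))  # the V100 should appear here
    x = torch.rand(3, 3).to("cuda")
    print(x.device)                       # cuda:0
else:
    print("CUDA not available: torch build or environment is the problem")
```

If `is_available()` returns `False` even though nvidia-smi works, the usual culprits are a CPU-only torch wheel or a CUDA toolkit/driver mismatch, and reinstalling torch with the matching CUDA version fixes it.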
Dear Murat
Thanks for the suggestion. I will work with Andres on Monday to see if we can find and fix the issue, and will post our solution. Thanks, Ron
We were able to use GPUs in Docker environments using VirtualGL with vglrun…
I have read that this method was also used for AI.
So you need to set up a Docker image to test.
Maybe @lassoan has some experience with this.
Hope it helps
Hello @Ron,
a while ago I asked the same question in the MONAI Label discussions (Generic deepgrow -- High CPU usage and low GPU usage · Discussion #282 · Project-MONAI/MONAILabel · GitHub). My impression was that some operators in some MONAI Label applications were GPU-based while others ran only on the CPU, and the answer I received points in that direction. I hope this helps.
It should work on the GPU.