I am working with Slicer to do machine learning using MONAI Label; the annotation and learning parts work perfectly, it's a really nice tool!
Unfortunately, I run into trouble when I run automatic segmentation on medium and large CT scans (starting from around 256×256×256 voxels, I would say).
I got this error message:
“torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.28 GiB (GPU 0; 23.65 GiB total capacity; 12.28 GiB already allocated; 2.67 GiB free; 19.48 GiB reserved in total by PyTorch)”
I have an NVIDIA GeForce RTX 4090 with 24 GB of memory, and I am using the MONAI Label Model Zoo/Bundle app (0.7.0rc6). Note that I have tried other bundle versions with the same result, but if I use the Radiology app it works well.
Since training works perfectly and since I have 24 GB, I would have thought that medium-sized CT scans would be fine.
Do you have any idea whether I am doing something wrong with the memory?
I have CUDA version 11.8, and from what I have found online it supports the RTX 4090 well.
I can try to see whether it's a problem with the driver; I'll look into that.
Thank you for your response @diazandr3s; indeed, I hadn't seen this page on memory usage.
I looked into the TotalSegmentator resource requirements and they were much lower; is it not the same model?
I am using the model with 1.5 mm spacing, so everything is normal then.
I have two RTX 4090 GPUs with 24 GB each; is there a way to use both for inference, effectively like one 48 GB GPU?
Oh, I see! My mistake then.
And what about the fact that I have two GPUs? I guess I can't “merge” them, but I want to be sure I am not missing something simple.
It seems to be possible to use two GPUs, although I have never tried this; see here. The question is whether it has any impact on the CUDA out-of-memory error. You could probably open a MONAILabel issue on GitHub or comment in the thread above.
There is some complexity here, and I am not an expert, but in my experience PyTorch DNN models can be configured to use two GPUs instead of one, and I assume many MONAI models take advantage of this. Using two GPUs generally divides the memory requirement in half. The link Rudolf posted shows how to set the available devices to include 0 and 1, which is necessary (something like the sketch below); it might be that the MONAILabel configuration does this for us. I could test on a two-GPU server in two weeks if we don't have an answer earlier.
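Just to illustrate the “make devices 0 and 1 available” part: this is a minimal plain-PyTorch sketch, not MONAILabel's own configuration, and whether the MONAILabel server actually uses both visible devices is exactly what would need checking.

```python
# Minimal sketch: expose both GPUs to PyTorch before the server/inference
# script initializes CUDA. CUDA_VISIBLE_DEVICES is a standard CUDA variable;
# how MONAILabel consumes it is an assumption to verify.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # make GPU 0 and GPU 1 visible

import torch

print(torch.cuda.device_count())            # expect 2 if both cards are visible
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```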
After thinking about this: using two GPUs halves memory only if the batch size for training or inference is greater than 1. From the MONAILabel GitHub page, it looks like the 1.5 mm TotalSegmentator model requires more than 24 GB of memory for some volumes. You might split your volume into two sets of slices, segment each half separately, and then combine the segmentations (see the rough sketch at the end of this post). You would want to examine the boundary for discontinuities.
Combining the memory of two GPUs into one has been done by NVIDIA but requires custom work. Hope this helps.
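Something like this is what I mean by splitting along the slice axis. It is only a sketch: `segment_subvolume` is a placeholder for whatever actually runs the model (e.g. a bundle inference call), not a real MONAILabel API, and the overlap size is arbitrary.

```python
# Split a (Z, Y, X) CT volume in two along z with a small overlap, segment
# each half, and stitch the label maps back together. Inspect the seam at
# z = mid for discontinuities.
import numpy as np

def segment_in_two_halves(volume: np.ndarray, segment_subvolume, overlap: int = 16) -> np.ndarray:
    z = volume.shape[0]
    mid = z // 2

    top = segment_subvolume(volume[: mid + overlap])        # first half plus overlap
    bottom = segment_subvolume(volume[mid - overlap :])     # second half plus overlap

    labels = np.zeros(volume.shape, dtype=top.dtype)
    labels[:mid] = top[:mid]           # non-overlapping part of the top half
    labels[mid:] = bottom[overlap:]    # non-overlapping part of the bottom half
    return labels
```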
I was thinking about splitting the DICOM images in two, but I am not very confident about the continuity; I can try this and see how it goes.
About merging the memory of the GPUs: it sounds like it would solve the problem in my case. I am not an expert, so it might be too difficult for me, but I'll follow this idea too, thank you!
Can multi-GPU inference be used on Windows or Linux systems? Currently, due to the data size, it appears that MONAILabel runs inference in system memory (i.e. on the CPU) without utilizing CUDA acceleration.
If I have multiple 4090 cards in one computer, how can I use them for parallel inference?
Have you solved the problem of merging the memory of two graphics cards mentioned above?
It has been a while since I used MONAILabel, but if it uses only the CPU and not the GPU on your system, it might be that you have the CPU-only version of torch installed instead of the GPU-enabled build, or that the NVIDIA drivers are not installed correctly.
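For what it's worth, a quick check like this (plain PyTorch, nothing MONAILabel-specific) tells you which of the two cases you are in:

```python
# Quick sanity check of the installed torch build and driver setup.
import torch

print(torch.__version__)           # a CPU-only wheel typically ends in "+cpu"
print(torch.version.cuda)          # None on a CPU-only build
print(torch.cuda.is_available())   # False -> CPU-only torch or driver problem
if torch.cuda.is_available():
    print(torch.cuda.device_count(), torch.cuda.get_device_name(0))
```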
Whether a model uses multiple GPUs is generally controlled by how the inference Python code is written, so you would need to look at how inference is performed inside the MONAILabel server to see whether it is written for multi-GPU. In my limited experience, it is often possible to change the code of a PyTorch model so that it runs inference across multiple GPUs on a single system.
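In plain PyTorch that usually amounts to something like the sketch below. This is the generic pattern, not MONAILabel's API, and it only helps when the effective batch size is greater than 1:

```python
# Generic sketch: wrap an already-loaded segmentation model in nn.DataParallel
# so inference batches are split across GPU 0 and GPU 1.
import torch
import torch.nn as nn

def to_multi_gpu(model: nn.Module) -> nn.Module:
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model, device_ids=[0, 1])
    return model.to("cuda:0").eval()

# Usage sketch (`model` and `batch` are placeholders for your own objects):
# model = to_multi_gpu(model)
# with torch.no_grad():
#     prediction = model(batch.to("cuda:0"))
```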
Multi-GPU works on Linux systems when the NVIDIA drivers and the GPU build of torch are installed correctly. I have not personally tested multi-GPU on Windows, but I assume it works as well.
Finally, deep learning, especially on 3D volumes, can be very memory intensive, whether it runs on the CPU or the GPU. Even with a GPU, the system memory must still be large enough to hold the data being written to or read from the GPU. With limited memory, it may help to downsample the volume, or to segment several smaller ROIs and then combine the segmentations back together later.
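If you want to try the downsampling route, here is a rough sketch using SciPy only; `segment` is again a placeholder for the real inference call, and the resampling factor is something you would have to tune against accuracy.

```python
# Down-sample the CT, segment the smaller volume, then up-sample the label
# map back to the original grid (nearest neighbour so labels stay integers).
import numpy as np
from scipy.ndimage import zoom

def segment_downsampled(volume: np.ndarray, segment, factor: float = 0.5) -> np.ndarray:
    small = zoom(volume, factor, order=1)                      # linear interpolation for the image
    small_labels = segment(small)
    scale = [o / s for o, s in zip(volume.shape, small_labels.shape)]
    return zoom(small_labels, scale, order=0)                  # nearest neighbour for labels
```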
Hello, thank you for your reply. I use a Linux system, with the NVIDIA drivers and the GPU build of torch installed, and I want to use multi-GPU inference. You mentioned that multiple GPUs work on Linux; how should I modify the implementation? Thank you for your reply.