MONAI Label CUDA out of memory with Bundle

Hello everyone,

I am working with Slicer to do machine learning using MONAI Label. The annotation and training parts work perfectly, it's a really nice tool!
Unfortunately, I run into trouble when I run the automatic segmentation on medium and large CT scans (starting from around 256×256×256 voxels, I would say).
I get the following error message:
“torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.28 GiB (GPU 0; 23.65 GiB total capacity; 12.28 GiB already allocated; 2.67 GiB free; 19.48 GiB reserved in total by PyTorch)”

I have an NVIDIA GeForce RTX 4090 with 24 GB of memory, and I am using the MONAI Label Model Zoo/Bundle app (0.7.0rc6). Note that I have tried other bundle versions with the same result, but if I use the Radiology app it works well.

Since training works perfectly and I have 24 GB, I would have thought that would be enough for medium-sized CT scans.

Do you have any idea whether I am doing something wrong with the memory?

Thank you for your help
Florian COTTE

Hi Florian,
Have you checked whether you have a recent version of CUDA (or at least that the CUDA version you have supports the RTX 4090)?

It is better to check it here

Best

Hi ylcnkzy,

I have CUDA version 11.8, and from what I have found on the internet it seems to support the RTX 4090 well.
I can try to see if it's a problem with the driver; I'll look into that.

Best,

Can @diazandr3s weigh in? Would also be curious about the boundaries.


Thanks for the ping @rbumm.

@FloCo which model are you using to run inference?

Here is the GPU memory usage for the two available models (1.5mm and 3mm): model-zoo/models/wholeBody_ct_segmentation at dev · Project-MONAI/model-zoo · GitHub

Thank you for your response @diazandr3s; indeed, I hadn't seen this page on memory usage.
I looked into the TotalSegmentator resource requirements and they were much lower. Is it not the same model?

I am using the 1.5 mm model, so everything is normal then.

I have two RTX 4090 GPUs with 24 GB each. Is there a way to use both for inference, as if they were a single 48 GB GPU?

Thank you

No, it is not the same model, but wholeBody_ct_segmentation was trained using TotalSegmentator data.

Oh, I see! My mistake then.
And what about the fact that I have two GPUs? I guess I can't “merge” them, but I want to be sure I am not missing something simple.

Thank you

It seems to be possible to use two GPUs, although I have never tried this; see here. The question is whether it has any impact on the CUDA out-of-memory error. You could probably open a MONAI Label issue on GitHub or comment in the thread above.

There is complexity here, and I am not an expert, but in my experience PyTorch DNN models can be configured to use two GPUs instead of one. I assume many MONAI models take advantage of this. Using two GPUs generally divides the memory requirement in half. The link Rudolf posted shows how to set the available devices to include 0 and 1, which is necessary. It may be that the MONAI Label configuration does this for us. I could test on a two-GPU server in two weeks if we don't have an answer earlier.
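For what it's worth, a minimal sketch of the device-selection step mentioned above; `CUDA_VISIBLE_DEVICES` is standard CUDA/PyTorch behaviour, but whether MONAI Label then actually spreads work across both cards is exactly the open question:

```python
import os

# Make both cards visible to PyTorch; this must be set before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch

print(torch.cuda.device_count())       # expect 2 if both 4090s are detected
print(torch.cuda.get_device_name(0))   # e.g. "NVIDIA GeForce RTX 4090"
```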


After thinking about this: using two GPUs halves the memory requirement only if the batch size for training or inference is greater than 1. From the MONAI Label GitHub page, it looks like the 1.5 mm TotalSegmentator model requires more than 24 GB of memory for some volumes. You might split your volume into two sets of slices, segment each half separately, and then combine the segmentations. You would want to examine the boundary to look for discontinuities.
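As a rough illustration of that idea (not MONAI Label's own code): `segment_fn` is a hypothetical wrapper around whatever single-volume inference call you already have, the arrays are NumPy, and the overlap margin is a guess you would tune while inspecting the seam:

```python
import numpy as np

def segment_in_two_passes(volume, segment_fn, overlap=16):
    """Segment the lower and upper halves of a 3D volume separately,
    with a small overlap so the seam can be inspected, then recombine."""
    nz = volume.shape[0]
    mid = nz // 2

    lower = segment_fn(volume[: mid + overlap])   # slices [0, mid + overlap)
    upper = segment_fn(volume[mid - overlap :])   # slices [mid - overlap, nz)

    labels = np.zeros(volume.shape, dtype=lower.dtype)
    labels[:mid] = lower[:mid]        # keep the lower half from the first pass
    labels[mid:] = upper[overlap:]    # keep the upper half from the second pass
    return labels
```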

Combining the memory of two GPUs into one has been done by NVIDIA but requires custom work. Hope this helps.


I was thinking about splitting the DICOM images in two, but I am not very confident about the continuity. I can try this and see how it goes.

About merging the memory of the GPUs, it sounds like it would solve the problem in my case. I am not an expert, so it might be too difficult for me, but I'll follow this idea too. Thank you!

Can multi-GPU inference be used on Windows or Linux systems? Currently, due to data size issues, it is evident that MONAI Label runs inference in system memory (i.e., on the CPU) without utilizing CUDA acceleration.

If I have multiple 4090 cards in one computer, how can I use them for parallel inference?

Have you solved the problem of merging the memory of two graphics cards mentioned above?

Hello, have you solved this GPU problem?
I am also unable to run inference on my GPUs. Thanks for your help.

It has been a while since I used MONAI Label, but if it uses the CPU only and not the GPU on your system, it might be that you have the CPU version of torch installed instead of the GPU-enabled version, or that the NVIDIA drivers are not installed correctly.
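A quick way to check which of the two it is, using only standard PyTorch calls run inside the MONAI Label server's Python environment:

```python
import torch

print(torch.__version__)          # a "+cpu" suffix usually means the CPU-only wheel is installed
print(torch.version.cuda)         # None for CPU-only builds, e.g. "11.8" for CUDA builds
print(torch.cuda.is_available())  # False -> CPU-only torch, or the NVIDIA driver is not set up
print(torch.cuda.device_count())  # number of GPUs PyTorch can actually see
```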

Whether a model uses multiple GPUs is generally controlled by how the inference Python code is written, so you would need to look at how inference is performed inside the MONAI Label server to see whether it is written for multi-GPU. In my limited experience, it is often possible to change the code of a PyTorch model to run inference across multiple GPUs on a single system.
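To make that concrete, a hedged sketch of the usual single-machine approach, wrapping a network in `torch.nn.DataParallel`; where the MONAI Label server actually builds its predictor is something you would have to find in its source, and as noted earlier this only helps when the effective batch size is greater than 1:

```python
import torch

def to_multi_gpu(model):
    """Run forward passes across all visible GPUs (splits along the batch dimension)."""
    model = model.cuda()
    if torch.cuda.device_count() > 1:
        model = torch.nn.DataParallel(model)
    return model.eval()
```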

Multi-GPU works on Linux systems when the NVIDIA drivers and the GPU build of torch are installed correctly. I have not personally tested multi-GPU on Windows, but I assume it works as well.

Finally, deep learning, especially on 3D volumes, can be very memory intensive, whether it runs on CPU or GPU. Even with a GPU, the system memory must still be large enough to hold the data being written to or read from the GPU. With limited memory, it may help to downsample the volume, or to segment several smaller ROIs and then combine the segmentations back together later.
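One concrete way to do the "smaller ROIs, then recombine" part is MONAI's sliding-window inference, which can also keep the stitched result in CPU RAM while only the patches visit the GPU. A sketch under the assumption that `net` is your segmentation network already placed on the GPU and that the ROI size matches what the model was trained with:

```python
import torch
from monai.inferers import sliding_window_inference

def segment_large_volume(net, image):
    """image: tensor of shape (1, 1, D, H, W) kept in CPU memory;
    net: the segmentation network, already moved to the GPU."""
    with torch.no_grad():
        return sliding_window_inference(
            inputs=image,
            roi_size=(96, 96, 96),   # patch size; should match the model's training ROI
            sw_batch_size=1,         # reduce if the GPU still runs out of memory
            predictor=net,
            overlap=0.25,
            sw_device="cuda",        # patches are processed on the GPU
            device="cpu",            # stitched output accumulates in CPU RAM
        )
```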


Hello, thank you for your reply. I use a Linux system with NVIDIA drivers and the GPU build of torch. I want to use multi-GPU inference. I saw that you mentioned multiple GPUs work on Linux; how should I modify the implementation? Thank you for your reply.