MonaiLabel CUDA out of memory with Bundle

Hello everyone,

I am working with Slicer to do machine learning using MonaiLabel. The annotation and training parts work perfectly; it’s a really nice tool!
Unfortunately, I run into trouble when I run automatic segmentation on medium and large CT scans (starting from around 256×256×256 voxels, I would say).
I get this error message:
“torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.28 GiB (GPU 0; 23.65 GiB total capacity; 12.28 GiB already allocated; 2.67 GiB free; 19.48 GiB reserved in total by PyTorch)”

I have an NVIDIA GeForce RTX 4090 with 24 GB of memory, and I am using the MonaiLabel Zoo/Bundle app (0.7.0rc6). I have tried other versions of the bundle with the same result, but the Radiology application works well.

Since training works perfectly and I have 24 GB, I would have thought it would be fine for medium-sized CT scans.

Do you have any idea whether I am doing something wrong regarding memory?

Thank you for your help
Florian COTTE

Hi Florian,
Have you checked whether you have a recent version of CUDA (or at least that the CUDA version you have supports the RTX 4090)?

It is better to check it here


Hi ylcnkzy,

I have CUDA version 11.8, and from what I have found online it seems to support the RTX 4090 well.
I can check whether it’s a problem with the driver; I’ll look into that.


Can @diazandr3s weigh in? Would also be curious about the boundaries.

Thanks for the ping @rbumm.

@FloCo which model are you using to run inference?

Here is the GPU memory usage for the two available models (1.5mm and 3mm): model-zoo/models/wholeBody_ct_segmentation at dev · Project-MONAI/model-zoo · GitHub

Thank you for your response @diazandr3s, indeed I hadn’t seen this page on memory usage.
I looked into the TotalSegmentator resources and the memory usage was much lower. Is it not the same model?

I am using the 1.5mm model, so everything is normal then.
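
For a rough sense of why volume size matters so much at inference time, here is a back-of-the-envelope estimate of one full-volume, per-class float32 score tensor. It is only a sketch: the 105 output channels (104 structures plus background) and the idea that such a tensor is held on the GPU are assumptions, not something confirmed in this thread, and resampling to 1.5mm changes the actual voxel count.

```python
from math import prod

NUM_CLASSES = 105        # assumption: 104 structures + background
BYTES_PER_FLOAT32 = 4

def tensor_gib(shape, channels=NUM_CLASSES, bytes_per_value=BYTES_PER_FLOAT32):
    """Size in GiB of one dense tensor with `channels` values per voxel."""
    return prod(shape) * channels * bytes_per_value / 2**30

for side in (128, 256, 512):
    size = tensor_gib((side, side, side))
    print(f"{side}^3 voxels -> {size:.2f} GiB")
# 128^3 -> 0.82 GiB, 256^3 -> 6.56 GiB, 512^3 -> 52.50 GiB
```

Under these assumptions a single such tensor at 256^3 is already in the same range as the 7.28 GiB allocation in the error message, and the size grows with the cube of the side length.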

I have two RTX 4090 GPUs with 24 GB each. Is there a way of using both for inference, as if they were a single 48 GB GPU?

Thank you

No, it is not the same model, but wholeBody_ct_segmentation was trained using TotalSegmentator data.

Oh I see! My mistake then.
And what about the fact that I have two GPUs? I guess I can’t “merge” them, but I want to be sure I am not missing something simple.

Thank you

It seems to be possible to use two GPUs, although I never tried this, see here. The question is whether it has any impact on the CUDA out-of-memory error. You could probably open a MONAILabel issue on GitHub or comment in the thread above.

There is complexity here, and I am not an expert, but in my experience, PyTorch DNN models can be configured to use two GPUs instead of one. I assume many MONAI models take advantage of this. Use of two GPUs generally divides memory requirements in half. The link Rudolf posted shows how to set the available devices to include 0 and 1. This is necessary. It might be that the configuration of MONAILabel does this for us. I could test on a two-GPU server in two weeks if we don’t have an answer earlier.
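
One common way to make both cards visible to a PyTorch process is the CUDA_VISIBLE_DEVICES environment variable. The snippet below is only a sketch of that mechanism: the linked page may use a different setting, the helper name is made up for illustration, and seeing two devices does not by itself mean MONAILabel spreads inference across them.

```python
import os

# Has to be set before CUDA is initialised in the process; setting it
# before importing torch (or before launching the server) is the safe choice.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

def visible_gpu_ids():
    """Hypothetical helper: GPU ids listed in CUDA_VISIBLE_DEVICES."""
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [part.strip() for part in raw.split(",") if part.strip()]

print("requested devices:", visible_gpu_ids())

try:
    import torch  # optional; only used to report what PyTorch can see
    print("devices seen by PyTorch:", torch.cuda.device_count())
except ImportError:
    print("PyTorch not installed; skipping device count")
```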

After thinking about this: using two GPUs halves memory only if the batch size for training or inference is greater than 1. From the MONAILabel GitHub page, it looks like the 1.5mm TotalSegmentator model requires more than 24 GB of memory for some volumes. You might split your volume into two sets of slices, segment each half separately, then combine the segmentations. You would want to examine the boundary to look for discontinuities.
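
The split-and-recombine idea could be sketched roughly as follows with NumPy. Everything here is illustrative: `segment_fn` stands in for whatever actually runs the model on one sub-volume, and the function names, the overlap margin, and the disagreement score are assumptions rather than part of MONAILabel. The two halves share a band of slices so that the labels each half produces near the cut can be compared.

```python
import numpy as np

def _slab(arr, start, stop, axis):
    """Return arr[start:stop] taken along the given axis."""
    index = [slice(None)] * arr.ndim
    index[axis] = slice(start, stop)
    return arr[tuple(index)]

def segment_in_two_halves(volume, segment_fn, axis=0, overlap=16):
    """Segment two overlapping halves separately and stitch them at the midpoint.

    Returns the stitched label volume and the fraction of voxels in the
    shared band where the two halves disagree (0.0 if there is no overlap).
    """
    n = volume.shape[axis]
    mid = n // 2
    first_end = min(n, mid + overlap)     # first half runs a bit past the cut
    second_start = max(0, mid - overlap)  # second half starts a bit before it

    seg_a = segment_fn(_slab(volume, 0, first_end, axis))
    seg_b = segment_fn(_slab(volume, second_start, n, axis))

    # Keep each half's labels on its own side of the cut.
    stitched = np.concatenate(
        [_slab(seg_a, 0, mid, axis), _slab(seg_b, mid - second_start, None, axis)],
        axis=axis,
    )

    # Compare both predictions inside the shared band around the cut.
    band_a = _slab(seg_a, second_start, first_end, axis)
    band_b = _slab(seg_b, 0, first_end - second_start, axis)
    disagreement = float(np.mean(band_a != band_b)) if band_a.size else 0.0
    return stitched, disagreement
```

A high disagreement value in the shared band would be a hint that the halves do not join cleanly and that the boundary needs a closer look.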

Combining the memory of two GPUs into one has been done by NVIDIA but requires custom work. Hope this helps.
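
To illustrate the batch-size point made above, here is a schematic model of how a data-parallel wrapper divides a batch along its first dimension. It mimics chunk-style splitting (each chunk holds at most ceil(batch / devices) items) and is a sketch, not the library's code; `chunk_sizes` is a made-up name.

```python
def chunk_sizes(batch_size, n_devices):
    """Number of batch items each device receives under chunk-style splitting."""
    per_chunk = -(-batch_size // n_devices)  # ceiling division
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

for batch in (1, 2, 4):
    print(f"batch size {batch} on 2 GPUs -> {chunk_sizes(batch, 2)}")
```

With a batch size of 1 there is only one chunk, so a single GPU still has to hold the whole input and the second card does not reduce the memory needed.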


I was thinking about splitting the DICOM images in two, but I am not very confident about the continuity. I can try this and see how it goes.

About merging the memory of the GPUs: it sounds like it would solve the problem in my case. I am not an expert, so it might be too difficult for me, but I’ll follow up on this idea too. Thank you!