I have a question about MONAI Label. Maybe somebody has faced a similar problem.
I use DeepEdit from the radiology app, without any global code changes, on my tower server with an Asus Nvidia GeForce RTX 3080 (10 GiB).
I try to start inference with a batch size of 1.
The input volume is 512x512 with 512 slices.
But I’m stuck on a “CUDA out of memory” error.
See IMG_5625.jpeg (Google Drive link).
See output_cuda_out_of_memory.txt (Google Drive link).
By the way, if I uncomment line #275, it works successfully on CPU.
See IMG_5623.PNG (Google Drive link).
a) Is it feasible to run MONAI Label on an Asus Nvidia GeForce RTX 3080 10 GiB?
b) Is it possible to make some changes in the code that would help MONAI fit the computation load into my GPU memory and do the job there, instead of on the CPU?
Interesting thread, huge images indeed! However, from the logs it seems that the image already gets downsampled to (1, 128, 128, 128), but that might still be too large for backprop. I would try (96, 96, 96) first; if that’s still too large, maybe (80, 80, 80), or worst case (64, 64, 64) should work (though at that point you’d probably notice considerable staircase artifacts in the prediction).
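In the radiology app this target size is typically exposed as a `spatial_size` setting that feeds a `Resized` pre-transform, so you shouldn’t need deep code changes. A minimal sketch of the idea (the transform chain and intensity window here are illustrative placeholders, not the app’s exact defaults):

```python
# Sketch of a DeepEdit-style pre-transform chain with a smaller resize
# target. Only the spatial_size value is the point; the other transforms
# and the intensity window are illustrative, not the app's defaults.
from monai.transforms import (
    Compose,
    EnsureChannelFirstd,
    LoadImaged,
    Resized,
    ScaleIntensityRanged,
)

spatial_size = (96, 96, 96)  # try 96^3 first, then 80^3, then 64^3 if OOM persists

pre_transforms = Compose([
    LoadImaged(keys="image"),
    EnsureChannelFirstd(keys="image"),
    ScaleIntensityRanged(keys="image", a_min=-175, a_max=250,
                         b_min=0.0, b_max=1.0, clip=True),
    Resized(keys="image", spatial_size=spatial_size, mode="area"),
])
```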
To work at a higher resolution, it is probably best to use the segmentation model, as @diazandr3s recommended. In that case you can set your patch size to e.g. (96, 96, 96) (whatever fits into GPU VRAM) and play around with the target_spacing parameter (here) to make sure you get a good compromise between the resolution and the FOV of the patches.
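To make the patch-based idea concrete, here is a rough sketch using MONAI’s sliding-window inferer plus a spacing transform; the `pixdim`, `roi_size`, and `overlap` values are assumptions to tune, not recommendations from the app:

```python
# Sketch of patch-based inference: resample to a target spacing, then
# run the model over (96, 96, 96) patches so only one patch has to fit
# in GPU memory at a time.
import torch
from monai.inferers import SlidingWindowInferer
from monai.transforms import Spacingd

# resample to an assumed target spacing (mm); tune for resolution vs. FOV
resample = Spacingd(keys="image", pixdim=(1.5, 1.5, 2.0), mode="bilinear")

inferer = SlidingWindowInferer(
    roi_size=(96, 96, 96),  # patch size: whatever fits into 10 GiB
    sw_batch_size=1,        # one patch at a time keeps peak memory low
    overlap=0.25,           # blend overlapping patches to avoid seams
)

# usage with a trained `model` and a preprocessed tensor `image`
# of shape (batch, channel, D, H, W) already on the GPU:
# with torch.inference_mode():
#     pred = inferer(image, model)
```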
There are tweaks you can try. One of them, I think, is to reduce floating-point precision, which roughly doubles the memory you can work with. But the reality is that Nvidia knowingly keeps the GeForce line of GPUs short on memory so they don’t compete in the ML domain. You will do yourself a favor if you can move to something like an RTX A6000, which offers much more memory at 48 GB.
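For the reduced-precision tweak, a minimal PyTorch sketch (assuming a generic `model` and `image`; MONAI Label may wire this differently internally):

```python
# Run inference under autocast so most ops compute in float16,
# roughly halving activation memory compared to float32.
import torch

model = model.to("cuda").eval()  # `model` is a placeholder for your network
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    pred = model(image.to("cuda"))  # `image` is a preprocessed input tensor
```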
I don’t know about distributing the workload, but I suspect that if your model doesn’t fit in the memory of one GPU, it won’t work. I don’t think they “pool” the memory.