I had a similar problem with running out of GPU memory. I created a new conda environment and explicitly installed the CPU version of PyTorch (using their install wizard: Start Locally | PyTorch). Running the monailabel server in this environment bypassed the GPU. Inference runs multithreaded and is still pretty quick.
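If it helps, here is a quick way to check which device PyTorch will use inside the new environment before starting the server. This is only a minimal sketch: the helper name `describe_torch_device` is made up for illustration, and it only reports what PyTorch itself sees, not how monailabel is configured.

```python
import importlib.util


def describe_torch_device():
    """Return 'cuda', 'cpu', or 'torch-not-installed' for the active environment.

    Hypothetical helper for illustration; not part of PyTorch or MONAI Label.
    """
    # Avoid an ImportError if PyTorch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"

    import torch

    # A CPU-only PyTorch build reports that CUDA is unavailable.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(describe_torch_device())
```

In a CPU-only environment this should print `cpu`. If it prints `cuda`, the environment still has a GPU build of PyTorch installed.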
I was trying to keep it very simple by using the Docker monai-label server, because for now I just want to test one CT.
I guess I’ll have to take more time than I initially planned to deploy Anaconda, PyTorch, and monai-label.
It’s pretty straightforward to install miniconda and then install PyTorch and MONAI Label. It shouldn’t take more than a few minutes. Let us know how it goes.