I am trying to run inference with a model in MONAI Label using 3D Slicer, but I am running out of GPU memory

I am using an AWS g5.12xlarge instance. This instance has 4 GPUs with a total of 96 GB of memory. Training works very well and all GPUs are utilised (checked using nvidia-smi); however, for inference I get an error.

Unfortunately, I wasn’t able to save the stdout the last time I ran inference, and it is getting quite expensive to keep retrying. The error I got is the same as the one below, taken from this post (Monai label. CUDA out of memory), just with different memory values.

RuntimeError: CUDA out of memory. Tried to allocate 4.06 GiB (GPU 0; 24 GiB total capacity; 4.37 GiB already allocated; 2.42 GiB free; 4.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[2022-10-29 21:07:39,128] [14748] [ThreadPoolExecutor-0_0] [INFO] (monailabel.utils.async_tasks.utils:77) - Return code: 1

The images I am training and inferring on are on the order of 2 GB compressed (NIfTI, .nii.gz format), and I cannot downsample them because I require this level of detail in the images.

Does anyone have any suggestions for what might fix this issue?

If you haven’t already, be sure to exit the training before running the inference to clear the memory.

@diazandr3s may have more suggestions on this.

Hi @Carl_alv,

As @pieper suggested, please make sure training or other processes do not keep the GPUs busy before running inference.

For training, MONAI Label uses DistributedDataParallel (DDP).

For inference, you’ll need to make sure the image fits on a single GPU. Can you please comment more on the volume size and whether you are doing any image resampling?

Please let us know.

Hi @diazandr3s,

Thanks for getting back to me and apologies for my late response. I have been on break.

The compressed volume is 2.72 GB as a NIfTI file (.nii.gz). This image was upsampled by 2x from a 181.28 MB file. I realise this is quite big, but I need this resolution for my work.

I was able to recreate the error message and it is attached below.

[2023-11-08 08:20:26,087] [4269] [MainThread] [INFO] (monailabel.interfaces.utils.transform:76) - PRE - Run Transform(s)
[2023-11-08 08:20:26,087] [4269] [MainThread] [INFO] (monailabel.interfaces.utils.transform:77) - PRE - Input Keys: ['largest_cc', 'device', 'model', 'image', 'result_extension', 'result_dtype', 'client_id', 'description', 'image_path']
[2023-11-08 08:27:09,827] [4269] [MainThread] [INFO] (monailabel.interfaces.utils.transform:122) - PRE - Transform (LoadImaged): Time: 403.739; image: torch.Size([1024, 1024, 2749])(torch.float32)
[2023-11-08 08:27:12,588] [4269] [MainThread] [INFO] (monailabel.interfaces.utils.transform:122) - PRE - Transform (EnsureTyped): Time: 2.7612; image: torch.Size([1024, 1024, 2749])(torch.float32)
[2023-11-08 08:27:12,589] [4269] [MainThread] [INFO] (monailabel.interfaces.utils.transform:122) - PRE - Transform (EnsureChannelFirstd): Time: 0.0002; image: torch.Size([1, 1024, 1024, 2749])(torch.float32)
[2023-11-08 08:27:16,269] [4269] [MainThread] [ERROR] (uvicorn.error:369) - Exception in ASGI application
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 366, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 269, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/exceptions.py", line 93, in __call__
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/exceptions.py", line 82, in __call__
    await self.app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/opt/conda/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 670, in __call__
    await route.handle(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 266, in handle
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 65, in app
    response = await func(request)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 227, in app
    raw_response = await run_endpoint_function(
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 160, in run_endpoint_function
    return await dependant.call(**values)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/endpoints/infer.py", line 179, in api_run_inference
    return run_inference(background_tasks, model, image, session_id, params, file, label, output)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/endpoints/infer.py", line 161, in run_inference
    result = instance.infer(request)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/interfaces/app.py", line 307, in infer
    result_file_name, result_json = task(request)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/tasks/infer/basic_infer.py", line 297, in __call__
    data = self.run_pre_transforms(data, pre_transforms)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/tasks/infer/basic_infer.py", line 388, in run_pre_transforms
    return run_transforms(data, transforms, log_prefix="PRE", use_compose=False)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/interfaces/utils/transform.py", line 106, in run_transforms
    data = t(data)
  File "/opt/conda/lib/python3.10/site-packages/monai/transforms/spatial/dictionary.py", line 416, in __call__
    d[key] = self.spacing_transform(
  File "/opt/conda/lib/python3.10/site-packages/monai/utils/deprecate_utils.py", line 221, in _wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/monai/transforms/spatial/array.py", line 606, in __call__
    data_array = self.sp_resample(
  File "/opt/conda/lib/python3.10/site-packages/monai/utils/deprecate_utils.py", line 221, in _wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/monai/transforms/spatial/array.py", line 247, in __call__
    img = convert_to_tensor(data=img, track_meta=get_track_meta(), dtype=_dtype)
  File "/opt/conda/lib/python3.10/site-packages/monai/utils/type_conversion.py", line 149, in convert_to_tensor
    return _convert_tensor(data).to(dtype=dtype, device=device, memory_format=torch.contiguous_format)
  File "/opt/conda/lib/python3.10/site-packages/monai/data/meta_tensor.py", line 268, in __torch_function__
    ret = super().__torch_function__(func, types, args, kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 1279, in __torch_function__
    ret = func(*args, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 21.48 GiB (GPU 0; 22.06 GiB total capacity; 10.74 GiB already allocated; 10.58 GiB free; 10.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have also attached the nvidia-smi run below to show you the GPU memory usage.

| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A10G         On   | 00000000:00:1B.0 Off |                    0 |
|  0%   24C    P0    56W / 300W |  11755MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
|   1  NVIDIA A10G         On   | 00000000:00:1C.0 Off |                    0 |
|  0%   20C    P0    44W / 300W |      2MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
|   2  NVIDIA A10G         On   | 00000000:00:1D.0 Off |                    0 |
|  0%   21C    P0    41W / 300W |      2MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
|   3  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   21C    P0    40W / 300W |      2MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|    0   N/A  N/A      4269      C   python                          11753MiB |

Wow! :stuck_out_tongue:
image: torch.Size([1024, 1024, 2749])
This is certainly higher than anything I have worked with so far. I am glad that you got the training to run. What crop/ROI size does your model operate on? E.g. 128x128x128 or higher?
For inference, you may need to run sliding window inference and aggregate the patch-wise predictions into a large tensor that is hosted on the CPU (afaik, inference itself can run on the GPU). Please note that this may be considerably slower than processing fully on the GPU. If speed is not a major concern, I think that @diazandr3s can give you more detailed instructions on how to achieve that.
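The idea described above can be sketched in a few lines. This is a minimal, pure-Python illustration, not MONAI's implementation: `patch_starts` and `sliding_window_1d` are hypothetical helper names, the 128-voxel ROI and 25% overlap are assumed values, and the aggregation example is 1D for brevity. Each patch is predicted separately (the step that would run on the GPU), while the running sum and count buffers stand in for the large tensor kept in CPU memory. MONAI's `sliding_window_inference` has `sw_device` and `device` arguments that appear intended for this kind of split, but please check the documentation for your installed version.

```python
def patch_starts(size, roi, overlap=0.25):
    """Start offsets along one axis so patches of length `roi` cover `size`."""
    if size <= roi:
        return [0]
    step = max(1, int(roi * (1 - overlap)))
    starts = list(range(0, size - roi, step))
    starts.append(size - roi)  # clamp the last patch to the volume edge
    return starts


def sliding_window_1d(signal, roi, predictor, overlap=0.25):
    """Predict patch by patch and average overlapping predictions."""
    total = [0.0] * len(signal)  # aggregation buffers: these would live in CPU RAM
    count = [0] * len(signal)
    for start in patch_starts(len(signal), roi, overlap):
        pred = predictor(signal[start:start + roi])  # this step would run on the GPU
        for i, value in enumerate(pred):
            total[start + i] += value
            count[start + i] += 1
    return [t / c for t, c in zip(total, count)]


# Number of 128^3 patches for the volume in the log (assumed ROI and overlap).
shape = (1024, 1024, 2749)
n_patches = 1
for axis_size in shape:
    n_patches *= len(patch_starts(axis_size, roi=128))
print("patches:", n_patches)

# Toy check: a "model" that doubles its input gives the same result patch-wise.
print(sliding_window_1d(list(range(10)), roi=4, predictor=lambda p: [2 * x for x in p]))
```

Only one patch (plus the model) needs to be on the GPU at a time, which is why this approach trades speed for memory.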

Hi @Carl_alv,

This image is huge :open_mouth:

Which model are you using here? DeepEdit or the vanilla Segmentation model? Can you please share the command you use to start the MONAI Label server?

In any case, hosting ONLY this volume requires around 21 GB of memory, either RAM or GPU memory (1024 x 1024 x 2749 voxels x 8 bytes per voxel ≈ 21.5 GiB, which matches the 21.48 GiB allocation in your error message).

To get an idea of the total memory you’ll need, multiply 21 GB by the number of labels.

Both DeepEdit and the Segmentation model predict more than 5 labels, which means more than 100 GB of memory is needed - again either RAM or GPU memory.
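The arithmetic above can be reproduced with a short script. This is only a back-of-the-envelope sketch: `volume_gib` is a hypothetical helper, and the "one full-size 8-byte channel per label" rule is the assumption stated in this post, not something measured on the server.

```python
def volume_gib(shape, bytes_per_voxel, num_channels=1):
    """Memory needed to hold a dense volume, in GiB."""
    voxels = 1
    for n in shape:
        voxels *= n
    return voxels * bytes_per_voxel * num_channels / 2**30


shape = (1024, 1024, 2749)  # size reported by LoadImaged in the log

print(f"float32 image: {volume_gib(shape, 4):.2f} GiB")  # 10.74
print(f"float64 image: {volume_gib(shape, 8):.2f} GiB")  # 21.48

# Assumption from the post above: one full-size 8-byte channel per label.
print(f"5 labels:      {volume_gib(shape, 8, num_channels=5):.1f} GiB")
```

The first two values line up with the error message earlier in the thread: 10.74 GiB already allocated and a failed 21.48 GiB allocation. That is consistent with a float32 copy already on the GPU and a request for a float64 copy, although the log alone does not confirm the cause. Under the same assumption, the 25 regions mentioned later in the thread would need roughly 537 GiB.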

Hope this helps,

Hi @diazandr3s ,
I am also working on this project with @Carl_alv .

- What model are you using here?
We are using the segmentation model, the one that comes in the radiology app.

- The command we are using to start the MONAI Label Server:
monailabel start_server --app radiology --studies <path-to-images> --conf models segmentation

We have only been resampling to 2x using the Slicer software because we wanted a higher-resolution label. We didn’t know how to increase the resolution of the label only, so we increased the resolution of the input image, which resulted in the resolution of the label improving as well.

If you have any guidance or advice on how we can increase the resolution of the label only, that would be helpful.

Thank you in advance!

Hi @ag_gan,

Thanks for the details.

The default Segmentation model predicts 25 regions: https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/radiology/lib/configs/segmentation.py#L37-L62

With this volume size, you’ll need a huge amount of memory.

My suggestion would be to make a prediction with the original volume size and then postprocess the predicted mask.
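As a rough illustration of why this is cheaper: a predicted mask holds small integer label values, so it can be upsampled after inference without ever running the network at the higher resolution. The sketch below is a pure-Python, 2D nearest-neighbour example with a hypothetical helper name (`upsample_labels_nearest`); it is not what 3D Slicer does internally, and any smoothing would still be done in Slicer.

```python
def upsample_labels_nearest(mask, factor=2):
    """Nearest-neighbour upsampling of a 2D integer label mask (list of rows)."""
    out = []
    for row in mask:
        # Repeat every label value `factor` times along the row...
        wide = [value for value in row for _ in range(factor)]
        # ...and repeat the widened row `factor` times along the other axis.
        out.extend(list(wide) for _ in range(factor))
    return out


low_res = [[0, 1],
           [2, 0]]
for row in upsample_labels_nearest(low_res):
    print(row)
```

Because label values are copied rather than interpolated, no new (invalid) label values are introduced; the result is blocky, which is why a smoothing step is suggested afterwards.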

Maybe @lassoan could comment on how we can use 3DSlicer to smooth a mask?

After you get your segmentation, you can increase the resolution of your segmentation and smooth the segments (using “Specify geometry”, as described here).