I am trying to run inference from a MONAI Label model using 3D Slicer but I am running out of GPU memory

I am using an AWS g5.12xlarge instance. This instance has 4 GPUs with a total of 96 GB of memory. Training works very well and all GPUs get utilised (checked using nvidia-smi); however, for inference, I get an error.

Unfortunately I wasn’t able to save the stdout the last time I tried running inference, and it is getting quite expensive to keep trying, but the error I got is the same as the one below from this post (Monai label. CUDA out of memory), just with different memory values.

RuntimeError: CUDA out of memory. Tried to allocate 4.06 GiB (GPU 0; 24 GiB total capacity; 4.37 GiB already allocated; 2.42 GiB free;
4.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[2022-10-29 21:07:39,128] [14748] [ThreadPoolExecutor-0_0] [INFO] (monailabel.utils.async_tasks.utils:77) - Return code: 1
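
In case it is useful for diagnosing this, my understanding is that the allocated/reserved numbers in that message can be checked from the server’s Python environment with something like the snippet below (just a sketch; I have not confirmed whether fragmentation is actually the problem here, and if it is not, the max_split_size_mb hint will not help).

# Sketch: compare allocated vs. reserved memory per GPU. If reserved is much
# larger than allocated, the fragmentation hint from the error message may be
# relevant; otherwise the tensor is simply too large for the card.
import torch

for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024**3
    reserved = torch.cuda.memory_reserved(i) / 1024**3
    print(f"GPU {i}: allocated {allocated:.2f} GiB, reserved {reserved:.2f} GiB")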

The images I am training on and running inference on are on the order of 2 GB compressed (in NIfTI .nii.gz format), and I cannot downsample them as I require this level of detail in the images.

Does anyone have any suggestions for what might fix this issue?

If you haven’t already, be sure to exit the training before running the inference to clear the memory.
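
A quick way to double-check that the cards are actually free before kicking off inference is a small sketch like this (assuming a reasonably recent PyTorch that has torch.cuda.mem_get_info):

# Sketch: report free/total memory per GPU so you can confirm that training has
# really released its memory before starting inference.
import torch

for i in range(torch.cuda.device_count()):
    free_b, total_b = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {free_b / 1024**3:.1f} GiB free of {total_b / 1024**3:.1f} GiB")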

@diazandr3s may have more suggestions on this.

Hi @Carl_alv,

As @pieper suggested, please make sure training or other processes do not keep the GPUs busy before running inference.

For training, MONAI Label uses DistributedDataParallel (DDP).

For inference, you’ll need to make sure the image fits on a single GPU. Can you please comment more on the volume size and whether you are doing any image resampling?
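
If it helps, a header-only check like the sketch below (using nibabel, with a placeholder path) prints the voxel grid and spacing without loading the whole volume into memory:

# Sketch: inspect only the NIfTI header, so the multi-GB volume is not loaded.
import nibabel as nib

img = nib.load("/path/to/volume.nii.gz")  # hypothetical path
print("shape:", img.shape)
print("spacing (mm):", img.header.get_zooms())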

Please let us know.

Hi @diazandr3s,

Thanks for getting back to me and apologies for my late response. I have been on break.

The compressed volume is 2.72 GB in a NIfTI file (.nii.gz). This image was upsampled 2x from a 181.28 MB file. I realise this is quite big, but I need this resolution for my work.

I was able to recreate the error message and it is attached below.

[2023-11-08 08:20:26,087] [4269] [MainThread] [INFO] (monailabel.interfaces.utils.transform:76) - PRE - Run Transform(s)
[2023-11-08 08:20:26,087] [4269] [MainThread] [INFO] (monailabel.interfaces.utils.transform:77) - PRE - Input Keys: ['largest_cc', 'device', 'model', 'image', 'result_extension', 'result_dtype', 'client_id', 'description', 'image_path']
[2023-11-08 08:27:09,827] [4269] [MainThread] [INFO] (monailabel.interfaces.utils.transform:122) - PRE - Transform (LoadImaged): Time: 403.739; image: torch.Size([1024, 1024, 2749])(torch.float32)
[2023-11-08 08:27:12,588] [4269] [MainThread] [INFO] (monailabel.interfaces.utils.transform:122) - PRE - Transform (EnsureTyped): Time: 2.7612; image: torch.Size([1024, 1024, 2749])(torch.float32)
[2023-11-08 08:27:12,589] [4269] [MainThread] [INFO] (monailabel.interfaces.utils.transform:122) - PRE - Transform (EnsureChannelFirstd): Time: 0.0002; image: torch.Size([1, 1024, 1024, 2749])(torch.float32)
[2023-11-08 08:27:16,269] [4269] [MainThread] [ERROR] (uvicorn.error:369) - Exception in ASGI application
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 366, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 269, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/exceptions.py", line 93, in __call__
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/exceptions.py", line 82, in __call__
    await self.app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/opt/conda/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 670, in __call__
    await route.handle(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 266, in handle
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 65, in app
    response = await func(request)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 227, in app
    raw_response = await run_endpoint_function(
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 160, in run_endpoint_function
    return await dependant.call(**values)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/endpoints/infer.py", line 179, in api_run_inference
    return run_inference(background_tasks, model, image, session_id, params, file, label, output)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/endpoints/infer.py", line 161, in run_inference
    result = instance.infer(request)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/interfaces/app.py", line 307, in infer
    result_file_name, result_json = task(request)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/tasks/infer/basic_infer.py", line 297, in __call__
    data = self.run_pre_transforms(data, pre_transforms)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/tasks/infer/basic_infer.py", line 388, in run_pre_transforms
    return run_transforms(data, transforms, log_prefix="PRE", use_compose=False)
  File "/opt/conda/lib/python3.10/site-packages/monailabel/interfaces/utils/transform.py", line 106, in run_transforms
    data = t(data)
  File "/opt/conda/lib/python3.10/site-packages/monai/transforms/spatial/dictionary.py", line 416, in __call__
    d[key] = self.spacing_transform(
  File "/opt/conda/lib/python3.10/site-packages/monai/utils/deprecate_utils.py", line 221, in _wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/monai/transforms/spatial/array.py", line 606, in __call__
    data_array = self.sp_resample(
  File "/opt/conda/lib/python3.10/site-packages/monai/utils/deprecate_utils.py", line 221, in _wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/monai/transforms/spatial/array.py", line 247, in __call__
    img = convert_to_tensor(data=img, track_meta=get_track_meta(), dtype=_dtype)
  File "/opt/conda/lib/python3.10/site-packages/monai/utils/type_conversion.py", line 149, in convert_to_tensor
    return _convert_tensor(data).to(dtype=dtype, device=device, memory_format=torch.contiguous_format)
  File "/opt/conda/lib/python3.10/site-packages/monai/data/meta_tensor.py", line 268, in __torch_function__
    ret = super().__torch_function__(func, types, args, kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 1279, in __torch_function__
    ret = func(*args, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 21.48 GiB (GPU 0; 22.06 GiB total capacity; 10.74 GiB already allocated; 10.58 GiB free; 10.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have also attached the nvidia-smi run below to show you the GPU memory usage.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1B.0 Off |                    0 |
|  0%   24C    P0    56W / 300W |  11755MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A10G         On   | 00000000:00:1C.0 Off |                    0 |
|  0%   20C    P0    44W / 300W |      2MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A10G         On   | 00000000:00:1D.0 Off |                    0 |
|  0%   21C    P0    41W / 300W |      2MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   21C    P0    40W / 300W |      2MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4269      C   python                          11753MiB |

Wow! :stuck_out_tongue:
image: torch.Size([1024, 1024, 2749])
This is certainly higher than anything I have worked with so far. I am glad that you got the training to run. What crop/ROI size does your model operate on? E.g. 128x128x128 or higher?
For inference, you may need to run sliding-window inference and aggregate the patch-wise predictions into a large tensor hosted on the CPU (as far as I know, the per-patch inference itself can still run on the GPU). Please note that this may be considerably slower than processing everything on the GPU. If speed is not a major concern, I think @diazandr3s can give you more detailed instructions on how to achieve that.
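
Roughly, the shape of it would be something like the sketch below. The network, ROI size and dummy input are placeholders for illustration only; the point is that MONAI's sliding_window_inference can evaluate each window on the GPU (sw_device) while aggregating the stitched output on the CPU (device):

# Sketch of patch-wise GPU inference with CPU aggregation of the output.
# With the real 1024 x 1024 x 2749 volume the stitched output still needs
# roughly (number of labels) x (volume size in bytes) of RAM.
import torch
from monai.inferers import sliding_window_inference
from monai.networks.nets import UNet

model = UNet(  # placeholder network; substitute your trained model
    spatial_dims=3, in_channels=1, out_channels=25,
    channels=(16, 32, 64, 128, 256), strides=(2, 2, 2, 2),
).to("cuda").eval()

image = torch.zeros(1, 1, 256, 256, 256)  # dummy input, kept on the CPU

with torch.no_grad():
    pred = sliding_window_inference(
        inputs=image,
        roi_size=(128, 128, 128),  # assumed crop size
        sw_batch_size=1,
        predictor=model,
        overlap=0.25,
        sw_device="cuda",          # each window is evaluated on the GPU
        device="cpu",              # the stitched output is aggregated in RAM
    )

print(pred.shape)  # torch.Size([1, 25, 256, 256, 256]), on the CPU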

Hi @Carl_alv,

This image is huge :open_mouth:

Which model are you using here? DeepEdit or the vanilla Segmentation model? Can you please share the command you use to start the MONAI Label server?

In any case, hosting ONLY this volume requires around 21 GB of memory, either RAM or GPU memory (1024 × 1024 × 2749 voxels × 8 bytes ≈ 21.5 GiB, which matches the 21.48 GiB allocation in your traceback).

To get an idea of the total memory you’ll need, multiply those ~21 GB by the number of labels the model predicts.

Both DeepEdit and the Segmentation model predict more than 5 labels, which means more than 100 GB of memory is needed, again either RAM or GPU memory.
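
As a back-of-the-envelope check (8 bytes per voxel, which is what the 21.48 GiB allocation in your traceback corresponds to; halve the numbers for float32):

# Rough memory estimate for a single 1024 x 1024 x 2749 volume and for a
# multi-label prediction; 5 labels is used here only as a lower bound.
voxels = 1024 * 1024 * 2749
bytes_per_voxel = 8   # float64
num_labels = 5        # lower bound for DeepEdit / Segmentation

one_volume_gib = voxels * bytes_per_voxel / 1024**3
print(f"single volume: {one_volume_gib:.2f} GiB")                              # ~21.48 GiB
print(f"{num_labels}+ label channels: {one_volume_gib * num_labels:.0f} GiB")  # ~107 GiB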

Hope this helps,

Hi @diazandr3s ,
I am also working on this project with @Carl_alv .

- What model are you using here?
We are using the segmentation model, the one that comes in the radiology app.

- The command we are using to start the MONAI Label Server:
monailabel start_server --app radiology --studies <path-to-images> --conf models segmentation

We have only been resampling to 2x in Slicer because we wanted a higher-resolution label. We didn’t know how to increase the resolution of the label only, so we increased the resolution of the input image, which improved the resolution of the label as well.

If you have any guidance or advice on how we can go about increasing the resolution of the label, that would be helpful.

Thank you in advance!

Hi @ag_gan,

Thanks for the details.

The default Segmentation model predicts 25 regions: https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/radiology/lib/configs/segmentation.py#L37-L62

With a volume of this size, you’ll need a huge amount of memory.

My suggestion would be to make a prediction with the original volume size and then postprocess the predicted mask.
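
If you prefer to do the upsampling outside Slicer, a rough sketch of the idea is below. Nearest-neighbour interpolation keeps the integer label values intact; the path, the factor of 2 and the uint8 cast are assumptions for this example:

# Sketch: upsample a predicted label map 2x with nearest-neighbour interpolation
# and write it back with a scaled affine (ignoring the half-voxel origin shift).
import nibabel as nib
import numpy as np
import torch
import torch.nn.functional as F

seg = nib.load("/path/to/predicted_label.nii.gz")   # hypothetical path
data = torch.from_numpy(np.asarray(seg.dataobj)).float()

up = F.interpolate(data[None, None], scale_factor=2, mode="nearest")[0, 0]

affine = seg.affine.copy()
affine[:3, :3] /= 2                                  # voxels are now half the size
nib.save(nib.Nifti1Image(up.numpy().astype(np.uint8), affine),
         "predicted_label_2x.nii.gz")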

Maybe @lassoan could comment on how we can use 3D Slicer to smooth a mask?

After you get your segmentation, you can increase its resolution and smooth the segments (using “Specify geometry”, as described here).
