Python wrapping of plastimatch

Thanks @fedorov for the mention! (I actually remember looking at this thread while deciding what to do for the dummy wrapper - so hopefully this will be able to help someone else!)

@gcsharp what I’ve implemented on the fly (the plan is to expand it a bit, to make sure at least all the “common” operations can be launched from python scripts) is really simple and quite close to what Paolo did back in the day. The only reason I decided to implement “my own” is that I feared Paolo’s version could be outdated (with respect to Plastimatch). I also wanted to make sure a couple of features useful for the use cases were there, and forking Paolo’s implementation to then customise it didn’t look as much fun :smiley:

Basically, it’s as simple as calling Plastimatch from python via subprocess (any library for process management will do), plus some handy extras along the way (e.g., logging all the operations, and a few other scripts for visualisation etc.). This not only lets you run Plastimatch functions directly from your python scripts (abstracting away the OS - although I should note I have tested this exclusively on Linux), but also lets you exploit other python functionality to process large amounts of data in less time. For instance, I have used the Plastimatch wrapper to run DICOM to NRRD conversion of hundreds/thousands of patients in parallel (with the processing time scaling almost linearly with the number of cores). In some cases I’ve also applied some pre-processing on the fly (for a dozen extra lines of code), processing in a few minutes a dataset that would otherwise have required hours (e.g., converting DICOM to NRRD from bash one subject at a time, then reading the NRRD files back, preprocessing them with python, and saving them again).
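For the curious, here is a minimal sketch of the subprocess-based idea. The function names and the option-mapping convention below are hypothetical (not the actual PyPlastimatch API), but `plastimatch convert --input ... --output-img ...` is the real CLI shape being wrapped:

```python
import subprocess

# Hypothetical helpers, sketching the subprocess-based wrapping idea.
def build_command(operation, **kwargs):
  # Map keyword arguments to CLI options, so that e.g.
  # build_command("convert", input="dicom/", output_img="ct.nrrd")
  # yields: plastimatch convert --input dicom/ --output-img ct.nrrd
  cmd = ["plastimatch", operation]
  for key, value in kwargs.items():
    cmd += ["--" + key.replace("_", "-"), str(value)]
  return cmd

def run_plastimatch(operation, path_to_log_file=None, **kwargs):
  # Run the command, optionally appending stdout/stderr to a log file,
  # and raise if plastimatch returned a non-zero exit code.
  result = subprocess.run(build_command(operation, **kwargs),
                          capture_output=True, text=True)
  if path_to_log_file is not None:
    with open(path_to_log_file, "a") as f:
      f.write(result.stdout + result.stderr)
  result.check_returncode()
```

Keeping the command-building separate from the execution makes the mapping easy to test without having Plastimatch installed.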

The multiprocessing code I’m referring to here will be made available super soon (a matter of a couple of weeks, if not less, I hope!) as part of the development of the use cases Andrey mentioned. In case you don’t want to wait, though, here is an idea of how you could script it using the dummy PyPlastimatch wrapper:

import os
import tqdm
import multiprocessing
import utils.pyplastimatch as pypla

# pat_config is a dictionary storing basic I/O and config info for the conversion 
def run_core(pat_config):

  pat = pat_config["pat"]
  verbose = pat_config["verbose"]

  # patient subfolder where all the preprocessed data will be stored
  pat_dir = pat_config["pat_dir"]
  if not os.path.exists(pat_dir): os.mkdir(pat_dir)

  # logfile for the plastimatch conversion
  log_file_path = os.path.join(pat_dir, pat + '_pypla.log')

  # DICOM CT to NRRD conversion
  ct_nrrd_path = pat_config["ct_nrrd_path"]
  if not os.path.exists(ct_nrrd_path):
    convert_args_ct = {"input" : pat_config["dicom_ct_dir"],
                       "output-img" : ct_nrrd_path}

    # clean up any old log file, if it exists
    if os.path.exists(log_file_path): os.remove(log_file_path)

    pypla.convert(verbose = verbose, path_to_log_file = log_file_path, **convert_args_ct)

def main(config):

  cpu_cores = config["cpu_cores"]

  # pool of worker processes (each call to "run_core" processes one patient)
  pool = multiprocessing.Pool(processes = cpu_cores)

  """
  write here the code to populate "pat_config_list", a list of dictionaries storing
  the information used by the "run_core" function for the processing (e.g., paths, verbosity, etc.)
  """

  for _ in tqdm.tqdm(pool.imap_unordered(run_core, pat_config_list), total = len(pat_config_list)):
    pass

if __name__ == '__main__':

  """
  Parse the config file into a "config" dictionary here, then call "main(config)"!
  """
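
As a side note, since the docstrings above ask you to populate "pat_config_list" yourself, here is one (hypothetical) way to do it, assuming one subfolder per patient - the key names are placeholders, to be adapted to whatever your "run_core" expects:

```python
import os

def build_pat_config_list(data_dir, verbose=False):
  # Build one config dictionary per patient subfolder found under data_dir.
  # The keys ("pat", "dicom_dir", "verbose") are placeholders - adapt them
  # to whatever your run_core function actually reads.
  pat_config_list = []
  for pat in sorted(os.listdir(data_dir)):
    if not os.path.isdir(os.path.join(data_dir, pat)): continue
    pat_config_list.append({"pat" : pat,
                            "dicom_dir" : os.path.join(data_dir, pat),
                            "verbose" : verbose})
  return pat_config_list
```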

(sorry if this is not an MWE - if you want to test-run your own code based on this and run into problems, do reach out! As I said, such scripts and all the details will be made available as part of the on-going effort at IDC!)

P.S.

That is the plan indeed (to have it documented and available through pip)!