Wish list: parallelize ALPACA / MALPACA registration

Dear SlicerMorph developers,

Would it be possible to parallelize the MALPACA registration process?

For example, the registration of each template against the target mesh set could run in parallel:

    -> 4 templates, 60 target meshes
       --> 4 "tasks" in parallel:
           # template 1 & 60 targets
           # template 2 & 60 targets
           # template 3 & 60 targets
           # template 4 & 60 targets

I guess that would speed up the alignment process.

Best,
Markus

All the computations in ALPACA are already multi-threaded, so there is nothing to be gained from running tasks in parallel on the same computer. In fact, if you do that, the outcome will be the opposite of what you expect: each task will run longer due to resource competition.
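As a rough illustration of why splitting the cores between tasks does not help, here is a back-of-the-envelope model in Python. It is only a sketch: it assumes each registration scales perfectly across the cores it is given, which real code does not, and the function name, the `contention_penalty` parameter and all numbers are hypothetical, not measurements from ALPACA.

```python
def batch_wall_time(n_jobs, core_seconds_per_job, total_cores,
                    concurrent_jobs=1, contention_penalty=0.0):
    """Idealized wall time (seconds) for a batch of equally sized jobs.

    Assumes perfect scaling: a job that needs `core_seconds_per_job` of CPU
    work finishes in core_seconds_per_job / cores_given_to_it seconds.
    `contention_penalty` is a hypothetical fractional slowdown added for each
    extra concurrent job (competition for cache, memory bandwidth, etc.).
    """
    cores_per_job = total_cores / concurrent_jobs
    time_per_job = core_seconds_per_job / cores_per_job
    time_per_job *= 1.0 + contention_penalty * (concurrent_jobs - 1)
    rounds = -(-n_jobs // concurrent_jobs)  # ceiling division
    return rounds * time_per_job


# Hypothetical numbers: 4 templates x 60 targets = 240 registrations,
# each needing 400 core-seconds, on an 8-core machine.
one_at_a_time = batch_wall_time(240, 400, 8, concurrent_jobs=1)
four_at_a_time = batch_wall_time(240, 400, 8, concurrent_jobs=4)
four_with_contention = batch_wall_time(240, 400, 8, concurrent_jobs=4,
                                       contention_penalty=0.05)

print(one_at_a_time, four_at_a_time, four_with_contention)
```

Under these assumptions the first two totals are identical, because four tasks on a quarter of the cores each simply take four times longer, and any contention penalty makes the task-parallel run slower.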

Why do you seek task parallelization? It should already be really fast: on my laptop I can register a mouse skull in about 50 seconds, and faster if I use my desktop, which has many more cores.

Usually the deformable registration step takes the longest in ALPACA. If you are on Windows you can use Bayesian CPD (BCPD), which gives about a 10x speed-up and is available at:

Unfortunately it requires building from scratch on macOS and Linux, so we don’t distribute it with ALPACA, but it can be enabled under the Advanced options (check the acceleration option and point it to the folder where bcpd lives).

Hi,

Why do you seek task parallelization? It should already be really fast: on my laptop I can register a mouse skull in about 50 seconds, and faster if I use my desktop, which has many more cores.

  • The motivation is to speed up the registration process and to make use of all CPU cores / resources.
  • During the optimization a single core is utilized and the remaining cores seem to be idle.

I should mention that the skull registration is done on high-resolution microCT data (36 micron), and no reduction of the surface mesh is applied. Thus, for a single skull, using 5000 surface points, registration takes roughly 2-3 minutes.

Of course, one can debate what fast/slow means.

Anyway, I was not aware that the algorithm is already parallelized.
And for sure I will take a look at the Bayesian CPD algorithm too.

Thanks.

Best,
Markus

When you are in the 4000-6000 point range, on a desktop with a recent 8-core CPU, the speed should be about 1 minute per sample, but that is of course entirely dependent on the number of cores, their speed and the CPU generation.
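To put those per-sample times in the context of the batch described at the top of the thread, here is a small Python sketch of the arithmetic. It assumes one registration per template-target pair, run one after another, and that the quoted per-sample time applies to each pair; `batch_minutes` is a hypothetical helper, not part of SlicerMorph.

```python
def batch_minutes(n_templates, n_targets, seconds_per_registration):
    """Total sequential run time in minutes, assuming one registration
    per template-target pair and a constant time per registration."""
    return n_templates * n_targets * seconds_per_registration / 60


# 4 templates x 60 targets at about 1 minute per sample
print(batch_minutes(4, 60, 60))    # 240.0 minutes, i.e. 4 hours

# The same batch at 2.5 minutes per sample (middle of the reported 2-3 minutes)
print(batch_minutes(4, 60, 150))   # 600.0 minutes, i.e. 10 hours
```

With a few hundred registrations in a batch, the per-sample time dominates the total, which is why a faster deformable step matters more than how the tasks are scheduled.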

Yes, do try BCPD if you are going to run lots of these. The improvement should be quite dramatic.

Hi,

Thanks for the hint! The BCPD algorithm speeds up the computation dramatically;
the registration of ~5000 points now takes < 30 s on a Xeon E3-1240 (3.5 GHz, 4 cores).

Best,
Markus
