Random number generator and reproducibility issues

muratmaga · October 5, 2021, 4:17am

We are running into an issue where, when we run the same dataset a couple of times back to back, we get slightly different results. These are subtle differences, but when combined, they are enough to create extra variation and impact the results.

So how does one go about reproducibility in Slicer? Normally we would do things like specifying a specific seed for the RNG, but if we do that, does it affect operations across Slicer or only within our module? Are there some examples we can take a look at?

@chz31 @smrolfe

lassoan · October 5, 2021, 4:40pm

This is not an application-level feature; it is up to the algorithm to expose an interface for specifying a random seed. Algorithms should not use any global shared random number generator but an object that the algorithm owns. In VTK you would use an object of this class, in ITK you would use this class, etc.
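As a minimal sketch of an algorithm that owns its generator and exposes a seed, here is the idea using Python's standard `random.Random` rather than the VTK or ITK classes. `PointSampler` and its `seed` parameter are hypothetical names used only for illustration.

```python
import random


class PointSampler:
    """Hypothetical algorithm that owns its random number generator."""

    def __init__(self, seed=None):
        # A private generator instance: seeding it does not touch the
        # module-level generator behind random.random().
        self._rng = random.Random(seed)

    def sample(self, points, count):
        """Return `count` points drawn without replacement."""
        return self._rng.sample(points, count)


points = list(range(100))
first = PointSampler(seed=42).sample(points, 5)
second = PointSampler(seed=42).sample(points, 5)
print(first == second)  # True: same seed, same selection
```

Because the generator is an attribute of the object, two algorithms running in the same session cannot disturb each other's random streams.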

Of course, most algorithms still won’t produce exactly the same results on different computers (different CPU, C runtime, etc. give different results for the same floating-point operations) or on each run (due to multi-threaded implementation). In theory, you could achieve 100% reproducibility of the results, but since that requires turning off most optimizations, running single-threaded, and building everything from source on all platforms, it takes a lot of extra effort. What is even worse is that the resulting algorithm implementation would not be suitable for end users, as it would be so much slower than the approximate implementation.
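One reason multi-threaded runs can differ, shown as a small illustration rather than a description of any particular Slicer algorithm: floating-point addition is not associative, so the order in which partial results are combined changes the last bits of the sum.

```python
import math

values = [0.1, 0.2, 0.3]

# The same three numbers, grouped two ways, as two thread schedules
# might combine their partial sums.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)              # False
print(left_to_right, right_to_left)
print(math.isclose(left_to_right, right_to_left))  # True
```

Comparing results with a tolerance, as `math.isclose` does here, is usually more practical than demanding bit-identical output.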

I think putting a lot of effort into 100% reproducibility of a tiny part of a workflow (running some processing algorithm on some data) is actually harmful, because it takes time away from the real goal: reproducibility of the entire workflow. The entire workflow includes imaging, specifying additional user inputs, processing, visualization of results, etc. This requires open-source code, open data, automatic testing, documentation, training, tutorials, etc., which may not sound as exciting but is essential for the overall advancement of a field.

muratmaga · October 5, 2021, 4:56pm

Just to clarify, if I do
np.random.seed(1)
will that affect the whole Slicer session or only my specific module?

jcfr · October 5, 2021, 5:55pm

Ideally, you should not use the legacy interface to seed random numbers.

    def seed(self, seed=None):
        """
        seed(self, seed=None)
        Reseed a legacy MT19937 BitGenerator
        Notes
        -----
        This is a convenience, legacy function.
        The best practice is to **not** reseed a BitGenerator, rather to
        recreate a new one. This method is here for legacy reasons.
        This example demonstrates best practice.

        >>> from numpy.random import MT19937
        >>> from numpy.random import RandomState, SeedSequence
        >>> rs = RandomState(MT19937(SeedSequence(123456789)))
        # Later, you want to restart the stream
        >>> rs = RandomState(MT19937(SeedSequence(987654321)))
        """
[...]

References:

numpy/mtrand.pyx at v1.19.5 · numpy/numpy · GitHub

jcfr · October 5, 2021, 6:01pm

Also worth noting that seeding the generator using numpy.random.seed impacts other modules using the legacy interface.

So it should not be done in module logic; it should be reserved for testing.

For some more background, see python - Why using numpy.random.seed is not a good practice? - Stack Overflow
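A minimal sketch of the alternative, assuming NumPy 1.17 or later, where `numpy.random.default_rng` is available: the module creates its own `Generator` from a seed instead of calling `numpy.random.seed`. The function `jitter_landmarks` and its parameters are hypothetical and only illustrate the pattern.

```python
import numpy as np


def jitter_landmarks(landmarks, scale, seed=None):
    """Hypothetical module function: add Gaussian noise to landmark coordinates."""
    # The Generator created here is private to this call; the legacy global
    # state used by np.random.rand() and similar functions is left untouched.
    rng = np.random.default_rng(seed)
    return landmarks + rng.normal(0.0, scale, size=landmarks.shape)


landmarks = np.zeros((4, 3))
run1 = jitter_landmarks(landmarks, scale=0.1, seed=1)
run2 = jitter_landmarks(landmarks, scale=0.1, seed=1)
print(np.array_equal(run1, run2))  # True
```

With this pattern the seed only affects the code that receives the generator, so other modules using the legacy interface in the same session are not disturbed.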
