I am interested in the result-validation strategy used by libraries such as PyRadiomics.
While looking through the unit tests, I found that the results are validated against golden data. How was this golden data created, and how was its correctness ensured?
The baseline data in PyRadiomics was originally created with the same software tool that was used to extract features in the publication by Aerts et al. (Nat. Commun. 2014). Since then, the baseline has only been changed when an intended change in the code was expected to yield different results.
As such, the baseline mainly serves as a regression check: it ensures that changes to the PyRadiomics code do not unintentionally alter the values of extracted features.
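To illustrate what such a regression check does, here is a minimal sketch in plain Python. It is not PyRadiomics' actual test code: the CSV layout (one row per feature with a single expected value), the helper names `load_baseline` and `compare_to_baseline`, and the relative tolerance of `1e-3` are all hypothetical, and the numbers are placeholders rather than real PyRadiomics results. The idea is simply that freshly extracted features are compared against stored values, and any drift, missing feature, or unexpected new feature is reported. When a code change is *meant* to alter results, the stored baseline is updated instead of the test being relaxed.

```python
import csv
import io
import math

# Hypothetical baseline file contents (assumed layout, placeholder values):
# one row per feature name with its expected value.
BASELINE_CSV = """feature,value
original_firstorder_Mean,825.2354
original_firstorder_Energy,1.5E9
original_glcm_Contrast,52.23
"""


def load_baseline(text):
    """Parse the hypothetical baseline CSV into a {feature_name: value} dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["feature"]: float(row["value"]) for row in reader}


def compare_to_baseline(extracted, baseline, rel_tol=1e-3):
    """Return human-readable mismatch messages; an empty list means no regression.

    rel_tol is an assumed tolerance that absorbs tiny floating-point
    differences while still catching real changes in feature values.
    """
    problems = []
    for name, expected in baseline.items():
        if name not in extracted:
            problems.append(f"{name}: missing from extracted features")
            continue
        actual = extracted[name]
        if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=1e-12):
            problems.append(f"{name}: expected {expected}, got {actual}")
    # Features that appear in the output but not in the baseline are also flagged,
    # so that new or renamed features are reviewed deliberately.
    for name in sorted(extracted.keys() - baseline.keys()):
        problems.append(f"{name}: not present in baseline")
    return problems


if __name__ == "__main__":
    baseline = load_baseline(BASELINE_CSV)
    # Simulate an unintended change in one feature value.
    extracted = dict(baseline)
    extracted["original_glcm_Contrast"] = 53.0
    for message in compare_to_baseline(extracted, baseline):
        print(message)
```

Running the demo reports only the drifted `original_glcm_Contrast` value. Using a relative tolerance rather than exact equality is a design choice that keeps the check robust to harmless platform-level floating-point noise.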
Beyond this, PyRadiomics follows the standard defined by the Image Biomarker Standardisation Initiative (IBSI) in most cases, and its documentation explains where and why it deviates from that standard.