I am working on a project to predict IDH status in glioma using features extracted with PyRadiomics. I've run into a problem: certain shape features are returned as character strings or tuples rather than single numeric values.
The documentation (Radiomic Features — pyradiomics v3.1.0rc2.post5+g6a761c4 documentation) describes these features as single scalar values, but the output I’m getting includes tuples and even hash strings. For example, the ‘original_shape_Elongation’ feature should represent the relationship between the two largest principal components, but I am seeing values like “(192, 256, 256)”.
Here are some examples of the features and their current formats:
original_shape_Elongation: “(192, 256, 256)”
original_shape_Maximum2DDiameterRow: “(0.9000000357627869, 0.8984375, 0.8984375)”
original_shape_Maximum3DDiameter: “(27, 103, 91, 57, 60, 47)”
original_shape_Sphericity: “(56.54839034712076, 136.01947201370345, 114.27520278099652)”
Other shape features show the same problem.
My goal is to reformat these character feature outputs into a consistent numeric format that I can use for predictive modeling. My initial thought is to extract meaningful single values from these tuples, possibly considering the maximum value, mean, or even recalculating the feature where possible.
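To make the idea concrete, here is a minimal sketch of the first step I have in mind. It assumes the extracted features are available as a plain dictionary mapping feature names to strings (for example, one row read back from a CSV). The helper names `parse_feature_value` and `split_scalar_and_suspect` are hypothetical and not part of PyRadiomics. The sketch only separates values that parse as a single number from those that parse as tuples or do not parse at all (such as hash strings). It does not yet reduce a tuple to one value.

```python
import ast
from numbers import Real


def _is_number(value):
    """True for int/float values, excluding booleans."""
    return isinstance(value, Real) and not isinstance(value, bool)


def parse_feature_value(raw):
    """Parse one raw feature value (hypothetical helper).

    Returns a float for scalar-like input, a tuple of floats for
    tuple-like strings such as "(192, 256, 256)", and None for anything
    else (for example a hash string).
    """
    if _is_number(raw):
        return float(raw)
    if not isinstance(raw, str):
        return None
    try:
        value = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError):
        return None  # not a Python literal, e.g. a hash string
    if _is_number(value):
        return float(value)
    if isinstance(value, (tuple, list)) and all(_is_number(v) for v in value):
        return tuple(float(v) for v in value)
    return None


def split_scalar_and_suspect(features):
    """Split a {name: raw_value} dict into usable scalars and suspect entries."""
    scalars, suspect = {}, {}
    for name, raw in features.items():
        parsed = parse_feature_value(raw)
        if isinstance(parsed, float):
            scalars[name] = parsed
        else:
            suspect[name] = raw  # keep the raw value for inspection
    return scalars, suspect


if __name__ == "__main__":
    # Values copied from the examples above; "some_scalar_feature" is made up.
    row = {
        "original_shape_Elongation": "(192, 256, 256)",
        "original_shape_Maximum3DDiameter": "(27, 103, 91, 57, 60, 47)",
        "some_scalar_feature": "0.5",
    }
    scalars, suspect = split_scalar_and_suspect(row)
    print("scalar features:", scalars)
    print("needs inspection:", suspect)
```

I flag the tuple-valued entries instead of immediately taking their maximum or mean, because the documentation defines these features as scalars and I am not sure a reduced tuple would carry the documented meaning.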
My main question is how to interpret these tuples and character strings correctly for each feature.
For your reference, here are the settings I used for feature extraction (see attached settings image).
I appreciate any guidance or references to similar cases you might have encountered.