I have a case with multiple ROIs representing multiple lesions from a brain metastasis patient. Running a voxel-based extraction on two of these ROIs independently gives different values than extracting features on a combined ROI of the two lesions and then separating them afterwards. This is very disturbing.
This tells me that the voxel-based extraction is not localized and will always depend on what’s going on in other voxels. I’d like to train a classifier on the values from individual voxels, but if the pipeline used to extract those voxels (i.e. combined ROI vs. individual) changes the values, the classifier is invalid.
I’m currently attempting to simply extract features across the entire brain, from which I’ll collect the values for the appropriate ROIs later… but it seems this is going to be prohibitively slow.
How can I extract feature maps without the value at one voxel being determined by distant voxels? Is the voxel-based extraction local? What does the kernel size mean if this test case fails?
There are 2 reasons for the behaviour you’re describing, and both occur only when you use a masked kernel. (Values for a voxel are calculated on a region surrounding that voxel: the kernel. A masked kernel is the intersection between this region and the original mask, i.e. the part of the mask that surrounds the voxel.)
Reason 1: some voxels are excluded when they are not in the mask.
Reason 2: image discretization is done on a global (image) level. When a masked kernel is used, discretization is applied to the masked region only; with unmasked kernels, the entire image is discretized.
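To make Reason 2 concrete, here is a toy NumPy sketch (not PyRadiomics’ actual code; the fixed-bin-width scheme and the bin width of 5 are illustrative assumptions) showing how the same voxels receive different bin numbers depending on whether discretization is computed over the masked region or over the whole image:

```python
import numpy as np

def discretize(values, bin_width=5.0):
    """Fixed-bin-width discretization: bin index relative to the
    minimum of whatever region is being discretized."""
    return np.floor((values - values.min()) / bin_width).astype(int) + 1

# Toy 1-D "image" and a mask covering only part of it
image = np.array([0., 10., 20., 30., 40., 50.])
mask = np.array([False, False, True, True, True, False])

# Masked-kernel style: discretize only the masked region
masked_bins = discretize(image[mask])

# Unmasked-kernel style: discretize the whole image, then look at the mask
global_bins = discretize(image)[mask]

# The same voxels get different bin numbers because the minimum
# (and hence the bin edges) differs between the two regions
print(masked_bins)  # bins relative to the masked region's minimum (20)
print(global_bins)  # bins relative to the whole image's minimum (0)
```

Since texture features are computed from these bin numbers, any shift in the bin edges propagates into the feature values.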
If you want consistent results, try using unmasked kernels (parameter “maskedKernel” = False).
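For reference, a sketch of how this could look in a PyRadiomics parameter file; the `voxelSetting` section and key names are from memory of the documentation, so verify them against the version you are running:

```yaml
setting:
  binWidth: 25          # illustrative value

voxelSetting:           # settings specific to voxel-based extraction
  kernelRadius: 1       # kernel extends 1 voxel around the center voxel
  maskedKernel: false   # use all voxels in the kernel, not only masked ones
```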
As to speed: keep in mind that voxel-based radiomics is like doing segment-based radiomics, but for each voxel separately. So yes, for large regions this will indeed be very slow. I tried to enhance performance by computing ‘batches’ of voxels (allowing more matrix calculations in python and reducing the number of python loops), and by improving the iteration algorithm for calculating the texture matrices. While this certainly improves performance, it still remains a fairly slow process.
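As an illustration of the batching idea, here is a toy NumPy sketch (not PyRadiomics’ actual implementation; all names are hypothetical) computing a local mean per voxel, once with one Python iteration per voxel and once by gathering a whole batch of kernels in a single NumPy operation:

```python
import numpy as np

def local_mean_loop(image, coords, radius):
    # One Python iteration per voxel: slow for large ROIs
    out = np.empty(len(coords))
    for k, (i, j) in enumerate(coords):
        out[k] = image[i - radius:i + radius + 1,
                       j - radius:j + radius + 1].mean()
    return out

def local_mean_batched(image, coords, radius, batch=512):
    # Gather every kernel of a whole batch of voxels at once, trading
    # Python-level loops for fewer, larger NumPy operations
    offs = np.arange(-radius, radius + 1)
    di, dj = np.meshgrid(offs, offs, indexing='ij')
    out = np.empty(len(coords))
    for start in range(0, len(coords), batch):
        c = coords[start:start + batch]
        ii = c[:, 0, None] + di.ravel()  # shape (batch, kernel_size**2)
        jj = c[:, 1, None] + dj.ravel()
        out[start:start + batch] = image[ii, jj].mean(axis=1)
    return out

rng = np.random.default_rng(0)
image = rng.random((50, 50))
radius = 1
# Interior voxels only, so every kernel fits inside the image
interior = np.zeros_like(image, dtype=bool)
interior[radius:-radius, radius:-radius] = True
coords = np.argwhere(interior)
```

Both functions give the same values; the batched version simply moves the per-voxel work out of the Python loop and into vectorized indexing.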
Thank you for your quick response, Joost.
I suspected that maskedKernel would be involved, and I was already playing around with it before sending a message, but it didn’t seem to solve the problem. Consider the following output:
This data set has 7 contoured lesions, so in this analysis, I’m only considering the first lesion alone in a single segment vs the first lesion and the 7th lesion combined into a single segment. I ran the voxel-based feature extraction on both cases and then considered the value of a single pixel that originated from the FIRST segment. There were four runs:
- roi 1 alone, maskedKernel = True
- roi 1 + roi 7, maskedKernel = True
- roi 1 alone, maskedKernel = False
- roi 1 + roi 7, maskedKernel = False
As far as I can tell, the values depend on the segment fed into it, regardless of maskedKernel being true or false.
It would be ideal to first perform discretization globally across the unmasked image, and then compute features across local regions, using the pre-computed discretization bins.
Perhaps I’m misunderstanding. Wouldn’t you expect this test case to return the same values for a pixel in region 1 IF maskedKernel = False, regardless of the segment size/shape, i.e. regardless of whether a combined ROI is used vs. a single ROI?
I think I might have tracked down the issue. In base.py, it appears that self.InputImage has already been clipped to the bounding box of the segment. The logic checks maskedKernel and performs binning on the masked region if maskedKernel == True; otherwise, it creates a mask the size of the bounding box.
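If that is what’s happening, the effect is easy to reproduce in a toy NumPy sketch (illustrative only, not PyRadiomics code): combining ROIs enlarges the bounding box the image is clipped to, so the “global” intensity range used for discretization changes.

```python
import numpy as np

def bounding_box(mask):
    # Min and max index of the nonzero region, per axis
    idx = np.argwhere(mask)
    return idx.min(axis=0), idx.max(axis=0)

image = np.arange(100, dtype=float).reshape(10, 10)

roi1 = np.zeros((10, 10), dtype=bool)
roi1[1:3, 1:3] = True
roi7 = np.zeros((10, 10), dtype=bool)
roi7[7:9, 7:9] = True

lo, hi = bounding_box(roi1)
crop_single = image[lo[0]:hi[0] + 1, lo[1]:hi[1] + 1]

lo, hi = bounding_box(roi1 | roi7)
crop_combined = image[lo[0]:hi[0] + 1, lo[1]:hi[1] + 1]

# Different bounding boxes give different intensity ranges, and therefore
# different bin edges for any "global" discretization of the clipped image
print(crop_single.min(), crop_single.max())      # 11.0 22.0
print(crop_combined.min(), crop_combined.max())  # 11.0 88.0
```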
Perhaps a clumsy workaround would be to set the corner voxels in the mask to 1. I will try it and see if this fixes my problem.
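A toy sketch of that workaround (array shapes and values are hypothetical): flagging the corner voxels forces the bounding box of the mask to span the full image, so the clipped region no longer depends on which ROI is used.

```python
import numpy as np
from itertools import product

def bbox_shape(mask):
    # Shape of the bounding box of the nonzero region
    idx = np.argwhere(mask)
    return tuple(int(x) for x in (idx.max(axis=0) - idx.min(axis=0) + 1))

mask = np.zeros((10, 10, 10), dtype=np.uint8)
mask[4:6, 4:6, 4:6] = 1  # the actual lesion
print(bbox_shape(mask))  # (2, 2, 2): clipped to the lesion

# Workaround: set the 8 corner voxels so the bounding box spans the image
for corner in product([0, -1], repeat=3):
    mask[corner] = 1
print(bbox_shape(mask))  # (10, 10, 10)
```

Note that the corner voxels themselves will now receive feature values and contribute their intensities to the global range, so they should be excluded when collecting the resulting maps.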
This workaround solved my problem. The feature values are now consistent whether a single or a combined ROI is used.