I’d like to ask for the community’s suggestions on selecting an AI model and setting up training data for a medical landmark detection task.
In my use case, I need to detect the superior tip of healthy kidneys on CBCTs. I have a training dataset of around 150 CBCTs with the corresponding landmarks (CBCT volumes as .nrrd files and points as .mrk.json files). I think my training labels (i.e. the superior kidney tips) could have up to 1 cm of error. Training currently shows 70% accuracy and I don’t expect it to get higher, so I assume the model won’t work after training, but confirming that will have to wait a few days (until the training ends).
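For reference, Slicer’s .mrk.json markup files store control points as plain JSON, so the landmark coordinates can be read with nothing but the standard library. A minimal sketch (the function name is mine; error handling omitted):

```python
import json

def load_markup_points(path):
    """Read a 3D Slicer .mrk.json file and return its control-point
    positions as (x, y, z) tuples, in the coordinate system named by
    the file's "coordinateSystem" field."""
    with open(path) as f:
        data = json.load(f)
    points = []
    for markup in data["markups"]:
        for cp in markup["controlPoints"]:
            points.append(tuple(cp["position"]))
    return points
```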
More information: the same use case with the same AI model, but trained on CTs instead of CBCTs, reached 95% accuracy during training even with half the training data (around 70 CTs), and my tests show it working.
Could you please elaborate on what accuracy means in your case? I am curious because, from your description, you would have to use distance metrics instead of accuracy.
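For landmark detection, the usual metrics are the mean radial error (mean Euclidean distance in mm) and a success rate at some tolerance, which is one way a percentage “accuracy” can be defined. A minimal sketch of both (function names and the 10 mm tolerance are illustrative):

```python
import numpy as np

def mean_radial_error(pred, gt):
    """Mean Euclidean distance (mm) between predicted and ground-truth
    landmarks; pred and gt are (N, 3) arrays in the same physical space."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def success_rate(pred, gt, tol_mm=10.0):
    """Fraction of landmarks within tol_mm of the ground truth."""
    d = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=1)
    return float((d <= tol_mm).mean())
```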
Kidney detection on CBCT is a hard problem, while on CT it is an easy one. First of all, the field of view of a CBCT is much smaller (just 20-30 cm, so you don’t always see the whole kidney and its surroundings), CBCT soft-tissue contrast is much worse (mainly due to physics: scattering of the cone beam), and CBCT images are less standardized (voxel values may not correspond accurately to HU). So the 70% vs. 95% accuracy gap is expected. You may need much more data to get much better results.
What model do you use now?
You can try nnLandmark, an nnU-Net-based landmark detection model. Unfortunately, it is not packaged cleanly (they forked and modified nnU-Net), but it should be worth a try.
Yes, I realized this while exploring my datasets and the early results I had.
Yes, in my experience I expect the training loss to decrease roughly logarithmically with dataset size, i.e. each doubling of the training data gives a similar reduction.
I’m using an RL model.
Yes, I have been researching other models (as well as hyperparameters optimization for the current model I’m using) for this task and I did find nnLandmark suggested.
In my experiments, I found training for landmark detection to be quite sensitive to positional differences. You may want to check whether the CBCTs are more variable than the CTs, and if they are, try normalizing the differences by registering to a standard orientation (and, of course, updating your landmark coordinates accordingly), then redo the training. Additionally, some CBCTs I have seen are not normalized for intensities (i.e., 16-bit values exceeding HU ranges). You might also check whether your pipeline standardizes intensities.
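As a sketch of the two checks above, clipping and rescaling intensities to a common window, and applying the same spatial transform to the landmark coordinates that was applied to the image, could look like this (the window bounds and function names are illustrative assumptions, not validated defaults):

```python
import numpy as np

def standardize_intensities(volume, clip_min=-1000.0, clip_max=1000.0):
    """Clip voxel values to a fixed HU-like window and rescale to [0, 1],
    so CBCTs with arbitrary 16-bit ranges and calibrated CTs land on a
    comparable intensity scale. Window bounds here are illustrative."""
    v = np.clip(np.asarray(volume, dtype=np.float32), clip_min, clip_max)
    return (v - clip_min) / (clip_max - clip_min)

def transform_points(points, matrix):
    """Apply a 4x4 homogeneous transform (e.g. the one used to reorient
    the image) to landmark coordinates, keeping image/point pairs consistent."""
    pts = np.asarray(points, dtype=float)
    homo = np.c_[pts, np.ones(len(pts))]  # (N, 4) homogeneous coordinates
    return (homo @ np.asarray(matrix, dtype=float).T)[:, :3]
```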
I’ve tried that. I have tried to be rigorous in creating and reviewing the training data pairs (i.e. images and landmarks).
I’ve found that around 5% of the original CBCTs in my dataset are out of distribution (i.e. outliers), and I simply skipped them during training, as I don’t expect to run inference on such bad-quality CBCTs.
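One crude way to screen for such outliers automatically is to compare per-volume intensity statistics against the rest of the dataset. A hypothetical sketch (the z-score threshold is arbitrary, and mean intensity is only one of several statistics worth checking):

```python
import numpy as np

def flag_outliers(volume_means, z_thresh=2.5):
    """Return indices of volumes whose mean intensity deviates from the
    dataset mean by more than z_thresh standard deviations -- a crude
    screen for out-of-distribution scans (threshold is illustrative)."""
    m = np.asarray(volume_means, dtype=float)
    z = (m - m.mean()) / m.std()
    return [i for i, zi in enumerate(z) if abs(zi) > z_thresh]
```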
Yes, as far as I remember that is being taken care of.