Although the saved scene contains no DICOM files, only an NRRD file as scalar volume, DICOM tags are still present in the *.mrml file, PatientName and PatientBirthDate in particular.
If the patient is removed from the DICOM database, the result is the same.
The problem: when an MRB file is shared, there is patient data leakage. Actually, that’s how I came across this finding, with the last shared MRB file on the Discourse forum. For this reason, I won’t link that post here but can do it in private messaging on request.
The scene file should probably not store values of DICOM tags, or at least not the ones mentioned above. There’s nothing DICOM related in a saved scene.
Of course, a user may fully anonymise his series before sharing. It just gets cumbersome and most users won’t be doing that.
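For anyone who wants to check a file before sharing it, here is a minimal sketch. It assumes the MRB can be opened as a zip archive with the *.mrml scene inside, and it does a plain text search for the two tag names mentioned above. The function name `find_tags_in_mrb` and the tag list are my own choices for illustration, not anything Slicer provides, and a clean result does not prove the file is free of PHI.

```python
import zipfile

# Tag names taken from the post above; extend the tuple as needed.
SUSPECT_TAGS = ("PatientName", "PatientBirthDate")


def find_tags_in_mrb(mrb_path, tags=SUSPECT_TAGS):
    """Return {scene_file_name: [tag names found]} for each .mrml in the archive.

    Assumption: the MRB is a zip archive containing one or more .mrml files.
    This is a plain substring search, not a parse of the scene.
    """
    hits = {}
    with zipfile.ZipFile(mrb_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".mrml"):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            found = [tag for tag in tags if tag in text]
            if found:
                hits[name] = found
    return hits
```

Calling `find_tags_in_mrb("scene.mrb")` (hypothetical file name) returns an empty dict when none of the listed tag names appear in any scene file of the archive.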
Since the MRML scene is extensible, we can’t guarantee that it is free of PHI, but I agree that this is a case where it would be reasonable to assume that no PHI is included.
@cpinter it looks like this can come in via the subject hierarchy. Do you have ideas about how this could be automatically removed in this kind of scenario?
For many workflows it is essential to be able to go back to the DICOM database, for fetching additional metadata or to be able to export results to DICOM. In research workflows, usually the data of the patient cohort is anonymized before you load any data into Slicer. In clinical workflows, usually all patient information is preserved in the data set to make sure that you work on the correct patient and to keep the connection to the patient records.
There is a niche but important use case: you get data of an individual patient and then you want to share with people who are not authorized to see PHI.
Slicer supports a very simple, blunt tool for this: save the scene and then load the NRRD file (or other data files). This cuts all ties to the original DICOM, so it is hard to find your way back to the original data if you need it. PHI may still remain in the data via burnt-in annotations (e.g., patient name in the corner of an image in a secondary capture), recognizable face in a 3D scan, etc.
There are several DICOM anonymization tools that can do deidentification in a much more sophisticated way (cleaning out PHI more reliably while preserving more data and offering a controlled way to map the data back to the original source), which can be used outside Slicer.
I expect that in the not too distant future (within 1-2 years) there will be Slicer extensions for more convenient DICOM data deidentification, because machine learning is very data-hungry and Slicer will need to offer tools for making training data set building more convenient. These tools should work not just on cohorts but on individual patient data sets, too. They would not start from a Slicer scene (because it is not possible to anonymize data once it is read into the scene and accessed by arbitrary modules), but would work by processing the data before it is loaded into the Slicer scene.
DICOM loading plugins require the data to be in the DICOM database. Therefore, it is not possible to anonymize during DICOM loading without major rework of the DICOM plugins or switching to a temporary database.
It would be much simpler and cleaner to anonymize during import (could be activated in the “Import DICOM files” window) and/or during export (could be activated in the “Export to files” window). It is also necessary to be able to do batch processing, i.e., anonymizing all data in a folder. All these require integration of an open-source DICOM anonymizer tool with a simple GUI for configuring and running the processing.
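To make the batch-processing part concrete, here is a rough sketch of a folder driver. It is only the loop around an anonymizer, not an anonymizer itself: `anonymize_file` is a hypothetical callable standing in for whichever open-source tool ends up being integrated, and `batch_process` is a name I made up for illustration.

```python
from pathlib import Path


def batch_process(src_dir, dst_dir, anonymize_file):
    """Run anonymize_file(src, dst) on every file under src_dir.

    The folder layout of src_dir is mirrored under dst_dir, and the
    originals are left untouched. Returns the number of files processed.
    anonymize_file is a hypothetical callable supplied by the caller.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    count = 0
    # sorted() collects the file list up front, before any output is written.
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        anonymize_file(src, dst)
        count += 1
    return count
```

Writing to a separate output folder rather than in place keeps the original data available, which matters if the mapping back to the source needs to be preserved.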
Anonymize on import also makes sense. But what I took @chir.set to be suggesting would be to give the plugins the option of not putting PHI into the mrml scene when loading. Often it’s hard for a generic deidentifier to know what the tags mean, while the plugins are specific to the type of data and probably have a better chance of knowing which contain PHI and generating deidentified variants.
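As a sketch of what that plugin-side option might look like: the plugin separates the tags it knows to be identifying from the rest before anything is written into the scene. Everything here is illustrative. `split_tags` and `PHI_KEYWORDS` are hypothetical names, the keyword set is just an example (a real plugin would choose it based on the data type it handles), and this does not show how Slicer plugins actually store tags.

```python
# Example keyword set only; a real plugin would define its own list.
PHI_KEYWORDS = frozenset({"PatientName", "PatientBirthDate", "PatientID"})


def split_tags(tags, phi_keywords=PHI_KEYWORDS):
    """Split a {keyword: value} mapping into (safe, withheld).

    'safe' holds tags that may be stored in the scene; 'withheld' holds
    the tags the plugin considers PHI and keeps out of it.
    """
    safe = {k: v for k, v in tags.items() if k not in phi_keywords}
    withheld = {k: v for k, v in tags.items() if k in phi_keywords}
    return safe, withheld
```

With an input of `{"PatientName": "DOE^JANE", "Modality": "CT"}`, `safe` is `{"Modality": "CT"}` and `withheld` is `{"PatientName": "DOE^JANE"}`. Note that a denylist like this only catches the tags it knows about, so an allowlist may be the safer default.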