When I load DICOM segmentation data from the Slicer DICOM module I will sometimes be told that it has found referenced datasets, and with other SEG objects I am not.
Can someone explain what tags are being evaluated to determine this?
Hi @Justin_Kirby - These would come from the ReferencedSeriesSequence, if the segmentation has one. Slicer is just offering to load this so you can look at both together, with the referenced series as the background and the segmentation as an overlay. The code is here. I believe you only get the option dialog if the referenced series or instances are in the current Slicer database.
Thanks Steve. We had someone provide DICOM SEG data created with dcmqi to TCIA recently. For some reason the relevant tags in their data were empty (perhaps @fedorov would know about this?). When I noticed the issue and asked if this could be populated they did so via a custom python script (see below). However, Slicer still doesn’t prompt to open the related MR series after filling this in and I do have both the original MR and SEG series in my DICOM database. If you see anything in their code that would explain this please let me know. And thanks for the link to the relevant code in Slicer. I’ll pass that along to our developers to see if we can implement some checks to ensure the relevant fields are populated in future data submissions.
from pydicom.dataset import Dataset
from pydicom.sequence import Sequence
import os
import pydicom
import pandas as pd
path_remind = '/Users/reubendo/Documents/data/ReMIND TCIA/manifest-1695134609823/ReMIND'
cases = [case for case in os.listdir(path_remind) if os.path.isdir(os.path.join(path_remind,case))]
len(cases)
dictionn = {'Seg 0x0020, 0x000E':[], 'MR 0x08,0x16':[], 'MR 0x08,0x18':[], 'MR 0x0020, 0x000E':[]}
for case in sorted(cases):
    path_case = os.path.join(path_remind, case)
    studies = [k for k in os.listdir(path_case) if os.path.isdir(os.path.join(path_case,k))]
    for study in studies:
        path_study = os.path.join(path_case, study)
        segmentations = [k for k in os.listdir(path_study) if 'seg' in k]
        images = [k for k in os.listdir(path_study) if not 'seg' in k and os.path.isdir(os.path.join(path_study, k))]
        for seg in segmentations:
            dcm_seg = pydicom.dcmread(os.path.join(path_study, seg, '1-1.dcm'))
            dcm_seg_framereference = dcm_seg[0x20,0x52].value
            candidates = []
            for i in images:
                path_i = os.path.join(path_study, i)
                dcms_i = [k for k in os.listdir(path_i) if 'dcm' in k]
                dcm_i = pydicom.dcmread(os.path.join(path_study, i, dcms_i[0]))
                if dcm_seg_framereference == dcm_i[0x20,0x52].value:
                    dictionn['Seg 0x0020, 0x000E'].append(dcm_seg[0x0020, 0x000E].value)
                    dictionn['MR 0x08,0x16'].append(dcm_i[0x08,0x16].value)
                    dictionn['MR 0x08,0x18'].append(dcm_i[0x08,0x18].value)
                    dictionn['MR 0x0020, 0x000E'].append(dcm_i[0x0020, 0x000E].value)
                    candidates.append(i)
                    ds11 = Dataset()
                    ds11.add_new((0x0008,0x1150), 'UI', dcm_i[0x08,0x16].value)
                    ds12 = Dataset()
                    ds12.add_new((0x0008,0x1155), 'UI', dcm_i[0x08,0x18].value)
                    ds1 = Sequence([ds11,ds12])
                    ds21 = Dataset()
                    ds21.add_new((0x0020, 0x000E), 'UI', dcm_i[0x0020, 0x000E].value)
                    ds22 = Dataset()
                    ds22.add_new((0x0008,0x114A), 'SQ', ds1)
                    ds2 = Sequence([ds22,ds21])
                    dcm_seg.add_new((0x0008,0x1115), 'SQ', ds2)
                    os.makedirs(f'./output/{case}/{study}/{seg}/', exist_ok=True)
                    pydicom.dcmwrite(f'./output/{case}/{study}/{seg}/1-1.dcm', dcm_seg)
            assert len(candidates)==1
I see @fedorov is responding. I’ll just mention that since this is the REMIND data, I could work with Reuben and have a look at the datasets to troubleshoot if that’s easier than guessing.
In order for the references to be populated by dcmqi, the user should pass the pointer to the folder with the source DICOM series as an optional parameter to the converter. “We used dcmqi” is not enough: there should be a review of how it was used and what the result was, to ensure they used it as recommended to enable features like this one.
I have to say I do not know DICOM well enough to remember what numeric tags mean, so it’s hard for me to check whether the result meets the expectations of the code referenced by @pieper above. I am always impressed by people who know those by heart!
I would take the resulting file and compare its DICOM dump against the organization of a SEG file where loading of the referenced series works, and try to find differences. Maybe you can share the relevant part of the dump of the output file (ReferencedSeriesSequence and ReferencedImageSequence). It is also always good to run dciodvfy on the output of any code that messes with the DICOM tags.
@pieper if you’re able to work with them to resolve this I think they’d be very appreciative. I know they are quite anxious to publish the data. If it is not a simple fix we could potentially do a “version 2” release later on with the revised data.
@fedorov thanks for the clarifications. Sounds like they missed the optional parameter.
Yes, I’ve reached out and should be able to help as needed. But the team is quite good so maybe it’ll get resolved by following the advice from @fedorov .
Hi,
Following up on this, the problem is related to the order of the ReferencedSeriesSequence:
Inspired by this convention, I considered that the SeriesInstanceUID was the second element of the sequence. It seems that it should be the first element in Slicer.
If we want to stick to the first element, then the code just needs to be modified by changing ds2 = Sequence([ds22,ds21]) into ds2 = Sequence([ds21,ds22]).
This is great, Reuben.
Yes, it’s a bit arbitrary at some level, but it’s not just a convention, it’s DICOM, which is an ISO standard. So if we stick to that we can just refer to the standard and never have to document things like this. We don’t want to live in a world without global standards!
Yes, I would think so too, that the class UID should go first, and the Instance UID should go second in the sequence. Do we think that’s a bug in the Quantitative Reporting extension code?
I didn’t quite understand the details in the above, but I did look at a sample output from dcmqi and confirmed that the order is what I would expect.
The snippets above are from a SEG generated using dcmqi for the C4KC-KiTS collection. I looked at the segmentations for the subject KiTS-00001 study, which is here: https://viewer.imaging.datacommons.cancer.gov/viewer/1.3.6.1.4.1.14519.5.2.1.6919.4624.138299679445949029090789149621.
Ah, maybe we are getting down to the real issue.
Looking at the earlier version of the data, it looks like the structure of the files is a bit different. Rather than an item with two tags like @fedorov’s example and the innolitics figure, the REMIND file has two items each with one tag.
So now I’m not sure why changing the order of the sequence statement worked for @reubendo.
Yes, I did notice that, but since @reubendo said the problem is solved, decided not to raise it.
Clearly, that sequence is not populated the way it should be populated. It should list all of the instances of the series that were segmented - not just one (unless only one slice was segmented). And yes, it should list ReferencedSOPClassUID and ReferencedSOPInstanceUID as siblings within the sequence item.
As I mentioned earlier, it is always a good idea to run dciodvfy if one manipulates DICOM metadata. I am pretty sure this issue will be flagged by the validator.
I am not sure what errors may exist in this latest version of the data, but I do know our team ran dciodvfy against the original version of the data generated by dcmqi and dciodvfy did not complain about the absence of this information. That’s why it wasn’t until I went to load the data into Slicer that we uncovered the issue. Could these fields be optional per the official DICOM rules?
On a related note, is that why the parameter in dcmqi that @fedorov mentioned earlier is optional instead of mandatory?
Should we be requesting a change to the DICOM standard and/or updating tools like dcmqi to require it? Or are there legitimate reasons why one should be able to create segmentation data without this reference info?