Request for suggestions for image informatics tools

muratmaga · December 20, 2018, 5:35pm

Hi,

We have various datasets (both clinical and non-clinical) that we would like to consolidate into some sort of image informatics solution that will help us keep them better organized and annotated. I am looking into XNAT as an option but would like to know if there are other solutions better integrated with Slicer. Being able to retrieve data from the repository into Slicer is a plus.

From clinical data, we might have anonymized DICOM series from a ‘normal’ patient as a nifti/nrrd, a cleaned up version where only cranial bones are retained, segmentations (e.g., mandible and cranium as separate structures), and fiducials and so forth.

There are different projects with different datasets, but in a similar vein…

It will be great to hear your suggestions…

pieper · December 22, 2018, 12:17am

Hi @muratmaga -

I guess you saw the XNATSlicer extension which does a lot of the things on your wishlist. Last I saw it might need some updates for the latest versions of Slicer and I don’t know if it has been used widely (might have limitations/bugs).

Storing the derived data as DICOM (instead of nifti/nrrd) could be a good option. This has the advantage of keeping all the relationships explicit in the data files instead of storing them in a separate detached database (e.g. the definitive source of information about what images were annotated by whom and when is in the header). Then you can use different database options to organize the data as needed, like a dicomweb server with an extra database like in Crowds Cure Cancer. Admittedly the Slicer infrastructure for all of this is still a work-in-progress, but the pattern is well defined.

lassoan · December 22, 2018, 3:05am

DICOM file format

It is highly structured and standardized, so it is great for data archival or sharing. However, we usually choose more generic representations (tables in csv files, images as nrrd files, etc.) as internal representation, as they are more flexible, efficient, simpler, and compatible with much more software. @fedorov has made nice progress in making DICOM directly usable for certain data analysis tasks, maybe he can add some more comments and give pointers to examples.

XNAT

It is still actively maintained, mostly keeps up with state-of-the-art technologies (web APIs, docker, etc.) and several groups use it. It is customizable by plugins, but I’ve heard that it is not easy to create them (this seems to be confirmed by the low number of plugins - fewer than 30, although the project has been around for more than a decade).

Girder

Girder is Kitware’s fresh take on research data management and analysis (based on their experience with their previous-generation MIDAS data management system). Compared to XNAT, Girder is built on a more modern basis and there seem to be many more developers on the project. It has nice integration with data analysis and visualization features (Resonant platform).

Our usual workflow

In the past, we tried to set up XNAT (when it was not yet dockerized) but we did not succeed. We have tried using CouchDB, which worked well for storing small data sets (descriptive data), but was not usable (synchronization was very unreliable) for large files, such as 3D images. We now have a Girder instance set up, but so far we have only used it for simply sharing files with external collaborators.

What we usually do is store Slicer scenes (mrb files) in folders on a shared drive. The mapping from internal IDs to patient information is in a password-protected spreadsheet or REDCap database. The saved scene also contains additional annotations (landmarks, manual segmentation, etc.) and may contain computation/analysis results.

For batch processing and analysis we put the selected group of mrb files in a folder and iterate through them using a Python script. In some projects, the data processing is split into a generic data extraction step (done only once, it generates summary csv files, series of aligned/normalized images, meshes, etc.), and a processing/analysis step (which is very specific to the data set and research question).
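As a rough illustration of the generic extraction step, here is a minimal sketch that only relies on mrb files being zip archives. It does not use Slicer at all; it just walks a folder, looks inside each bundle, and writes one summary csv. The function names and the csv columns are made up for this example; a real extraction step would load each scene in Slicer and compute whatever the project needs.

```python
import csv
import zipfile
from pathlib import Path

# Hypothetical column names, chosen only for this sketch.
FIELDS = ["file", "n_members", "n_mrml", "n_nrrd"]


def summarize_mrb(mrb_path):
    """Return one summary row for a single .mrb (zip) archive."""
    with zipfile.ZipFile(mrb_path) as bundle:
        # Skip directory entries; keep only real files.
        names = [n for n in bundle.namelist() if not n.endswith("/")]
    suffixes = [Path(n).suffix.lower() for n in names]
    return {
        "file": Path(mrb_path).name,
        "n_members": len(names),
        "n_mrml": suffixes.count(".mrml"),
        "n_nrrd": suffixes.count(".nrrd"),  # also counts .seg.nrrd files
    }


def summarize_folder(folder, out_csv):
    """Iterate over every .mrb in `folder` and write a summary csv."""
    rows = [summarize_mrb(p) for p in sorted(Path(folder).glob("*.mrb"))]
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `summarize_folder("selected_cases", "summary.csv")` (both paths are placeholders) would produce one row per scene, which the project-specific analysis step can then read.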

Nowadays we mostly run Python scripts using Jupyter notebooks, as they are easier to run/modify/verify than using a plain command-line interface. Also, notebooks work the same way regardless of operating system and they can also run on high-performance computing clusters (using JupyterHub).

muratmaga · December 22, 2018, 6:28am

Thank you both @lassoan and @pieper.

I wish there were some sort of format that had the metadata capability of DICOM and the flexibility and ease of use of MRB (so everything travels together).

pieper · December 22, 2018, 8:22pm

DICOM and MRB are really two ends of the spectrum with respect to standardization vs customization. We can feel free to innovate in MRB but it’s very specific to Slicer while DICOM has compromises but at least the hope of data exchange. In the end both have their uses.

muratmaga · December 22, 2018, 10:44pm

It is a shame that medical imaging doesn’t use a container like hdf5 where dicom tags would contain metadata and you can keep adding image volumes and everything derived from it.

lassoan · December 22, 2018, 11:28pm

DICOM specifies a much more sophisticated container than HDF5. The DICOM file container can store any information.

One issue is that the container is probably more complex than necessary. There are many data types that can be stored in many different ways.

Also, DICOM is much more than a file format. Data structure and encoding is just one chapter of the standard. There are 20 other chapters describing various other aspects of data storage. For example, Part 3 contains high-level data structures, defined above the file container level - in more than 1600 pages.

If you used DICOM just as a container, like HDF5, and stored all data in private fields (that you have specified for yourself) then you could make it as simple and efficient as you want. Many imaging device developers follow this approach. However, then you would lose most benefits of DICOM.

pieper · December 22, 2018, 11:34pm

That’s effectively what a DICOM Study is: a place to collect logically related data - you can add additional series and instances that reference or derive from other data. You can also link across studies. It’s just mediated by a server, aka PACS, (either over DIMSE or DICOMweb), which typically has a database indexed over the most common query keys.
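To make the DICOMweb side a bit more concrete, here is a small sketch of how study-level and series-level queries are addressed in QIDO-RS. It only builds the URLs and does not contact any server. The base URL and the PatientID value in the usage note are placeholders, and the helper names are made up for this example.

```python
from urllib.parse import quote, urlencode


def qido_studies_url(base_url, **filters):
    """Build a QIDO-RS study search URL, e.g. filtered by PatientID."""
    url = f"{base_url.rstrip('/')}/studies"
    query = urlencode(filters)
    return f"{url}?{query}" if query else url


def qido_series_url(base_url, study_uid):
    """Build the QIDO-RS URL that lists all series of one study."""
    return f"{base_url.rstrip('/')}/studies/{quote(study_uid)}/series"
```

For example, `qido_studies_url("https://example.org/dicomweb", PatientID="SPEC-001")` gives the search URL for one subject, and `qido_series_url(...)` with the returned StudyInstanceUID lists the original and derived series collected under that study.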

It’s also possible to put all the DICOM instance data you want into a directory indexed by a DICOMDIR file. The directory could be encoded as an hdf5 file, although I don’t think I’ve ever seen that done.

And it’s not just that DICOM structures are flexible, it’s that many of the most important fields have well-defined representations. Examples are human names that can follow different language conventions, and quantitative measurements with standardized ways of describing what they measure and what units they are expressed in. If you don’t use DICOM then you need to re-invent all these things.

muratmaga · December 23, 2018, 5:22am

But hdf would have offered containing everything in a single file, similar to the MRB.

We do have to invent some tags for non-clinical projects (e.g., genus, species, strains, genetic variants, collection/accession IDs, specimen numbers, etc., things the DICOM format probably never needed). Ideally, I would like to have them travel with the image data itself, not just sit on a remote server. Currently we seem to be stuck in a place where all this information needs to be part of the filename (when using a research format like NRRD), which makes filenames unwieldy (e.g., BKS.Cg-Dock7m +/+ Leprdb/J), not to mention error-prone. Or, if we go with DICOM (and implement private fields as Andras mentioned), we have to deal with thousands of files per dataset, which makes distribution a bit more challenging.

Or is it possible to store image volumes in DICOM as a single file (like nifti or nrrd) and still maintain compatibility?

lassoan · December 23, 2018, 6:01am

Yes. The ability to store an entire image series in a single file was added to DICOM more than a decade ago (enhanced multi-frame IODs). Unfortunately, device manufacturers mostly stayed with the legacy single-frame-per-file format, and much software (including Slicer) is less thoroughly tested with those rare multi-frame files.

pieper · December 23, 2018, 8:11pm

Actually DICOM has support for a lot of this kind of stuff, some driven by veterinary applications and others by preclinical research. Could you take a look at what is already supported and make a list of what else you would need?

http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.html

We can always use non-standard extensions if needed, or we could suggest changes to the standard that would handle more use cases.

Right - that’s what the DICOM approach provides. Everything is explicit in the data file and a remote server or database is just a convenience for storage or efficient access.

MRB files are very handy. They are just zip files of nrrd and mrml with a different extension. We could always make a similar convention for DICOM if that would help us group our DICOM data.

Just as a note, .nhdr files can point to other data container files, so even if you kept the ‘canonical’ data in single or multiframe DICOM you could also be compatible with nrrd for interoperability.
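Here is a minimal sketch of the detached-header idea, using only the Python standard library. It writes a .nhdr text file whose `data file:` field points at a separate file holding the voxels, and stores extra metadata as NRRD key/value pairs (the `key:=value` lines). For simplicity it points at a plain raw file; pointing at pixel data inside a DICOM file would need additional fields and is not shown. The function names and the example keys (species, strain) are made up for illustration.

```python
from pathlib import Path


def write_nhdr(nhdr_path, data_file, sizes, metadata, voxel_type="short"):
    """Write a detached NRRD header that points at an existing raw data file."""
    lines = [
        "NRRD0004",
        f"type: {voxel_type}",
        f"dimension: {len(sizes)}",
        "sizes: " + " ".join(str(s) for s in sizes),
        "endian: little",
        "encoding: raw",
        f"data file: {data_file}",  # path relative to the header file
    ]
    # Free-form metadata goes into key/value pairs, written as key:=value.
    lines += [f"{key}:={value}" for key, value in metadata.items()]
    Path(nhdr_path).write_text("\n".join(lines) + "\n")


def read_nhdr_metadata(nhdr_path):
    """Return the key/value pairs (lines containing ':=') from a header."""
    metadata = {}
    for line in Path(nhdr_path).read_text().splitlines():
        if ":=" in line and not line.startswith("#"):
            key, value = line.split(":=", 1)
            metadata[key] = value
    return metadata
```

With this, something like `write_nhdr("specimen.nhdr", "specimen.raw", (512, 512, 300), {"strain": "BKS.Cg-Dock7m +/+ Leprdb/J"})` keeps the awkward strain name out of the filename while the metadata still sits next to the image data.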

muratmaga · December 23, 2018, 9:14pm

@pieper
I forgot about veterinary use cases! Yes, those do contain quite a bit of the stuff we would need or things that might be reworked. I will take a more careful look.

I like the NHDR format. The problem is, I don’t know how to force the data to stay in that format. For example, we make an NHDR that points to a DICOM file for the metadata, but then when the user saves it as a nrrd from Slicer, what happens? Perhaps you can argue that in the end it is the users’ responsibility to be careful, but it is easy to make these kinds of mistakes.

Anyway, these discussions have been very useful for me and I do appreciate you guys taking the time for input.

fedorov · December 23, 2018, 10:12pm

I didn’t respond since I didn’t have enough time, and there is no easy answer. I think the short answer is that there is no easy solution, and it all depends on what exactly you need. I think XNAT is the best “ready to go” platform, if you want to limit development or just explore the existing capabilities. I do not know though how well it is integrated with Slicer. A long time ago I saw some demos, but I never tried to use XNAT-Slicer integration for anything beyond a demo. Also, the last time I heard from the XNAT team about the Slicer plugin, in response to a report that it does not work they said (back in June): “The XNAT team doesn’t currently have anyone assigned for development on the XNATSlicer plugin. We intend to continue supporting the extension, but when we may have an update depends heavily on when we can find someone to actually do the update!” Not very encouraging.

5+ years ago I set up an XNAT instance integrated with CTP for de-identification, and we used it for something like 1000 DICOM studies here in the lab. But basic (in my view, anyway) things that I wanted to see supported - more flexible and customizable search, easy extension with processing plugins, better integration with image viewing and annotation tools - were either very difficult or not possible. I don’t know whether this has changed fundamentally since then. I am actually thinking of taking another look at XNAT during the project week.

I do think DICOM is the right foundation for organizing imaging data. But I also think that it will require a lot of effort to cover gaps in the standard itself and in its implementation. Over the past years we did a lot of work on those gaps for clinical imaging analysis results. Maybe this most recent paper and pointers therein can give you an idea of the overall approach: https://www.nature.com/articles/sdata2018281. But we did not do anything on the data management side, which is a huge gap (we are actively looking for opportunities and resources to work on that aspect as well). I feel if we as a community all made a commitment to use and improve DICOM, it would be very feasible to solve many problems using DICOM, but it is not the case, and the amount of work is overwhelming.

If you are interested in pre-clinical applications, on top of all the issues related to dealing with the derived data, you will also be in a much worse situation with vendors’ compliance on the acquisition side. Also I imagine the derived data is a lot more heterogeneous, the data analysis approaches are a lot more dynamic, and I don’t know if there are any existing implementations supporting the DICOM pre-clinical stuff to start from.

I would be very interested to learn from your experience using whatever approach you selected to support your work. If you are planning to be at the project week (this January or in the Summer), would be great to chat about this topic!

Douglas_Boyer · December 24, 2018, 2:03am

For what it’s worth, my team at MorphoSource (https://www.morphosource.org/) is developing a three.js volume viewer normalized to single-file DICOM volumes and nrrd for the “universal viewer” (https://universalviewer.io/), which is designed for interoperability with iiif (https://iiif.io/) image servers. This could be a good standard to rally around since it will undoubtedly be the solution of choice for the library preservation community (assuming it works!) and more generic preservation repositories. Initially we will build some basic measurement tools, but more functionality could be added down the road.

muratmaga · December 24, 2018, 4:54am

@fedorov Unfortunately, I won’t be able to make it to the upcoming winter PW, but I am hoping to attend the summer one, if it is in Boston as usual. DICOM, especially the multi-frame variant, seems like a good overall solution, but surprisingly few programs seem to support it well.

Currently I am leaning towards a solution that looks like Andras’ current workflow; i.e., one that uses MRBs to store both the original image data and all derivative datasets associated with a single patient/specimen, and a REDCap (or similar) DB for access to the metadata and PHI for IRB-approved folks. The same approach can be used for pre-clinical data as well, I think. I guess we will start one way and learn from our mistakes.
