SQLite3 query in Python returns fewer patients than the Slicer DICOM module shows

Hello everyone,

I am currently building a database of DICOM tags using 3D Slicer. I have imported DICOM image data for approximately 700 patients into Slicer. The import appears to have processed all the files successfully, since the DICOM browser in the Slicer UI shows tag data for all the patients.

To extract this data, I used the “ctkDICOM.sql” file that Slicer generated. This file is around 2 GB in size, with an additional 20 GB cache SQL file. However, when I parse the “ctkDICOM.sql” file using the sqlite3 module in Python, data for around 200 patients seems to be missing.

Despite this, there doesn’t appear to be an issue with the original DICOM data, as all patient data is correctly displayed in the Slicer UI. I have double-checked the DICOM files, and they don’t seem to be the problem.

I was wondering if anyone could provide some guidance on this. Specifically, my goal is to generate a single SQL file using Slicer that contains all the DICOM tag information for these patients.

Any help or insights would be greatly appreciated!

Thank you in advance.

The DICOM database is managed by the CTK code, including the mapping from what’s in the database to what’s displayed on the screen, as generally described here:

That should explain how the fields are generated. But there’s no reason 200 patients should be missing from the database if they are displayed in the GUI (the two should match). Maybe try reproducing the issue, perhaps with public data. If you can let us know the steps, hopefully someone can help you troubleshoot.

SQLite3 sets default limits for query results. It seems that with your SELECT query you reached the default limit of 500 records. You can use the LIMIT clause in your query to get all the records.
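
One way to test whether a row limit is actually involved is to compare what COUNT(*) reports with the number of rows a plain SELECT returns. This is only a rough sketch: the helper name `count_vs_fetched` is made up for illustration, and the default table name “Patients” is an assumption about the schema, so adjust it to whatever your file contains.

```python
import sqlite3


def count_vs_fetched(db_path, table="Patients"):
    """Return (rows reported by COUNT(*), rows actually fetched) for one table."""
    conn = sqlite3.connect(db_path)
    try:
        # Only accept a table name that really exists in this database file.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        if table not in names:
            raise ValueError(f"no table named {table!r}; found {names}")
        counted = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        fetched = len(conn.execute(f'SELECT * FROM "{table}"').fetchall())
        return counted, fetched
    finally:
        conn.close()


# Example call (the path is a placeholder for your own database file):
# print(count_vs_fetched("/path/to/ctkDICOM.sql"))
```

If the two numbers are equal, the query is not being truncated and the missing patients have to be explained some other way.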

Thanks Andras and Pieper for your thoughts. I’ve tried a few things already:

  1. I added a “LIMIT” clause to my query to see if there’s an issue with sqlite3, but that didn’t help.
  2. I also opened the ctkDICOM.sql file using a tool called DB browser, but I still saw the same problem.

I noticed that the 200 missing patients are not just at the end of the list. They are scattered randomly through it when I look at the data using the DB browser or Python, yet I can see all of them in the Slicer UI.
Also, some fields show as ‘None’ when I look at them in the DB browser or Python, but the values are there in the Slicer UI.

So, my guess is that Slicer might be using a cache file called ctkDICOMTagCache.sql to fill in the missing data and patients. That might be how Slicer manages to show all the data.

Now, I’m going to try moving the data to a new computer. I want to see whether the ctkDICOM.sql file generated by Slicer there has the same problems with missing data.

You can ignore the tag cache file; it just stores some fields for faster access. Only the content of ctkDICOM.sql matters. You can find the list of patients in the Patients table.
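
As a rough sketch of how to inspect that table from Python: the code below reads every row of Patients and counts how many rows hold NULL in each column, which shows up as None in Python. The function names are made up for illustration, and no column names are assumed, since the schema is internal and may differ between versions.

```python
import sqlite3


def load_patients(db_path):
    """Read every row of the Patients table as a list of dicts."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
    try:
        rows = conn.execute("SELECT * FROM Patients").fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()


def columns_with_nulls(patients):
    """Count, per column, how many patient rows hold NULL (None in Python)."""
    counts = {}
    for row in patients:
        for column, value in row.items():
            if value is None:
                counts[column] = counts.get(column, 0) + 1
    return counts


# Example call (the path is a placeholder for your own database file):
# patients = load_patients("/path/to/ctkDICOM.sql")
# print(len(patients), columns_with_nulls(patients))
```

Comparing `len(patients)` with the number of patients listed in the Slicer DICOM browser should show whether the rows are really absent from the file or are only being dropped by a later step of the parsing script.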

Note that none of these files are part of the public Slicer API. If you can find a way to extract useful information from the sqlite files then that is fine for us, but we do not support this (because that would impose many limitations on how we can evolve the internal design in the future). The public API for the Slicer DICOM database is the ctkDICOMDatabase object, which is accessible in the Slicer Python environment as slicer.dicomDatabase.
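
A minimal sketch of reading tags through that object instead of the sqlite files: it walks patients, studies and series, then reads the requested tags from the first file of each series. The helper name `collect_tag_values` and the choice to sample only the first file are assumptions made for illustration. The database object is passed in as a parameter, so the function itself has no Slicer import, but the example call at the end only works inside Slicer’s Python console.

```python
def collect_tag_values(db, tags):
    """Walk patients -> studies -> series; read tags from the first file of each series."""
    records = []
    for patient in db.patients():
        for study in db.studiesForPatient(patient):
            for series in db.seriesForStudy(study):
                files = db.filesForSeries(series)
                if not files:
                    continue  # series has no files registered in the database
                record = {"patient": patient, "study": study, "series": series}
                for tag in tags:
                    # Tags are passed as "group,element" strings, e.g. "0010,0020".
                    record[tag] = db.fileValue(files[0], tag)
                records.append(record)
    return records


# Inside Slicer's Python console (not runnable outside Slicer):
# rows = collect_tag_values(slicer.dicomDatabase, ["0010,0010", "0010,0020"])
```

The resulting list of dicts can then be written to whatever single file you need, for example with the csv module or into a fresh sqlite database of your own.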