Hi everyone, I posted earlier about a machine learning project, but it was recommended that I create a new thread for each topic rather than a single stream-of-consciousness post. So I’d like to start from the beginning: data.
Problem:
As stated in the previous post, we have ~15,000 knee MRI studies in DICOM format. I’m working with a sample of 30 studies to play around with. Six months ago, when I was even more clueless than I am now, I took the time to manually go through these 30 studies and create a directory tree as follows:
I chose to do it this way 6 months ago because I thought it would be valuable to know which planes or sequences we are dealing with. However, I am not sure there is any value in this directory structure, given that I can simply drag and drop the entire “Patients” folder into Slicer and its DICOM module will automatically organize the data (by patient, study, series) for me. I had originally imagined some type of recursive code (base case == terminal folder, i.e. a folder with no folders in it) that would loop through all of the folders starting at the top of the tree (Patients), extract the individual slices from each sequence, aggregate them into their respective volumes, and feed them as inputs into training a deep network for segmentation/classification purposes.
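To make the recursive idea concrete, here is a rough sketch of the traversal I had in mind. It uses only the Python standard library and stops at listing the files in each terminal folder; it does not read DICOM data or build volumes. The function name `iter_terminal_folders` is just something I made up for illustration, and it assumes each terminal folder holds the slices of exactly one sequence.

```python
import os
from typing import Iterator, List, Tuple


def iter_terminal_folders(root: str) -> Iterator[Tuple[str, List[str]]]:
    """Recursively yield (folder, sorted file paths) for every folder without subfolders."""
    entries = sorted(os.scandir(root), key=lambda e: e.name)
    subfolders = [e.path for e in entries if e.is_dir()]
    files = [e.path for e in entries if e.is_file()]

    if not subfolders:
        # Base case: a terminal folder, assumed to contain the slices of one sequence.
        yield root, files
        return

    # Recursive case: descend into each subfolder.
    # Files sitting next to subfolders are ignored in this sketch.
    for sub in subfolders:
        yield from iter_terminal_folders(sub)


if __name__ == "__main__" and os.path.isdir("Patients"):
    # "Patients" is the top of the tree described above.
    for folder, slices in iter_terminal_folders("Patients"):
        print(folder, len(slices))
```

Each yielded pair would then be the input to whatever step turns a list of slices into a volume.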
Questions:
Given Slicer’s DICOM module, is there any utility/purpose in organizing imaging data into an arbitrary directory structure (such as the one I made)? Do I need structure at all, or can I simply have all of my DICOM files in a single folder? Generalizing this question a bit, when organizing a machine learning project such as this using 3D Slicer, how important is data (DICOM specifically) organization? Again, although I am working with a sample of 30, these questions are in the context of 15,000 studies. The reason I ask is for automation/efficiency purposes, i.e. scripting the import, conversion, registration, etc. in Slicer with Python.
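For context on the single-folder question: as far as I understand, the patient/study/series hierarchy is stored in each DICOM file's header (PatientID, StudyInstanceUID, SeriesInstanceUID), so a flat folder could in principle be regrouped in a script. Below is a minimal sketch of that idea outside of Slicer. It assumes the pydicom package is installed, and the names `read_key_with_pydicom` and `group_by_series` are my own, not part of Slicer or pydicom. The header reader is passed in as a parameter so the grouping logic can be tried without real DICOM files.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# (PatientID, StudyInstanceUID, SeriesInstanceUID)
SeriesKey = Tuple[str, str, str]


def read_key_with_pydicom(path: str) -> SeriesKey:
    """Read only the header (no pixel data) and return the grouping key."""
    import pydicom  # assumption: pydicom is installed

    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return str(ds.PatientID), str(ds.StudyInstanceUID), str(ds.SeriesInstanceUID)


def group_by_series(
    paths: Iterable[str],
    read_key: Callable[[str], SeriesKey] = read_key_with_pydicom,
) -> Dict[SeriesKey, List[str]]:
    """Group file paths by (patient, study, series), regardless of folder layout."""
    groups: Dict[SeriesKey, List[str]] = defaultdict(list)
    for path in paths:
        groups[read_key(path)].append(path)
    return dict(groups)
```

If something like this holds up at 15,000 studies, the folder layout would matter mostly for human browsing rather than for the scripts.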
I have more questions on this topic, e.g. what to do after importing DICOM files, visualization issues, conversion to NRRD issues, etc., but I think this is a good start (literally where I started 6 months ago). Thanks!