Dear Users,
I am a newbie in the realm of coding. I’m a physician-researcher who wants to develop a deep learning radiomics signature for a specific spinal anatomical variant. My understanding of the steps I need to take is as follows:
1. Obtain DICOMs of 200 cervical spine CT (computed tomography) scans, giving 400 sides (150 with the anatomical variant; 250 without) - done

2. Perform segmentation in 3D Slicer for each side of each patient (400 sides) - done

3. Extract radiomics features using the 3D Slicer extension ‘SlicerRadiomics’ (also called “Radiomics”): 107 features for each side. The output is one .tsv table per side, so there are 400 such tables. - done

What should I do next? Is it as follows?
4. Combine all 400 tables into a single .csv file (which Excel can open), so that each side of each patient has its own row and the columns are the extracted features.
5. Import the file into ‘RStudio’ and perform z-score normalization of all the features so that they are on a common scale.
6. Divide data into a training and a test cohort using an 80/20 split
7. Perform radiomics feature selection using a LASSO regression model or the Boruta package.
8. Do unsupervised hierarchical clustering of normalized radiomics features using the package ComplexHeatmap in R.
9. Do binary logistic regression analysis in R using the presence of the anatomical variant as the dependent variable and radiomics features as independent variables.
10. Draw receiver operating characteristic (ROC) curves and calculate areas under the ROC curve (AUC) including 95% confidence intervals.
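
To make step 4 concrete, here is a minimal sketch of combining the per-side tables, written in Python with only the standard library (the same idea carries over to R). It rests on assumptions that should be checked against one real file: each .tsv holds one feature per row, has columns named `Feature Class` and `Feature Name`, and stores the value in the last column. The folder name, output file name, and the `side_id` column are hypothetical.

```python
import csv
from pathlib import Path


def read_one_table(tsv_path, class_column="Feature Class", name_column="Feature Name"):
    """Read one per-side table and return {"class_name": value}.

    Assumed layout (verify on a real file): one feature per row, with the
    value in the LAST column. Class and name are joined because some names,
    e.g. GrayLevelNonUniformity, occur in more than one feature class.
    """
    features = {}
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        class_index = header.index(class_column)
        name_index = header.index(name_column)
        for row in reader:
            if not row:  # skip blank lines
                continue
            features[f"{row[class_index]}_{row[name_index]}"] = row[-1]
    return features


def combine_tables(tsv_folder, output_csv):
    """Write one row per .tsv file (one per side), one column per feature."""
    rows = []
    for tsv_path in sorted(Path(tsv_folder).glob("*.tsv")):
        row = {"side_id": tsv_path.stem}  # file name without ".tsv"
        row.update(read_one_table(tsv_path))
        rows.append(row)
    if not rows:
        raise ValueError(f"no .tsv files found in {tsv_folder}")

    # Use the first file's columns; every other file is expected to match.
    columns = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != columns:
            raise ValueError(f"{row['side_id']} has a different feature list")

    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Hypothetical usage:
# combine_tables("radiomics_tsv", "all_sides.csv")
```

The function returns the number of rows written, which should be 400 here if every side was exported.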
Are the steps correct? Are they in the correct order? Or should something be changed? I am not sure about step 4 (combining the tables) or the order of steps 5 to 8.
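
For reference, steps 5 and 6 as listed can be sketched as two small functions, again in standard-library Python rather than R. This is only an illustration under stated assumptions: `rows` is a list of dictionaries (one per side) as read from the combined .csv, the feature names are passed in explicitly, and the split is made by row, which ignores the fact that two sides can belong to the same patient. The function names and the seed are hypothetical.

```python
import random
import statistics


def zscore_columns(rows, feature_names):
    """Return copies of `rows` with each feature rescaled to mean 0, SD 1."""
    scaled = [dict(row) for row in rows]
    for name in feature_names:
        values = [float(row[name]) for row in rows]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        for row in scaled:
            # A constant feature (sd == 0) cannot be rescaled; set it to 0.
            row[name] = (float(row[name]) - mean) / sd if sd > 0 else 0.0
    return scaled


def split_80_20(rows, seed=42):
    """Shuffle reproducibly, then return (training rows, test rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Steps 5 and 6 in the order listed above (hypothetical names):
# scaled = zscore_columns(rows, feature_names)
# train_rows, test_rows = split_80_20(scaled)
```

With 400 rows this gives 320 training rows and 80 test rows. Because the two functions are independent, they can be called in either order, which is exactly the part of the plan in question.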
Please use language that a layperson without much coding experience will understand.
Thank you.
Kind regards,
Tomasz