Dear Users,

I am new to coding. I’m a physician-researcher who wants to develop a deep learning radiomics signature for a specific spinal anatomical variant. My understanding of the steps I need to take is as follows:

1. Obtain DICOMs of 200 cervical spine CT (computed tomography) scans, giving 400 sides (150 sides have the anatomical variant; 250 do not) - done
2. Perform segmentation in 3D Slicer for each side of each patient (400 sides) - done
3. Extract radiomics features using the 3D Slicer extension ‘SlicerRadiomics’ (also called “Radiomics”): 107 features for each side. The output is one table in a .tsv file per side, meaning there are 400 such tables. - done

What should I do next? Is the following plan correct?

4. Combine all 400 tables into a single .csv file (which can be opened in Excel) so that each side of each patient has its own row and the columns are the extracted features.

5. Import the file into RStudio and perform z-score normalization of all the features so that they share a common scale.

6. Divide data into a training and a test cohort using an 80/20 split

7. Perform radiomics feature selection using a LASSO regression model or the Boruta package

8. Perform unsupervised hierarchical clustering of the normalized radiomics features using the ComplexHeatmap package in R

9. Perform binary logistic regression in R, using the presence of the anatomical variant as the dependent variable and the radiomics features as independent variables.

10. Draw receiver operating characteristic (ROC) curves and calculate areas under the ROC curve (AUC) including 95% confidence intervals.
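
To make step 10 concrete, here is a minimal sketch of what I understand by "AUC with a 95% confidence interval". It is written in plain Python purely for illustration (the analysis itself is planned in R). The labels and scores are made-up numbers, the function names `auc` and `bootstrap_ci` are my own, and the percentile bootstrap is only one of several ways to obtain a confidence interval, so please treat this as an assumption rather than the established method.

```python
import random


def auc(labels, scores):
    """AUC = chance that a random positive case scores higher than a
    random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample cases with replacement, recompute the
    AUC each time, and take the 2.5th and 97.5th percentiles."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # this resample contained only one class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up example: 1 = variant present, 0 = absent; scores stand in for
# the predicted probabilities from the logistic regression in step 9.
example_labels = [1, 1, 1, 0, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]

example_auc = auc(example_labels, example_scores)
ci_low, ci_high = bootstrap_ci(example_labels, example_scores)
print(f"AUC = {example_auc:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

With only eight made-up cases the interval is very wide; the point is only to show what the two numbers mean.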

Are the steps correct? Are they in the correct order, or should something be changed? I am unsure about step #4 (combining the tables) and the order of steps #5 to #8.
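
To make the question about step #4 and the ordering more concrete, here is a minimal sketch of how I picture steps 4 to 6, written in plain Python for illustration only. Several things in it are assumptions: that each .tsv file has one feature per line with columns named `Feature Class`, `Feature Name` and `Value` (I have not verified that this matches the real SlicerRadiomics layout), that the file name identifies the side, and that the two features shown are representative. The sketch also uses one possible ordering in which the 80/20 split comes before the normalization, so that the mean and standard deviation are taken from the training rows only; I am not sure whether this is the right order, which is part of my question. The outcome column (variant present or absent) would still have to be added separately.

```python
import csv
import random
import statistics
import tempfile
from pathlib import Path


def read_side_table(path):
    """Read one per-side .tsv (assumed: one feature per line) into a dict."""
    with open(path, newline="") as f:
        rows = csv.DictReader(f, delimiter="\t")
        # Class + name as key, because some feature names repeat across classes.
        return {f"{r['Feature Class']}_{r['Feature Name']}": float(r["Value"])
                for r in rows}


def combine_tables(folder):
    """Step 4: one row per side; the file name (without .tsv) is the side ID."""
    combined = []
    for path in sorted(Path(folder).glob("*.tsv")):
        row = {"side_id": path.stem}
        row.update(read_side_table(path))
        combined.append(row)
    return combined


def split_rows(rows, test_fraction=0.2, seed=42):
    """Step 6: shuffle, then hold out test_fraction of rows as the test cohort."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def zscore(train, test, features):
    """Step 5: z = (x - mean) / sd, with mean and sd from the training rows only."""
    for feat in features:
        values = [r[feat] for r in train]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        for r in train + test:
            r[feat] = (r[feat] - mean) / sd if sd > 0 else 0.0


# Demo with 10 made-up tables standing in for the real 400.
with tempfile.TemporaryDirectory() as tmp:
    rng = random.Random(0)
    for i in range(10):
        lines = ["Feature Class\tFeature Name\tValue",
                 f"firstorder\tMean\t{rng.uniform(100, 300):.3f}",
                 f"shape\tVoxelVolume\t{rng.uniform(500, 900):.3f}"]
        Path(tmp, f"patient{i:03d}_left.tsv").write_text("\n".join(lines) + "\n")
    table = combine_tables(tmp)
    # Write the single combined .csv described in step 4.
    with open(Path(tmp, "combined.csv"), "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(table[0]))
        writer.writeheader()
        writer.writerows(table)

features = [name for name in table[0] if name != "side_id"]
train, test = split_rows(table)
zscore(train, test, features)
print(len(table), "rows combined;", len(train), "training,", len(test), "test")
```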

Please use language that a layperson without much coding experience will understand.

Thank you

Kind regards,

Tomasz