Using multiple markers with Plots

Hello,

I am trying to manipulate a scatter plot. The data is stored in a table with a bunch of classifiers such as experimental group, sex, etc., which I would like to show as different colors/markers. Unfortunately, I couldn’t figure out how to do it with the UI. Is there a way to do it through Python?

You can set marker style and color for each series.

Yes, I can add another series and plot them together. But there doesn’t appear to be an option to subset the data.

Why would you change the marker style within one series? It should not be too hard to split your data set into multiple series.

You can select a subset of points, if you want to extract certain data points.

What is your use case?

My use case is a dataframe with a bunch of PC scores from a morphometric analysis of specimens and their associated covariates or classifiers such as sex, genotype, age, etc. I want to visually inspect whether any of the segregation along the PC plots corresponds to one of these covariates (e.g., male vs female), both for exploration and for data vetting.

I know I can interactively select points on the plot. What I don’t know is what to do with that selection and how to use it to subset the data from the UI.

If it helps, here is a sample dataset:
https://app.box.com/s/5n6xdtld331d2r1c8yqhskzhuzkj0568

If it explains things any further, this is what I would have done in R with the same data (sorry, Box doesn’t let me read the data directly into R; you need to save the csv file locally and read it from there).

dat = read.csv(file='pcScores.csv')
plot(dat$PC1, dat$PC2, col=as.integer(factor(dat$Group)), pch=20, cex=2)

Clearly, based on the morphometric results, one individual does not belong to the group it has been labeled with (either the label is wrong, or the experiment had no effect).

There are tools specifically developed for general data mining/exploration (Knime, RapidMiner, etc.), and you can explore data with a little generic Python or R coding. I would only use Slicer plots for implementing specific workflows already identified using these tools. For example, you could implement a Slicer module that splits the series in a table based on a selected column’s value and plots them.
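As a rough sketch of that splitting step in plain Python (nothing Slicer-specific here): the column names PC1, PC2 and Group are taken from the R snippet earlier in the thread, while the sample rows, the wt/mut labels, and the function name split_by_column are made up for illustration and are not the tutorial dataset.

```python
import csv
import io
from collections import defaultdict


def split_by_column(rows, x_col, y_col, group_col):
    """Group (x, y) values by the label found in group_col.

    Returns a dict mapping each label to a pair of lists (xs, ys).
    """
    series = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = series[row[group_col]]
        xs.append(float(row[x_col]))
        ys.append(float(row[y_col]))
    return dict(series)


# Hypothetical rows for illustration only; replace with the real csv file.
SAMPLE_CSV = """PC1,PC2,Group
0.10,0.20,wt
0.15,0.18,wt
-0.12,-0.05,mut
-0.09,-0.11,mut
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
series = split_by_column(rows, "PC1", "PC2", "Group")
for name, (xs, ys) in series.items():
    print(name, len(xs), "points")
```

Each entry of the resulting dict can then be turned into its own plot series with its own marker and color.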

If you are lucky then the first few principal components may coincide with some real-world (biologically, anatomically, clinically, …) meaningful values, but very often they do not (instead, each principal component may represent a mix of several real-world factors). Since you have labelled data, you can directly obtain optimal separation of groups by using a supervised classification/regression method (SVM, LDA, neural networks, etc.).

Those are good points, but a little beyond what I am trying to accomplish.

That’s a tutorial dataset, and I am just trying to showcase a simple use of PCA to catch either (1) data mislabeling or (2) an interesting case where 1 out of 20 experiments failed for some reason. I would like to do that in Slicer, through the UI if possible.

While your comments about PCs are quite true for heterogeneous populations, where you cannot easily control for confounding factors, things are a little different in the experimental realm, where we can have quite a bit of control at a very low level (as much as biology permits). For example, those mice are genetically identical with the exception of a single mutation in one group, which causes haploinsufficiency (a gene dosage effect). Granted, this is as good as it gets (with the exception of one data point), which is why it is tutorial data.

This is not my dataset; how the investigator chases these results is up to them. I am just trying to make sure simple data vetting can be done within Slicer without having to export the data to another platform. If I were them, I would double-check the data labelling for that specimen, and if I were certain of all my experimental notes, I would then do whole genome sequencing on that specimen.

I think at this point the only way to do this in Slicer with a GUI is to create a small module that sets up the plot (takes a table and 2 independent variable columns, 1 dependent variable column; splits the series into multiple series based on the dependent variable values, and shows them in a plot). It could be a good example of how easy it is to create such a custom module.
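A minimal sketch of what the core of such a module might look like, meant to be pasted into Slicer’s Python console. This is only an illustration under some assumptions: the input is a dict mapping each group label to its (x values, y values), the function names assign_styles and plot_groups_in_slicer and the marker/color lists are made up here, and the Slicer calls have not been checked against any particular Slicer version.

```python
import itertools

# Arbitrary style choices; the marker names are intended to match the
# MarkerStyle* constants of vtkMRMLPlotSeriesNode.
MARKERS = ["MarkerStyleCircle", "MarkerStyleSquare", "MarkerStyleCross",
           "MarkerStyleDiamond", "MarkerStylePlus"]
COLORS = [(0.9, 0.1, 0.1), (0.1, 0.3, 0.9), (0.1, 0.7, 0.2), (0.6, 0.2, 0.8)]


def assign_styles(group_names):
    """Give each group a (marker, color) pair, cycling if styles run out."""
    markers = itertools.cycle(MARKERS)
    colors = itertools.cycle(COLORS)
    return {name: (next(markers), next(colors)) for name in sorted(group_names)}


def plot_groups_in_slicer(series, x_name="PC1", y_name="PC2"):
    """Create one scatter series per group and show them in one chart.

    series maps a group label to a pair (x values, y values).
    Only works inside 3D Slicer, where the slicer package is available.
    """
    import numpy as np
    import slicer

    chart = slicer.mrmlScene.AddNewNodeByClass("vtkMRMLPlotChartNode", "Groups")
    chart.SetXAxisTitle(x_name)
    chart.SetYAxisTitle(y_name)
    for name, (marker, color) in assign_styles(series).items():
        xs, ys = series[name]
        # One small table per group keeps the column setup simple.
        table = slicer.mrmlScene.AddNewNodeByClass("vtkMRMLTableNode", name + " table")
        columns = (np.asarray(xs, dtype=float), np.asarray(ys, dtype=float))
        slicer.util.updateTableFromArray(table, columns, [x_name, y_name])

        node = slicer.mrmlScene.AddNewNodeByClass("vtkMRMLPlotSeriesNode", name)
        node.SetAndObserveTableNodeID(table.GetID())
        node.SetXColumnName(x_name)
        node.SetYColumnName(y_name)
        node.SetPlotType(slicer.vtkMRMLPlotSeriesNode.PlotTypeScatter)
        node.SetLineStyle(slicer.vtkMRMLPlotSeriesNode.LineStyleNone)
        node.SetMarkerStyle(getattr(slicer.vtkMRMLPlotSeriesNode, marker))
        node.SetColor(*color)
        chart.AddAndObservePlotSeriesNodeID(node.GetID())

    slicer.modules.plots.logic().ShowChartInLayout(chart)
    return chart


# The style assignment can be checked outside Slicer:
print(assign_styles(["wt", "mut"]))
```

Inside Slicer, something like plot_groups_in_slicer({"wt": ([0.10, 0.15], [0.20, 0.18]), "mut": ([-0.12], [-0.05])}) would be the intended call; those values and labels are placeholders, not the tutorial data.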