For some of these questions, you will have to consult a textbook on statistical design for biological experiments. You can find how it is done in the context of geometric morphometrics in the textbook by Zelditch et al. (Geometric Morphometrics for Biologists); the help pages and vignettes of the geomorph R package are also very useful.
The statistical model would then evaluate whether the inter-observer variability exceeds the inter-subject variability (which is of course a problem). For that you need replicates from each landmarker. That is why it gets a bit tedious. Keep the model simple, and it would be something like:
ID + Rater:Replicate, in which
ID: is the name of your subject
Rater: is the name of the person doing the landmarking
Replicate: however many times each rater repeats the landmarking (minimum two, of course).
If the Rater:Replicate term is statistically significant, it means your raters are doing things a bit differently, and you have to decide what to do about that.
The reason it gets tedious: if you have three specimens (ID=3), three raters, and a minimum of two replicates, you have to do 3x3x2 = 18 landmark sets, and people usually do this with about a dozen samples.
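To see how quickly the workload grows, here is a minimal Python sketch that enumerates every landmarking session in a fully crossed design. The specimen and rater names, and the function name, are made up for illustration.

```python
from itertools import product

def landmarking_sessions(specimens, raters, n_replicates):
    """List every (ID, Rater, Replicate) combination in a fully crossed design."""
    if n_replicates < 2:
        raise ValueError("need at least two replicates per rater")
    return list(product(specimens, raters, range(1, n_replicates + 1)))

# Hypothetical names, for illustration only.
specimens = ["spec_A", "spec_B", "spec_C"]
raters = ["rater_1", "rater_2", "rater_3"]

# Three specimens, three raters, two replicates: 3 x 3 x 2 landmark sets.
sessions = landmarking_sessions(specimens, raters, n_replicates=2)
print(len(sessions))

# The same design with a dozen specimens: 12 x 3 x 2 landmark sets.
dozen = landmarking_sessions([f"spec_{i}" for i in range(12)], raters, 2)
print(len(dozen))
```

The first count is 18 and the second is 72, which is why adding specimens or raters gets expensive fast.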
Also, because this is based on GPA-aligned coordinates, it uses the Procrustes distance (the total amount of error) in the model. Hence, you cannot tell whether it is just 2-3 landmarks causing the problem; only whether the raters are doing the same thing or not.
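A small numeric sketch of why a single summed distance hides the culprit. It assumes the configurations are already aligned (no GPA is performed here), and all coordinates are invented for illustration.

```python
import math

def total_distance(config, reference):
    """Square root of the summed squared coordinate differences over all landmarks."""
    return math.sqrt(sum(
        (a - b) ** 2
        for landmark, ref in zip(config, reference)
        for a, b in zip(landmark, ref)
    ))

# Invented reference configuration with four landmarks.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# Rater A: only the first landmark is off (by 0.2 along x).
rater_a = [(0.2, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# Rater B: every landmark is off a little (by 0.1 along x).
rater_b = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (0.1, 1.0, 0.0), (0.1, 0.0, 1.0)]

dist_a = total_distance(rater_a, reference)
dist_b = total_distance(rater_b, reference)
print(dist_a, dist_b)
```

Both totals come out at about 0.2, even though rater A has one badly placed landmark and rater B is slightly off everywhere. The single number cannot distinguish the two cases.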
That’s why I personally prefer to look at this without the GPA framework. For any specimen, I can calculate a mean landmark set based on all observations, then evaluate how far each person was from it on a landmark-by-landmark basis. I simply use basic algebra to do this.
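One way this per-landmark comparison could be sketched in Python, using only the standard library. This is an illustration and not necessarily the exact procedure used above: it assumes all landmark sets for a specimen are in the same coordinate system (for example, placed on the same volume) and list the landmarks in the same order. The rater names and coordinates are invented.

```python
import math

def mean_landmarks(observations):
    """Average each landmark's coordinates across all observations of one specimen."""
    n = len(observations)
    n_landmarks = len(observations[0])
    return [
        tuple(sum(obs[i][k] for obs in observations) / n for k in range(3))
        for i in range(n_landmarks)
    ]

def per_landmark_deviation(observation, mean_set):
    """Euclidean distance of each landmark from its mean position."""
    return [math.dist(p, m) for p, m in zip(observation, mean_set)]

# Invented data: one specimen, three raters, three landmarks as (x, y, z).
observations = {
    "rater_1": [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)],
    "rater_2": [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 3.0)],
    "rater_3": [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)],
}

mean_set = mean_landmarks(list(observations.values()))
deviations = {
    name: per_landmark_deviation(obs, mean_set)
    for name, obs in observations.items()
}
for name, dev in deviations.items():
    print(name, [round(d, 2) for d in dev])
```

In this made-up example only the third landmark shows any deviation, and rater_2 is the furthest from the mean on it, which is exactly the kind of landmark-level information the summed distance does not give you.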
You cannot do any of this in the GPA module in SlicerMorph. We do not have support for statistical models. The factor/covariate framework is there only to change plot colors. As is, the GPA module in SlicerMorph is mostly for visualizing the outcomes of landmarking and making sure that there are no digitization or data errors. We leave the statistical inference side to the user, most of whom use R. In the future we do plan to have an integration with R, particularly geomorph, to do exactly this type of analysis, but that’s about a year out.