# [R-sig-ME] [EXTERNAL] Analyzing similarity scores between subjects

Tom Philippi tom_philippi at nps.gov
Thu Aug 9 01:57:09 CEST 2018

```
Han--
At the risk of sending you down a completely different rabbit hole:

Each subject contributes to N - 1 similarities at each time.  Depending on
the properties of your similarity score (triangle inequality & such), even
in your half_data example you don't have N * (N - 1) / 2 independent
observations.  One outlier individual will produce N-1 low similarities.
The standard approach to such dependent response variables is to retain the
matrix of similarities with subjects as rows & columns, but permute the
values of the predictor variables across the subjects for ANOSIM or Mantel
tests.

Your hypothesis seems to be simple: pairwise similarity within group N
(N-N) is greater than within group Y (Y-Y) or between groups (N-Y).  If you
had a single time or bout, that would be ANOSIM (analysis of similarities),
where some metric (e.g., mean similarity of N-N) is calculated for the real
data, then the N & Y values are permuted across subjects, and the metric
from the real data is compared to the distribution of the metric across the
permutations (a form of Mantel Test).  One _could_ use a complex mixed
model for the metric, but for your stated hypothesis there is no reason
to.  [Also, since none of the pairwise similarities are changed for that
null distribution, just which ones are considered as N-N, the mean of N-N
similarities is a sufficient statistic.]  Several packages have functions
to perform this test, including ade4, ape, vegan.

Because you have 4 times for the same subjects, the hypotheses get much
more complicated.  Do individual subjects tend to
persist in the same state Y or N, or is state at time X not predictive of
state at time X + 1?  If subjects tend to persist in N or Y, then you might
set up hypotheses across all 4 time matrices, and permute the observed
sequences of 4 states (e.g., N-N-N-Y) across individual subjects. If a subject's
state at time X + 1 is independent of state at time X, for pairs of
subjects with N-N at 1-3 times, you could ask if their similarity at N-N
times is greater than at the other times.  Are these timepoints in a
treatment, where you have hypotheses about the effect getting stronger (or
weaker) over time?  Some such hypotheses are simple to test with functions
in vegan (or ape, etc.), while others may require explicit coding of
restricted permutations.

There are also general approaches to permutation tests for ANOVA (Anderson
2001) and outer partitioning of dissimilarity matrices (McArdle & Anderson
2001, implemented in vegan::adonis).

These approaches do not explicitly use fixed vs random effects.  Rather,
the within-subject correlated measures are accounted for via restrictions
on the permutations.  See, for example, Jari's reply here:

This may or may not be useful for your particular question.  I hope it's at
least worth your time to think about.

Tom

Anderson, M.J., 2001. A new method for non‐parametric multivariate analysis
of variance. *Austral Ecology*, *26*(1), pp.32-46.

Legendre, P. and Anderson, M.J., 1999. Distance‐based redundancy analysis:
testing multispecies responses in multifactorial ecological experiments.
*Ecological Monographs*, *69*(1), pp.1-24.

McArdle, B.H. and Anderson, M.J., 2001. Fitting multivariate models to
community data: a comment on distance‐based redundancy analysis. *Ecology*,
*82*(1), pp.290-297.

On Wed, Aug 8, 2018 at 10:23 AM Han Zhang <hanzh using umich.edu> wrote:

> Hi all,
>
> I have a modeling problem involving similarity scores between subjects.
> During 4 time points in my experiment, I sampled eye movements of my
> subjects. At each time point, each subject was in one of two
> states, Y or N. I have no control over the state; it is purely observational.
> My data produces 4 similarity matrices - for each sampling, every subject
> was compared to every other subject on some similarity measure of eye
> movements (self-comparisons excluded). Each matrix contains three types of
> comparison: N-N, N-Y, and Y-Y. My hypothesis is that the eye movements of
> those in state N were more similar to each other, compared to N-Y, or Y-Y.
> So N-N > N-Y or Y-Y.
>
> I came up with a model like this:
>
> lmer(dist ~ type + (1|sub_i) + (1|sub_i:type) + (1|segment) +
> (1|segment:type) + (1|sub_i:segment) + (1|sub_i:segment:type),
> data = data, REML = FALSE)
>
> where dist is the similarity score, type is a 3-level factor (n-n, n-y,
> y-y), sub_i is subject ID, segment is sample ID. I was
> trying to build a model with a "maximal" random structure.
>
> Have I correctly specified my model? I have two concerns:
> (1) because any given data point in the matrix belongs to two subjects, i
> and j, should I include random effects for both subject i and subject j?
>
> (2) Because each matrix is symmetrical, I am duplicating my data in the
> above model. Should I use only the unique pairwise comparisons and do
> something like this:
>
> lmer(dist ~ type + (1|segment) + (1|segment:type), data = half_data,
> REML = FALSE)
>
> Thanks!
>
>
> --
> Han Zhang
> Combined Program in Education and Psychology
> University of Michigan, Ann Arbor
> Email: hanzh using umich.edu
> Phone: 1-734-680-6031

```
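
A minimal base-R sketch of the permutation test described in the reply above, offered as an illustration rather than as the method either poster used. It takes the mean N-N similarity as the test statistic, leaves the similarity matrices untouched, and permutes each subject's whole sequence of states across subjects, which keeps any within-subject persistence of state intact. The function names (`mean_nn`, `perm_test_nn`), the argument layout, and the simulated data are all hypothetical. The null distribution assumes subjects are exchangeable, and each time point needs at least two subjects in state N. For a single matrix, existing functions such as `vegan::anosim` or `vegan::mantel` may be a better starting point.

```r
# Mean similarity over all N-N pairs, pooled across time points.
# sims:   list of symmetric subject-by-subject similarity matrices (one per time)
# states: character matrix, subjects in rows, times in columns, values "N"/"Y"
mean_nn <- function(sims, states) {
  vals <- unlist(lapply(seq_along(sims), function(t) {
    is_n  <- states[, t] == "N"
    block <- sims[[t]][is_n, is_n, drop = FALSE]
    block[upper.tri(block)]   # each unordered N-N pair once, diagonal excluded
  }))
  mean(vals)
}

# Permutation test: similarities stay fixed, state sequences move between subjects.
perm_test_nn <- function(sims, states, n_perm = 999) {
  observed <- mean_nn(sims, states)
  n_sub    <- nrow(states)
  permuted <- replicate(n_perm, {
    # Shuffle rows so each subject receives another subject's full state sequence
    shuffled <- states[sample(n_sub), , drop = FALSE]
    mean_nn(sims, shuffled)
  })
  # One-sided p-value (N-N more similar than expected), counting the observed value
  p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value, permuted = permuted)
}

# Simulated data (assumption: 20 subjects, 4 time points, no real effect)
set.seed(1)
n_sub  <- 20
n_time <- 4
states <- matrix(sample(c("N", "Y"), n_sub * n_time, replace = TRUE),
                 nrow = n_sub)
sims <- lapply(seq_len(n_time), function(t) {
  m <- matrix(runif(n_sub^2), n_sub, n_sub)
  m <- (m + t(m)) / 2   # make the matrix symmetric
  diag(m) <- NA         # self-comparisons excluded
  m
})

res <- perm_test_nn(sims, states, n_perm = 199)
res$observed
res$p_value
```

With a single time point (a one-element `sims` list and a one-column `states` matrix) this reduces to the plain label permutation for one matrix. If state at one time is thought to be independent of state at the next, permuting within each column separately would be the corresponding variant; which restriction is appropriate depends on the hypothesis being tested.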