[R] procrustes results affected by order of sites in input file
Gavin Simpson
gavin.simpson at ucl.ac.uk
Fri Dec 3 12:31:29 CET 2010
On Fri, 2010-12-03 at 09:58 +0000, Gavin Simpson wrote:
> On Thu, 2010-12-02 at 11:19 -0600, Christine Dolph wrote:
> > Hi, Thanks very much for your response.
>
> Thanks Christy,
>
> Apologies if I sounded off-hand or dismissive yesterday. It was a busy
> day, and as your mail lacked a reproducible example nor the code you
> ran, I wanted to deal with the low-hanging fruit before spending time
> looking at the issue.
>
> I am currently running a simulation on one of the in-built datasets to
> see if I can replicate your findings.
...and I cannot. When the problem is isolated to protest(), the ordering
of samples in the original data causes no systematic issues in the
permutation test of the procrustes correlation. This is where I know
there are no ties in the dissimilarities.
It would appear that we can induce an issue in isoMDS() that makes it
order dependent if you have tied dissimilarities in your data.
Can you send the output of
length(dij)
and
length(unique(dij))
where
dij <- vegdist(fish.data)
and
dij <- vegdist(bugcal.a)
in turns please?
Whether the problem you report is related to the issue of tie-breaking
in isoMDS() is not yet clear, but Jari has contacted the maintainers of
the MASS package to investigate the ties issue further.
At this point, I think we should take this discussion off list, and Jari
or I will be in contact with you directly Christy, once we do a bit more
investigation. As such, would you mind either:
1) Starting a new thread on R-Forge, including your original email and
the output I ask for above, here:
https://r-forge.r-project.org/forum/forum.php?forum_id=193&group_id=68
or
2) Posting a bug against vegan, including your original email, plus the
output of the commands I ask for above, here:
https://r-forge.r-project.org/tracker/?atid=330&group_id=68&func=browse
Either way, that will allow us to track the issue to resolution without
adding to the in-boxes of R-help subscribers.
Thanks again,
Gavin.
> Jari has mentioned, privately, that isoMDS() can be order dependent if
> there are ties in the dissimilarities. What, if anything, this has to do
> with the protest issue you report is not clear, hence the simulation I
> am running.
>
> I will report back later when the simulation has finished... which could
> be some time if I continued to code bone-headedly this morning!
>
> Finally, thanks for raising this issue with us.
>
> All the best,
>
> G
>
> > Unfortunately, using the set.seed() call does not seem to solve my problem.
> > If I do not use set.seed(), I do indeed get some small differences in
> > protest() results due to the effect of random starts. But with my sites in a
> > given order in the input files, and using set.seed(), I get the same results
> > every time I run protest().
> >
> > It's only when I change the order of sites in the input files that I get a
> > big change in results. The differences I see in protest() results after
> > re-sorting the order of sites in the input file is quite large, larger than
> > the smaller differences do to random starts I see if I run protest() without
> > the set.seed. When sites are re-ordered in the input file, the
> > correlation-like statistic produced by protest() can jump from between 0.28
> > to 0.52 for a dataset with 38 sites, with a corresponding change in
> > significance. Without using set.seed the numbers do jump around a bit, but
> > not to the same extreme degree.
> >
> > Looking at the NMDS plots to see how ordinations change with the order of
> > sites in the file, it seems that if I re-order sites, I am simply getting
> > ordination solutions that are reflections on one another across one of the 3
> > NMDS axes. These are not real differences in the ordination solutions. Is
> > there something I am failing to include in the protest() call that should
> > enable the function to flip one ordination around to maximize it's
> > similarity to the other ordination (i.e., that should orient the two
> > ordinations the same way in 3-dimensional space)?
> >
> > Thanks,
> >
> > --Christy
> >
> >
> > On Thu, Dec 2, 2010 at 6:02 AM, Gavin Simpson <gavin.simpson at ucl.ac.uk>wrote:
> >
> > > On Wed, 2010-12-01 at 14:19 -0600, Christine Dolph wrote:
> > > > Dear All,
> > > >
> > > > I am using a Procrustes analysis to compare two NMDS ordinations for the
> > > > same set of sites. One ordination is based on fish data, the other is
> > > based
> > > > on invertebrate data. Ordinations were derived using metaMDS() from the
> > > > {vegan} library as follows:
> > > >
> > > > fish.mds<-metaMDS(fish.data, distance="bray", k=3, trymax=100,
> > > > wascores=TRUE, trace=TRUE, zero="add")
> > > >
> > > > invert.mds<-metaMDS(bugcal.a, distance="bray", k=3, trymax=100,
> > > > wascores=TRUE, trace=TRUE, zero="add"),
> > > >
> > > >
> > > > where fish.data and invert.data are site-by-abundance matrices for fish
> > > and
> > > > invertebrates, respectively.
> > > >
> > > > I have then used protest() to test the significance between the two
> > > > ordinations:
> > >
> > > Simple things first:
> > >
> > > You did have a set.seed() call before the protest() call didn't you? In
> > > fact, I'd probably put set.seed() calls before the two metaMDS() calls
> > > as well.
> > >
> > > The ordering should have no impact at all, but the random starts in
> > > metaMDS() and the random permutations in protest() will vary from run to
> > > run unless you set the random seed to some known value.
> > >
> > > G
> > >
> > > > protest.results<-protest(fish.mds, invert.mds, scale=TRUE,
> > > symmetric=TRUE,
> > > > permutations=999)
> > > >
> > > > The problem is, the results of the protest analysis seem to be affected
> > > by
> > > > the original ordering of sites in the data input files. That is, if I
> > > > re-sort one of the input files based on some criteria, I sometimes see a
> > > > change in the strength and significance of the concordance results. In an
> > > > attempt to figure out what is happening, I have re-ordered each dataset
> > > in a
> > > > number of ways, and plotted the resulting ordinations. I have seen that
> > > > while the ordering of sites in the input file does not change the spatial
> > > > relationship between them (i.e., does not change the fundamental
> > > ordination
> > > > solution), it can change how the sites are oriented with respect to the
> > > > different NMDS axes. I was of the belief that a difference in orientation
> > > > with respect to the NMDS axes should not affect the Procrustes comparison
> > > of
> > > > two ordinations, as the protest function should be rotating and
> > > reflecting
> > > > one matrix with respect to the other to find the closest match between
> > > them
> > > > (i.e., it should be accounting for differences in how the two solutions
> > > are
> > > > oriented in space). However, I appear to see different results depending
> > > on
> > > > how sites are oriented with respect to each NMDS axis.
> > > >
> > > > When comparing two ordination solutions with Protest, is it necessary to
> > > > ensure that the original data input files are ordered in the same way?
> > > >
> > > > Thanks in advance for any insight.
> > > >
> > > > Sincerely,
> > > >
> > > >
> > > >
> > >
> > > --
> > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > Dr. Gavin Simpson [t] +44 (0)20 7679 0522
> > > ECRC, UCL Geography, [f] +44 (0)20 7679 0565
> > > Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
> > > Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/<http://www.ucl.ac.uk/%7Eucfagls/>
> > > UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
> > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > >
> > >
> >
> >
>
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
More information about the R-help
mailing list