[R] metaMDS with large dataset produces 'insufficient data' warning

Jari Oksanen jari.oksanen at oulu.fi
Tue May 28 15:43:51 CEST 2013


Raeanne Miller <Raeanne.Miller <at> sams.ac.uk> writes:

> 
> Greetings everyone,
> 
> I am running MDS on a very large dataset (12 x 25071 - 12 model runs with
> 25071 output values each), and also on a very much reduced version of the
> dataset (randomly select 1000 of the 25071 output values). I would like to
> look at similarities/dissimilarities between the 12 model runs. When I use
> metaMDS on the full dataset, I get a warning message:
> 
> Warning message:
> In metaMDS(MDSdata, distance = "bray", k = 2, autotransform = FALSE) :
>   Stress is (nearly) zero - you may have insufficient data
> 
> I don't think I have insufficient data, with 12 x 25071 data points, and
> when I reduce the dataset to only 1000 values per model run (so only
> 12 x 1000) I don't get this warning (though the final stress is now only
> just below 0.2 - my desired value).
> 
> Is this warning because I have insufficient data? Or is it because of the
> nature of a large dataset?
> 

Twelve points is not a large data set but a pretty small one, if I
interpret your message correctly. It is the number of points (rows) that
defines the size of the data set; columns do not count. Further, this is
only a warning to alert you to possible problems. Everything may be OK,
but you should have a look at the results.
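
One way to have that look, as a minimal sketch only: refit the ordination,
then inspect the stress value, the Shepard diagram, and the dissimilarities
themselves. The simulated MDSdata below is a stand-in for the real
12 x 25071 matrix (its size and row/column names are assumptions), and
trace = FALSE is added just to keep the output short.

```r
library(vegan)

set.seed(1)
# Stand-in for the real data: 12 model runs (rows) x 200 output values (columns)
MDSdata <- matrix(rexp(12 * 200), nrow = 12,
                  dimnames = list(paste0("run", 1:12), paste0("v", 1:200)))

fit <- metaMDS(MDSdata, distance = "bray", k = 2,
               autotransform = FALSE, trace = FALSE)

fit$stress        # final stress of the best solution found
stressplot(fit)   # Shepard diagram: ordination distance vs. dissimilarity

# The dissimilarities that go into the ordination (12 * 11 / 2 = 66 values)
d <- vegdist(MDSdata, method = "bray")
summary(as.vector(d))
```

If nearly all dissimilarities fall on a very simple pattern (for instance,
a few tight groups), a stress of almost zero would not be surprising.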

If reducing the number of variables from 25071 to 1000 really changes
the results so that stress increases from 0 to 0.2, then you probably
removed some very influential variables from your data. It may be that
a few dominant variables largely define the dissimilarities, and these
give such a simple data structure that you get the warning when they
are included.
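
A rough way to check for such dominant variables, sketched in base R on
made-up data (the matrix x, its three inflated columns, and the helper
names bray and pairwise_bray are all hypothetical, not from the original
data): compare the Bray-Curtis dissimilarities computed from all variables
with those computed from the largest variables only.

```r
set.seed(42)
n_runs <- 12
n_vars <- 1000

# Hypothetical data: many small variables plus 3 dominant ones
x <- matrix(runif(n_runs * n_vars), nrow = n_runs)
x[, 1:3] <- x[, 1:3] * 1e4

# Bray-Curtis dissimilarity between two rows, written out in base R
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)

# All pairwise dissimilarities for a matrix with runs in rows
pairwise_bray <- function(m) {
  idx <- combn(nrow(m), 2)
  apply(idx, 2, function(p) bray(m[p[1], ], m[p[2], ]))
}

d_all  <- pairwise_bray(x)             # dominant variables included
d_dom  <- pairwise_bray(x[, 1:3])      # dominant variables only
d_rest <- pairwise_bray(x[, -(1:3)])   # dominant variables removed

# Share of the grand total carried by the largest variables
share <- sort(colSums(x) / sum(x), decreasing = TRUE)
head(share, 5)

cor(d_all, d_dom)    # close to 1 if the dominant variables drive everything
cor(d_all, d_rest)
```

With real data, the same comparison can be made by ordering the columns by
their totals and dropping the top few before recomputing the dissimilarities.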

With default options, you get zero stress with six points, so with
twelve you should be on the safe side. Probably there is something
funny in your data.

Cheers, Jari Oksanen