[R-sig-eco] Mantel
Gavin Simpson
gavin.simpson at ucl.ac.uk
Fri Apr 20 10:49:23 CEST 2012
To get a good idea of potential memory usage you should profile the code
to confirm whether copying of arguments occurs during the operation of
mantel.partial().
You need an R compiled for memory tracing. Using the script below I see:
R> tracemem(veg.dist)
[1] "<0x24d9090>"
R> tracemem(env.dist)
[1] "<0x3758070>"
R> tracemem(rand.dist)
[1] "<0x226daa0>"
R>
R> mantel.partial(veg.dist, env.dist, rand.dist)
tracemem[0x3758070 -> 0x3898820]: as.vector mantel.partial
tracemem[0x226daa0 -> 0x38990f0]: as.vector mantel.partial
tracemem[0x24d9090 -> 0x38999c0]: as.vector cor.test mantel.partial
tracemem[0x24d9090 -> 0x38c7900]: as.vector is.data.frame cor mantel.partial
So four copies of the dissimilarity objects are made at some point. The
first of the tracemem lines printed *after* the call to
`mantel.partial()` shows that `env.dist` is copied (<0x3758070>).
Next, `rand.dist` (<0x226daa0>, the matrix to be partialled out) is
copied. The last two lines show that two copies of `veg.dist`
(<0x24d9090>) are made.
So in addition to the memory required to hold each as a matrix, we need
memory to hold the original dist objects plus one extra copy of the
full `veg.dist` matrix. I don't think garbage collection will save you
from having to hold two copies of the full `veg.dist` matrix in memory
at once, but I may be wrong on that count.
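As a small illustration of the size difference between a dist object and its
full-matrix form, here is a sketch using base R's `dist()` on random data with
24 rows, the same number of rows as in the example script below. The object
names `x`, `d` and `m` are made up for this example and nothing here is
specific to vegan.

```r
# Illustrative only: random data with 24 rows and 10 columns
set.seed(1)
x <- matrix(rnorm(24 * 10), ncol = 10)

d <- dist(x)        # "dist" object: stores only the lower triangle
m <- as.matrix(d)   # full symmetric 24 x 24 matrix

length(d)           # stored distances: n * (n - 1) / 2 = 276 for n = 24
length(m)           # matrix elements: n^2 = 576 for n = 24
object.size(d)      # bytes used by the dist object
object.size(m)      # bytes used by the full matrix
```

The full matrix holds roughly twice as many values as the dist object, so any
coercion to a matrix roughly doubles the memory needed for that object.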
Anyway, to know for sure, compile your R with memory profiling, edit
`mantel.partial()` to tracemem the `xdis`, `ydis` and `zdis` objects
created in the first three lines of `mantel.partial()`, and then run the
script below to look at any copying that is going on. Total the copies
up and that should give you some idea of how many are needed to run the
analysis; then work back from there to work out how much memory you
need.
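Before editing any functions it may be worth checking that your R build can
trace memory at all. A minimal sketch, assuming only base R: the variable
`has_profmem` and the toy vectors `x` and `y` are names invented for this
example.

```r
# capabilities("profmem") is TRUE when R was built with memory profiling
has_profmem <- capabilities("profmem")

if (has_profmem) {
  x <- rnorm(10)
  tracemem(x)      # start reporting copies of x
  y <- x           # no copy yet: x and y share the same memory
  y[1] <- 0        # modifying y forces a copy, which tracemem reports
  untracemem(x)    # stop tracing
} else {
  message("This R build lacks memory profiling; tracemem() will not work.")
}
```

Each `tracemem[... -> ...]` line printed corresponds to one copy, so counting
those lines gives the number of copies made.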
A rough rule of thumb is to have 2-4 times as much memory available to R
as the size of the objects in the workspace to allow for copying.
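To put rough numbers on this, the sketch below estimates the size of one dist
object, one full matrix, and the 2-4 times rule of thumb applied to three dist
objects. It assumes 8 bytes per distance (doubles) and ignores object
overhead. The function `estimate_mantel_memory()` and its arguments are
hypothetical, written for this example, and are not part of vegan.

```r
# Hypothetical helper (not part of vegan): rough memory estimate for n sites
estimate_mantel_memory <- function(n, n_dist = 3, copy_factor = c(2, 4)) {
  n <- as.numeric(n)                      # doubles avoid integer overflow in n * n
  dist_bytes   <- 8 * n * (n - 1) / 2     # one dist object (lower triangle only)
  matrix_bytes <- 8 * n * n               # one full n x n matrix of doubles
  workspace    <- n_dist * dist_bytes     # the dist objects held in the workspace
  list(dist_gb      = dist_bytes / 1024^3,
       matrix_gb    = matrix_bytes / 1024^3,
       suggested_gb = copy_factor * workspace / 1024^3)
}

est <- estimate_mantel_memory(20000)
est$dist_gb        # size of one dist object, in GiB
est$matrix_gb      # size of one full matrix, in GiB
est$suggested_gb   # lower and upper end of the 2-4x rule of thumb, in GiB
```

Converting `n` with `as.numeric()` sidesteps the integer overflow mentioned in
the quoted message further down.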
In addition, you need to consider the OS on which you run things: R can
only address a chunk of memory as big as the OS allows. IIRC on 32-bit
Windows this is somewhat less than the 4 GB of RAM that one might have
in such a system.
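A quick way to see whether the R you are running is a 32-bit or 64-bit build
is to look at the pointer size. This is a minimal sketch using base R only,
and the variable name `bits` is made up for the example.

```r
# Pointer size in bytes: 4 on a 32-bit build of R, 8 on a 64-bit build
bits <- 8 * .Machine$sizeof.pointer

R.version$arch     # architecture string of this R build
bits               # 32 or 64
```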
HTH
G
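## Script used to produce the tracemem output above
library(vegan)
data(varespec)
data(varechem)

## Dissimilarities for vegetation, environment, and a random matrix
veg.dist <- vegdist(varespec)
env.dist <- vegdist(scale(varechem), "euclidean")
set.seed(1)
rand.dist <- vegdist(matrix(rnorm(24 * 10), ncol = 10),
                     "euclidean")

## Trace copies of each dist object (needs R built with memory profiling)
tracemem(veg.dist)
tracemem(env.dist)
tracemem(rand.dist)

mantel.partial(veg.dist, env.dist, rand.dist)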
On Thu, 2012-04-19 at 20:15 -0600, Peter Solymos wrote:
> Jonathan and Chris,
>
> The mantel function in vegan package contains dist-to-matrix coercion,
> so memory requirements for matrices should be setting the limit.
>
> Cheers,
>
> Peter
>
> Péter Sólymos
> Alberta Biodiversity Monitoring Institute
> and Boreal Avian Modelling project
> Department of Biological Sciences
> CW 405, Biological Sciences Bldg
> University of Alberta
> Edmonton, Alberta, T6G 2E9, Canada
> Phone: 780.492.8534
> Fax: 780.492.7635
> email <- paste("solymos", "ualberta.ca", sep = "@")
> http://www.abmi.ca
> http://www.borealbirds.ca
> http://sites.google.com/site/psolymos
>
>
>
> On Thu, Apr 19, 2012 at 7:12 PM, Chris Howden
> <chris at trickysolutions.com.au> wrote:
> > I can't comment on vegan but R in general can handle a matrix with about
> > 2*10^9 elements (for more R memory info look at
> > http://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html)
> >
> > I believe distance matrices usually only store either the lower or upper
> > diagonal. So the number of elements in a distance matrix are approx 1/2
> > the number of elements in a matrix.
> >
> > I use the following code to see if my data is too big for R.
> >
> > ## # CHECK: IS DISTANCE MATRIX TOO BIG FOR MEMORY?
> > ## # What is the min memory required for the distance matrix, assuming 1
> > ## # bit for each distance (which is actually too small)
> > ## # CHECK RESULT GB: requires 8GB which is too big for my computer
> > ## dim(segment.input)
> > ## (nrow(segment.input)^2)/2
> > ## (nrow(segment.input)^2)/(2*1000000000)
> >
> > ## # CHECK RESULT vector length, keeping in mind that I think distance
> > ## # matrices only require half the matrix to store all the data,
> > ## # max is 2*10^9: It's bigger
> > ## (nrow(segment.input)^2)/2 - 2*10^9
> >
> >
> > ## ## interestingly nrow(segment.input)*nrow(segment.input) won't work for
> > ## ## large numbers, we get
> > ## ## > nrows*nrows
> > ## ## [1] NA
> > ## ## Warning message:
> > ## ## In nrows * nrows : NAs produced by integer overflow
> >
> >
> > Chris Howden B.Sc. (Hons) GStat.
> > Founding Partner
> > Evidence Based Strategic Development, IP Commercialisation and Innovation,
> > Data Analysis, Modelling and Training
> > (mobile) 0410 689 945
> > (fax) +612 4782 9023
> > chris at trickysolutions.com.au
> >
> >
> > -----Original Message-----
> > From: r-sig-ecology-bounces at r-project.org
> > [mailto:r-sig-ecology-bounces at r-project.org] On Behalf Of Jonathan Hughes
> > Sent: Friday, 20 April 2012 10:24 AM
> > To: r-sig-ecology at r-project.org
> > Subject: [R-sig-eco] Mantel
> >
> >
> >
> > Dear all,
> > Does anyone have an expectation of the maximum distance matrix size that
> > vegan can handle during a partial Mantel test?
> > thanks,
> > Jonathan
> >
> > _______________________________________________
> > R-sig-ecology mailing list
> > R-sig-ecology at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> >
>
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%