[R-sig-eco] null models with continuous abundance data

Thu Jan 7 23:25:29 CET 2010

Thanks Peter,

summary.permat are plot.permat are very useful, thanks for that.

If I understand correctly from your help page and code, the problem you
describe is primarily with sequential algorithms, where the "null"
matrix obtained from a given run can actually be quite similar to the
one obtained in the previous run, and so a trend should be apparent in
the plot produced by plot.permat.

The simple algorithm I wrote starts with a new commsimulator binary
matrix at each run and then fills it randomly until marginal sums are
(more or less, need to implement backtracking) attained. Therefore, I
don't expect a problem with non-independency of successive runs, but
this of course could be easily checked using the code in summary.permat
and plot.permat.

Is this what you meant by null matrices not being random?

Thanks so much again,

Etienne

Le jeudi 07 janvier 2010 à 14:01 -0700, Peter Solymos a écrit :
> Etienne,
> 
> We used Chi-square statistic to inspect randomness, see the code in
> summary.permat.
> To test the convergence of sequential algorithms, you can use time
> series and MCMC diagnostic tools.
> 
> Peter
> 
> 
> 
> 
> 2010/1/7 Etienne Laliberté <etiennelaliberte at gmail.com>:
> > Many thanks Jari for your input.
> >
> > I'll have a look at this backtracking method and see how I could
> > implement it.
> >
> > Ensuring that the null matrices are indeed random is clearly good
> > advice, and I'd need to do this, but do you have any suggestions on how
> > to do this in practice? A copy of the script you've used to test Peter's
> > null model algorithms would be very useful.
> >
> > Thanks
> >
> > Etienne
> >
> >
> > Le jeudi 07 janvier 2010 à 22:24 +0200, Jari Oksanen a écrit :
> >> On 07/01/2010 21:48, "Etienne Laliberté" <etiennelaliberte at gmail.com> wrote:
> >>
> >> > Many thanks again Carsten.
> >>
> >> Yes, you're right that care must be taken to
> >> > ensure that a decent number
> >> of unique random matrices must be obtained. I
> >> > don't think it would be a
> >> problem in my case given that transforming my
> >> > continuous abundance data
> >> to count by
> >>
> >> mat2<- floor(mat * 100 / min(mat[mat >
> >> > 0]) )
> >>
> >> can quickly lead to a very large number of individuals (hundreds
> >> > of
> >> thousands if not > million in some cases). The issue then becomes
> >> > mostly
> >> one of computing speed, but that's another story.
> >>
> >> Following your
> >> > suggestion, I came up with a simple (read naïve) R
> >> algorithm that starts with
> >> > a binary matrix obtained by commsimulator
> >> then fills it randomly. It seems to
> >> > work relatively well, though it's
> >> extremely slow.
> >>
> >> In addition, because the
> >> > algorithm fills the matrix one individual at a
> >> time, I sometimes ran into the
> >> > problem of one individual which had
> >> nowhere to go because "ressources" at the
> >> > site (i.e. the row sum) was
> >> already full. In that case, this individual
> >> > disappears, i.e. goes
> >> nowhere. This is a bit drastic and clearly sub-optimal,
> >> > but it's the
> >> only way I could quickly think of yesterday to prevent the
> >> > algorithm
> >> from getting stuck. The consequence is that often, the total number
> >> > of
> >> individuals in the null matrix is a bit less than total number
> >> > of
> >> individuals in the observed matrix.
> >>
> >> Etienne,
> >>
> >> This is the same problem as filling up a binary matrix: you get stuck before
> >> you can fill in all presences. The solution in vegan::commsimulator was
> >> "backtracking" method: when you get stuck, you remove some of the items
> >> (backtrack to an earlier stage of filling) and start again. This is even
> >> slower than initial filling, but it may be doable. The model application of
> >> backtracking goes back to Gotelli & Entsminger (Oecologia 129, 282--291;
> >> 2001) who had this for binary data, but the same idea is natural for
> >> quantitative null models, too. The vegan vignette discusses some details of
> >> implementation which are valid also for quantitative filling.
> >>
> >> I would like to repeat the serious warning that Carsten Dormann made: you
> >> better be sure that your generated null models really are random. Péter
> >> Sólymos developed some quantitative null models for vegan, but we found in
> >> our tests that they were not random at all, but the constraints we put
> >> produced peculiar kind of data with regular features although there was
> >> technically no problem. Therefore we removed code worth of huge amount of
> >> developing work as dangerous for use. I do think that also the r2dtable
> >> approach is more regular and less variable (lower variance) than the
> >> community data, and you very easily get misleadingly significant results
> >> when you compare your observed community against too regular null models.
> >> This is something analogous to using strict Poisson error in glm or gam when
> >> the data in fact are overdispersed to Poisson. Be very careful if you want
> >> to claim significant results based on quantitative null models. Would I be a
> >> referee or an editor for this kind of ms, I would be very skeptical ask for
> >> the proof of the validity of the null model.
> >>
> >> Cheers, Jari Oksanen
> >>
> >> > I don't see this as being a
> >> > real
> >> problem with the large matrices I'll deal with though, but
> >> > definitely
> >> something to be aware of.
> >>
> >> If you're curious, here's the simple
> >> > algorithm, as well as some code to
> >> run a test below. I'd be very interested to
> >> > compare it to Vazquez's
> >> algorithm!
> >>
> >> Thanks again
> >>
> >> Etienne
> >>
> >> ###
> >>
> >> nullabun <-
> >> > function(x){
> >>    require(vegan)
> >>    nsites <- nrow(x)
> >>    nsp <- ncol(x)
> >>
> >> > rowsum <- rowSums(x)
> >>    colsum <- colSums(x)
> >>    nullbin <- commsimulator(x,
> >> > method = "quasiswap")
> >>    rownull <- rowSums(nullbin)
> >>    colnull <-
> >> > colSums(nullbin)
> >>    rowsum <- rowsum - rownull
> >>    colsum <- colsum - colnull
> >>
> >> > ind <- rep(1:nsp, colsum)
> >>    ress <- rep(1:nsites, rowsum)
> >>    total <-
> >> > sum(rowsum)
> >>    selinds <- sample(1:total)
> >>       for (j in 1:total){
> >>
> >> > indsel <- ind[selinds[j]]
> >>          sitepot <- which(nullbin[, indsel] > 0)
> >>
> >> > if (length(ress[ress %in% sitepot]) > 0 ){
> >>             sitesel <-
> >> > sample(ress[ress %in% sitepot], 1)
> >>             ress <- ress[-which(ress ==
> >> > sitesel)[1] ]
> >>             nullbin[sitesel, indsel] <- nullbin[sitesel, indsel]
> >> > + 1
> >>            }
> >>       }
> >> return(nullbin)
> >> }
> >>
> >>
> >> ### now let's test the
> >> > function
> >>
> >> # create a dummy abundance matrix
> >> m <-
> >> > matrix(c(1,3,2,0,3,1,0,2,1,0,2,1,0,0,1,2,0,3,0,0,0,1,4,3), 4,
> >> > 6,
> >> byrow=TRUE)
> >>
> >> # generate a null matrix
> >> m.null <- nullabun(m)
> >>
> >> # compare
> >> > total number of individuals
> >> sum(m) #30
> >> sum(m.null) #29: one individual got
> >> > "stuck" with no ressources...
> >>
> >> # to generate more than one null matrix, say
> >> > 999
> >> nulls <- replicate(n = 999, nullabun(m), simplify = F)
> >>
> >> # how many unique
> >> > null matrices?
> >> length(unique(nulls) ) # I found 983 out of 999
> >>
> >> # how many
> >> > individuals in m?
> >> sum(m) # there are 30
> >>
> >> # how bad is the problem of
> >> > individuals getting stuck with no ressources
> >> in null matrices?
> >> sums <-
> >> > as.numeric(lapply(nulls, sum, simplify = T))
> >> hist(sums) # the vast majority
> >> > had 29 or 30 individuals
> >>
> >>
> >> Le jeudi 07 janvier 2010 à 10:34 +0100, Carsten
> >> > Dormann a écrit :
> >> > Hi Etienne,
> >> >
> >> > I'm afraid that swap.web cannot easily
> >> > accommodate this constraint.
> >> > Diego Vazquez has used an alternative approach
> >> > for this problem, but I
> >> > haven't seen code for it (it's briefly described in
> >> > his Oikos 2005
> >> > paper). While swap.web starts with a "too-full" matrix and
> >> > then
> >> > downsamples, Diego starts by allocating bottom-up. There it should be
> >> >
> >> > relatively easy to also constrain the row-constancy of connectance.
> >> >
> >> > I
> >> > can't promise, but I will hopefully have this algorithm by the end
> >> > of next
> >> > week (I'm teaching this stuff next week and this is one of the
> >> > assignments).
> >> > If so, I'll get it over to you.
> >> >
> >> > The problem that already arises with
> >> > swap.web for very sparse matrices
> >> > is that there are very few unique
> >> > combinations left after all these
> >> > constraints. This will be even worse for
> >> > your approach, and plant
> >> > community data are usually VERY sparse. Once you
> >> > have produced the
> >> > null models, you should assure yourself that there are
> >> > hundreds
> >> > (better thousands) of possible randomised matrices. Adding just
> >> > one
> >> > more constraint to your null model (that of column AND row constancy
> >> >
> >> > of 0s) will uniquely define the matrix! Referees may not pick it up,
> >> > but it
> >> > may give you trivial results.
> >> >
> >> > Best wishes,
> >> >
> >> > Carsten
> >> >
> >> >
> >> > Vázquez,
> >> > D. P., Melián, C. J., Williams, N. M., Blüthgen, N., Krasnov,
> >> > B. R., Poulin,
> >> > R., et al. (2007). Species abundance and asymmetric
> >> > interaction strength in
> >> > ecological networks. Oikos, 116, 1120-1127.
> >> >
> >> > On 06/01/2010 23:57, Etienne
> >> > Laliberté wrote:
> >> > > Many thanks Carsten and Peter for your suggestions.
> >> > >
> >> >
> >> > > commsimulator indeed respects the two contraints I'm interested in, but>
> >> > > only allows for binary data.
> >> > >
> >> > > swap.web is *almost* what I need, but
> >> > only overall matrix fill is kept
> >> > > constant, whereas I want zeros to move
> >> > only between rows, not between
> >> > > both columns and rows. In others words, if
> >> > the initial data matrix had
> >> > > three zeros for row 1, permuted matrices
> >> > should also have three zeros
> >> > > for that row.
> >> > >
> >> > > I do not doubt that
> >> > Peter's suggestions are good, but I'm afraid they
> >> > > seem a bit overly
> >> > complicated for my particular problem. All I'm after
> >> > > is to create n
> >> > randomly-assembled matrices from an observed species
> >> > > abundance matrix to
> >> > compare the observed functional diversity of the
> >> > > sampling sites to a null
> >> > expectation. To be conservative, this requires
> >> > > that I hold species
> >> > richness constant at each site, and keep row and
> >> > > column marginals fixed.
> >> >
> >> > >
> >> > > Could swap.web or permatswap(..., method = "quasiswap") be easily
> >> > >
> >> > tweaked to accomodate this? The only difference really is that matrix
> >> > > fill
> >> > should be kept constant *but also* be constrained within rows.
> >> > >
> >> > > Thanks
> >> > again for your help.
> >> > >
> >> > > Etienne
> >> > >
> >> > > Le 7 janvier 2010 08:17, Peter
> >> > Solymos <solymos at ualberta.ca> a écrit :
> >> > >
> >> > > > Dear Etienne,
> >> > > >
> >> > >
> >> > > You can try the Chris Hennig's prablus package which have a parametric
> >> > > >
> >> > bootstrap based null-model where clumpedness of occurrences or
> >> > > >
> >> > abundances (this might allow continuous data, too) is estimated from
> >> > > > the
> >> > site-species matrix and used in the null-model generation. But
> >> > > > here, the
> >> > sum of the matrix will vary randomly.
> >> > > >
> >> > > > But if you have
> >> > environmental covariates, you might try something more
> >> > > > parametric. For
> >> > example the simulate.rda or simulate.cca functions in
> >> > > > the vegan package,
> >> > or fit multivariate LM for nested models (i.e.
> >> > > > intercept only, and with
> >> > other covariates) and compare AIC's, or use
> >> > > > the simulate.lm to get
> >> > random numbers based on the fitted model. This
> >> > > > way you can base you
> >> > desired statistic on the simulated data sets, and
> >> > > > you know explicitly
> >> > what is the model (plus it is good for continuous
> >> > > > data that you have).
> >> > By using the null-model approach, you implicitly
> >> > > > have a model by
> >> > defining constraints for the permutations, and
> >> > > > p-values are
> >> > probabilities of the data given the constraints (null
> >> > > > hypothesis), and
> >> > not probability of the null hypothesis given the data
> >> > > > (what people
> >> > usually really want).
> >> > > >
> >> > > > Cheers,
> >> > > >
> >> > > > Peter
> >> > > >
> >> > > >
> >> > Péter Sólymos
> >> > > > Alberta Biodiversity Monitoring Institute
> >> > > > Department
> >> > of Biological Sciences
> >> > > > CW 405, Biological Sciences Bldg
> >> > > > University
> >> > of Alberta
> >> > > > Edmonton, Alberta, T6G 2E9, Canada
> >> > > > Phone:
> >> > 780.492.8534
> >> > > > Fax: 780.492.7635
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Wed, Jan 6,
> >> > 2010 at 2:18 AM, Carsten Dormann <carsten.dormann at ufz.de> wrote:
> >> > > >
> >> >
> >> > > > > Hi Etienne,
> >> > > > >
> >> > > > > the double constraint is observed by two
> >> > functions:
> >> > > > >
> >> > > > > swap.web in package bipartite
> >> > > > >
> >> > > > >
> >> > and
> >> > > > >
> >> > > > > commsimulator in vegan (at least in the r-forge
> >> > version)
> >> > > > >
> >> > > > > Both build on the r2dtable approach, i.e. you have,
> >> > as you propose, to turn
> >> > > > > the low values into higher-value integers.
> >> > >
> >> > > >
> >> > > > > The algorithm is described in the help to swap.web.
> >> > > > >
> >> > >
> >> > > >
> >> > > > > HTH,
> >> > > > >
> >> > > > > Carsten
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > On 06/01/2010 08:55, Etienne Laliberté wrote:
> >> > > > >
> >> > > > > > Hi,
> >> > >
> >> > > > >
> >> > > > > > Let's say I have measured plant biomass for a total of 5
> >> > species from 3
> >> > > > > > sites (i.e. plots), such that I end with the
> >> > following data matrix
> >> > > > > >
> >> > > > > > mat<- matrix(c(0.35, 0.12, 0.61, 0,
> >> > 0, 0.28, 0, 0.42, 0.31, 0.19, 0.82,
> >> > > > > > 0, 0, 0, 0.25), 3, 5, byrow =
> >> > T)
> >> > > > > >
> >> > > > > > dimnames(mat)<- list(c("site1", "site2", "site3"),
> >> > c("sp1", "sp2",
> >> > > > > > "sp3", "sp4", "sp5"))
> >> > > > > >
> >> > > > > > Data is
> >> > therefore continuous. I want to generate n random community
> >> > > > > > matrices
> >> > which both respect the following constraints:
> >> > > > > >
> >> > > > > > 1) row and
> >> > column totals are kept constant, such that "productivity" of
> >> > > > > > each
> >> > site is maintained, and that rare species at a "regional" level
> >> > > > > > stay
> >> > rare (and vice-versa).
> >> > > > > >
> >> > > > > > 2) number of species in each plot
> >> > is kept constant, i.e. each row
> >> > > > > > maintains the same number of zeros,
> >> > though these zeros should not stay
> >> > > > > > fixed.
> >> > > > > >
> >> > > > > > To
> >> > deal with continuous data, my initial idea was to transform the
> >> > > > > >
> >> > continuous data in mat to integer data by
> >> > > > > >
> >> > > > > > mat2<-
> >> > floor(mat * 100 / min(mat[mat>  0]) )
> >> > > > > >
> >> > > > > > where multiplying
> >> > by 100 is only used to reduce the effect of rounding
> >> > > > > > to nearest
> >> > integer (a bit arbitrary). In a way, shuffling mat could now
> >> > > > > > be seen
> >> > as re-allocating "units of biomass" randomly to plots. However,
> >> > > > > >
> >> > doing so results in a matrix with large number of "individuals" to
> >> > > > > >
> >> > reshuffle, which can slow things down quite a bit. But this is only part
> >> > > >
> >> > > > of the problem.
> >> > > > > >
> >> > > > > > My main problem has been to find an
> >> > algorithm that can actually respect
> >> > > > > > constraints 1 and 2. Despite
> >> > trying various R functions (r2dtable,
> >> > > > > > permatfull, etc), I have not
> >> > yet been able to find one that can do
> >> > > > > > this.
> >> > > > > >
> >> > > > > >
> >> > I've had some kind help from Peter Solymos who suggested that I try the
> >> > > >
> >> > > > aylmer package, and it's *almost* what I need, but the problem is that
> >> > >
> >> > > > > their algorithm does not allow for the zeros to move within the
> >> > matrix;
> >> > > > > > they stay fixed. I want the number of zeros to stay constant
> >> > within each
> >> > > > > > row, but I want them to move freely betweem columns.
> >> > >
> >> > > > >
> >> > > > > > Any help would be very much appreciated.
> >> > > > > >
> >> > > > > >
> >> > Thanks
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > --
> >> > > > > Dr. Carsten
> >> > F. Dormann
> >> > > > > Department of Computational Landscape Ecology
> >> > > > >
> >> > Helmholtz Centre for Environmental Research-UFZ
> >> > > > > Permoserstr. 15
> >> > > >
> >> > > 04318 Leipzig
> >> > > > > Germany
> >> > > > >
> >> > > > > Tel: ++49(0)341 2351946
> >> > > >
> >> > > Fax: ++49(0)341 2351939
> >> > > > > Email: carsten.dormann at ufz.de
> >> > > > >
> >> > internet: http://www.ufz.de/index.php?de=4205
> >> > > > >
> >> > > > >
> >> > _______________________________________________
> >> > > > > R-sig-ecology mailing
> >> > list
> >> > > > > R-sig-ecology at r-project.org
> >> > > > >
> >> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> >> > > > >
> >> > > > >
> >> > > > >
> >> >
> >> > >
> >> > >
> >> > >
> >> >
> >> > --
> >> > Dr. Carsten F. Dormann
> >> > Department of
> >> > Computational Landscape Ecology
> >> > Helmholtz Centre for Environmental
> >> > Research-UFZ
> >> > Permoserstr. 15
> >> > 04318 Leipzig
> >> > Germany
> >> >
> >> > Tel: ++49(0)341
> >> > 2351946
> >> > Fax: ++49(0)341 2351939
> >> > Email: carsten.dormann at ufz.de
> >> > internet:
> >> > http://www.ufz.de/index.php?de=4205
> >>
> >>
> >
> >
> > --
> > Etienne Laliberté
> > ================================
> > School of Forestry
> > University of Canterbury
> > Private Bag 4800
> > Christchurch 8140, New Zealand
> > Phone: +64 3 366 7001 ext. 8365
> > Fax: +64 3 364 2124
> > www.elaliberte.info
> >
> > _______________________________________________
> > R-sig-ecology mailing list
> > R-sig-ecology at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> >
> >

-- 
Etienne Laliberté
================================
School of Forestry
University of Canterbury
Private Bag 4800
Christchurch 8140, New Zealand
Phone: +64 3 366 7001 ext. 8365
Fax: +64 3 364 2124
www.elaliberte.info