[R-sig-eco] null models with continuous abundance data

Thu Jan 7 20:48:08 CET 2010

Many thanks again Carsten.

Yes, you're right that care must be taken to ensure that a decent number
of unique random matrices must be obtained. I don't think it would be a
problem in my case given that transforming my continuous abundance data
to count by

mat2<- floor(mat * 100 / min(mat[mat >  0]) )

can quickly lead to a very large number of individuals (hundreds of
thousands if not > million in some cases). The issue then becomes mostly
one of computing speed, but that's another story.

Following your suggestion, I came up with a simple (read naïve) R
algorithm that starts with a binary matrix obtained by commsimulator
then fills it randomly. It seems to work relatively well, though it's
extremely slow. 

In addition, because the algorithm fills the matrix one individual at a
time, I sometimes ran into the problem of one individual which had
nowhere to go because "ressources" at the site (i.e. the row sum) was
already full. In that case, this individual disappears, i.e. goes
nowhere. This is a bit drastic and clearly sub-optimal, but it's the
only way I could quickly think of yesterday to prevent the algorithm
from getting stuck. The consequence is that often, the total number of
individuals in the null matrix is a bit less than total number of
individuals in the observed matrix. I don't see this as being a real
problem with the large matrices I'll deal with though, but definitely
something to be aware of.

If you're curious, here's the simple algorithm, as well as some code to
run a test below. I'd be very interested to compare it to Vazquez's
algorithm!

Thanks again

Etienne

###

nullabun <- function(x){
   require(vegan)
   nsites <- nrow(x)
   nsp <- ncol(x)
   rowsum <- rowSums(x)
   colsum <- colSums(x)
   nullbin <- commsimulator(x, method = "quasiswap")
   rownull <- rowSums(nullbin)
   colnull <- colSums(nullbin)
   rowsum <- rowsum - rownull
   colsum <- colsum - colnull
   ind <- rep(1:nsp, colsum)
   ress <- rep(1:nsites, rowsum)
   total <- sum(rowsum)
   selinds <- sample(1:total)
      for (j in 1:total){
         indsel <- ind[selinds[j]]
         sitepot <- which(nullbin[, indsel] > 0)
         if (length(ress[ress %in% sitepot]) > 0 ){
            sitesel <- sample(ress[ress %in% sitepot], 1)
            ress <- ress[-which(ress == sitesel)[1] ]
            nullbin[sitesel, indsel] <- nullbin[sitesel, indsel] + 1
           }
      }
return(nullbin)
}

### now let's test the function

# create a dummy abundance matrix
m <- matrix(c(1,3,2,0,3,1,0,2,1,0,2,1,0,0,1,2,0,3,0,0,0,1,4,3), 4, 6,
byrow=TRUE)

# generate a null matrix
m.null <- nullabun(m)

# compare total number of individuals
sum(m) #30
sum(m.null) #29: one individual got "stuck" with no ressources...

# to generate more than one null matrix, say 999
nulls <- replicate(n = 999, nullabun(m), simplify = F)

# how many unique null matrices?
length(unique(nulls) ) # I found 983 out of 999

# how many individuals in m?
sum(m) # there are 30

# how bad is the problem of individuals getting stuck with no ressources
in null matrices?
sums <- as.numeric(lapply(nulls, sum, simplify = T))
hist(sums) # the vast majority had 29 or 30 individuals

Le jeudi 07 janvier 2010 à 10:34 +0100, Carsten Dormann a écrit :
> Hi Etienne,
> 
> I'm afraid that swap.web cannot easily accommodate this constraint.
> Diego Vazquez has used an alternative approach for this problem, but I
> haven't seen code for it (it's briefly described in his Oikos 2005
> paper). While swap.web starts with a "too-full" matrix and then
> downsamples, Diego starts by allocating bottom-up. There it should be
> relatively easy to also constrain the row-constancy of connectance.
> 
> I can't promise, but I will hopefully have this algorithm by the end
> of next week (I'm teaching this stuff next week and this is one of the
> assignments). If so, I'll get it over to you.
> 
> The problem that already arises with swap.web for very sparse matrices
> is that there are very few unique combinations left after all these
> constraints. This will be even worse for your approach, and plant
> community data are usually VERY sparse. Once you have produced the
> null models, you should assure yourself that there are hundreds
> (better thousands) of possible randomised matrices. Adding just one
> more constraint to your null model (that of column AND row constancy
> of 0s) will uniquely define the matrix! Referees may not pick it up,
> but it may give you trivial results.
> 
> Best wishes,
> 
> Carsten
> 
> 
> Vázquez, D. P., Melián, C. J., Williams, N. M., Blüthgen, N., Krasnov,
> B. R., Poulin, R., et al. (2007). Species abundance and asymmetric
> interaction strength in ecological networks. Oikos, 116, 1120-1127.
> 
> On 06/01/2010 23:57, Etienne Laliberté wrote: 
> > Many thanks Carsten and Peter for your suggestions.
> > 
> > commsimulator indeed respects the two contraints I'm interested in, but
> > only allows for binary data.
> > 
> > swap.web is *almost* what I need, but only overall matrix fill is kept
> > constant, whereas I want zeros to move only between rows, not between
> > both columns and rows. In others words, if the initial data matrix had
> > three zeros for row 1, permuted matrices should also have three zeros
> > for that row.
> > 
> > I do not doubt that Peter's suggestions are good, but I'm afraid they
> > seem a bit overly complicated for my particular problem. All I'm after
> > is to create n randomly-assembled matrices from an observed species
> > abundance matrix to compare the observed functional diversity of the
> > sampling sites to a null expectation. To be conservative, this requires
> > that I hold species richness constant at each site, and keep row and
> > column marginals fixed.
> > 
> > Could swap.web or permatswap(..., method = "quasiswap") be easily
> > tweaked to accomodate this? The only difference really is that matrix
> > fill should be kept constant *but also* be constrained within rows.
> > 
> > Thanks again for your help.
> > 
> > Etienne
> > 
> > Le 7 janvier 2010 08:17, Peter Solymos <solymos at ualberta.ca> a écrit :
> >   
> > > Dear Etienne,
> > > 
> > > You can try the Chris Hennig's prablus package which have a parametric
> > > bootstrap based null-model where clumpedness of occurrences or
> > > abundances (this might allow continuous data, too) is estimated from
> > > the site-species matrix and used in the null-model generation. But
> > > here, the sum of the matrix will vary randomly.
> > > 
> > > But if you have environmental covariates, you might try something more
> > > parametric. For example the simulate.rda or simulate.cca functions in
> > > the vegan package, or fit multivariate LM for nested models (i.e.
> > > intercept only, and with other covariates) and compare AIC's, or use
> > > the simulate.lm to get random numbers based on the fitted model. This
> > > way you can base you desired statistic on the simulated data sets, and
> > > you know explicitly what is the model (plus it is good for continuous
> > > data that you have). By using the null-model approach, you implicitly
> > > have a model by defining constraints for the permutations, and
> > > p-values are probabilities of the data given the constraints (null
> > > hypothesis), and not probability of the null hypothesis given the data
> > > (what people usually really want).
> > > 
> > > Cheers,
> > > 
> > > Peter
> > > 
> > > Péter Sólymos
> > > Alberta Biodiversity Monitoring Institute
> > > Department of Biological Sciences
> > > CW 405, Biological Sciences Bldg
> > > University of Alberta
> > > Edmonton, Alberta, T6G 2E9, Canada
> > > Phone: 780.492.8534
> > > Fax: 780.492.7635
> > > 
> > > 
> > > 
> > > On Wed, Jan 6, 2010 at 2:18 AM, Carsten Dormann <carsten.dormann at ufz.de> wrote:
> > >     
> > > > Hi Etienne,
> > > > 
> > > > the double constraint is observed by two functions:
> > > > 
> > > > swap.web in package bipartite
> > > > 
> > > > and
> > > > 
> > > > commsimulator in vegan (at least in the r-forge version)
> > > > 
> > > > Both build on the r2dtable approach, i.e. you have, as you propose, to turn
> > > > the low values into higher-value integers.
> > > > 
> > > > The algorithm is described in the help to swap.web.
> > > > 
> > > > 
> > > > HTH,
> > > > 
> > > > Carsten
> > > > 
> > > > 
> > > > 
> > > > On 06/01/2010 08:55, Etienne Laliberté wrote:
> > > >       
> > > > > Hi,
> > > > > 
> > > > > Let's say I have measured plant biomass for a total of 5 species from 3
> > > > > sites (i.e. plots), such that I end with the following data matrix
> > > > > 
> > > > > mat<- matrix(c(0.35, 0.12, 0.61, 0, 0, 0.28, 0, 0.42, 0.31, 0.19, 0.82,
> > > > > 0, 0, 0, 0.25), 3, 5, byrow = T)
> > > > > 
> > > > > dimnames(mat)<- list(c("site1", "site2", "site3"), c("sp1", "sp2",
> > > > > "sp3", "sp4", "sp5"))
> > > > > 
> > > > > Data is therefore continuous. I want to generate n random community
> > > > > matrices which both respect the following constraints:
> > > > > 
> > > > > 1) row and column totals are kept constant, such that "productivity" of
> > > > > each site is maintained, and that rare species at a "regional" level
> > > > > stay rare (and vice-versa).
> > > > > 
> > > > > 2) number of species in each plot is kept constant, i.e. each row
> > > > > maintains the same number of zeros, though these zeros should not stay
> > > > > fixed.
> > > > > 
> > > > > To deal with continuous data, my initial idea was to transform the
> > > > > continuous data in mat to integer data by
> > > > > 
> > > > > mat2<- floor(mat * 100 / min(mat[mat>  0]) )
> > > > > 
> > > > > where multiplying by 100 is only used to reduce the effect of rounding
> > > > > to nearest integer (a bit arbitrary). In a way, shuffling mat could now
> > > > > be seen as re-allocating "units of biomass" randomly to plots. However,
> > > > > doing so results in a matrix with large number of "individuals" to
> > > > > reshuffle, which can slow things down quite a bit. But this is only part
> > > > > of the problem.
> > > > > 
> > > > > My main problem has been to find an algorithm that can actually respect
> > > > > constraints 1 and 2. Despite trying various R functions (r2dtable,
> > > > > permatfull, etc), I have not yet been able to find one that can do
> > > > > this.
> > > > > 
> > > > > I've had some kind help from Peter Solymos who suggested that I try the
> > > > > aylmer package, and it's *almost* what I need, but the problem is that
> > > > > their algorithm does not allow for the zeros to move within the matrix;
> > > > > they stay fixed. I want the number of zeros to stay constant within each
> > > > > row, but I want them to move freely betweem columns.
> > > > > 
> > > > > Any help would be very much appreciated.
> > > > > 
> > > > > Thanks
> > > > > 
> > > > > 
> > > > >         
> > > > --
> > > > Dr. Carsten F. Dormann
> > > > Department of Computational Landscape Ecology
> > > > Helmholtz Centre for Environmental Research-UFZ
> > > > Permoserstr. 15
> > > > 04318 Leipzig
> > > > Germany
> > > > 
> > > > Tel: ++49(0)341 2351946
> > > > Fax: ++49(0)341 2351939
> > > > Email: carsten.dormann at ufz.de
> > > > internet: http://www.ufz.de/index.php?de=4205
> > > > 
> > > > _______________________________________________
> > > > R-sig-ecology mailing list
> > > > R-sig-ecology at r-project.org
> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> > > > 
> > > > 
> > > >       
> > 
> > 
> >   
> 
> -- 
> Dr. Carsten F. Dormann
> Department of Computational Landscape Ecology
> Helmholtz Centre for Environmental Research-UFZ
> Permoserstr. 15
> 04318 Leipzig
> Germany
> 
> Tel: ++49(0)341 2351946
> Fax: ++49(0)341 2351939
> Email: carsten.dormann at ufz.de
> internet: http://www.ufz.de/index.php?de=4205

-- 
Etienne Laliberté
================================
School of Forestry
University of Canterbury
Private Bag 4800
Christchurch 8140, New Zealand
Phone: +64 3 366 7001 ext. 8365
Fax: +64 3 364 2124
www.elaliberte.info