[R-sig-eco] Opinion on Analysis. glm() and nls()

Thu Jul 12 17:20:27 CEST 2012

Hello everybody.

I would like to get an opinian, if someone could spare a moment.
I'll first describe my question then show some code of the problem, if
someone could check for me.

My ideia is to measure how the Mass of frogs species increase the
parasites richness (Helminths). Big frogs have many parasites, little
ones have few.
I have 3 continues variables, Host species Mass mean, Helminths
species richness and number or studies used to get to this data.
As the firsts 2 are simple to think, the third is not.
The studies used are like, description of one parasite and its host,
but sometimes, many parasites and its hosts and still one host and its
parasites.
And usually it just tell the presence, no counts. But all the adult
parasite on final host.
The studies in the table are in the perspective of the host, the
number of studies that cite the host name, on these i got the
parasites names.
I know that more things are important for the parasites richness, like
variability in the mass, coevolution, habitat, but for this simple
question.

Relation of Helminths Richness ~ Host Mass
I thought about to use a glm with poisson error as the response as its a count.

But many species have the helminths richness subestimated due to low
number of studies on them (like 1 or 2).
I saw on some papers that people do a regression of Richness ~ Studies
and use the residuals to correlacionate with
others things. But studies dont add more richeness forever, some frogs
are more studied, but like a Species discovery curve,
studies stop to increase the helminths richness. I dont think i can
use a rarefaction or something on the package SPECIES
as i dont have helminths counts on each frog specie.
So i thought about useing a curve that represent the this. Reading
things i thought about using a michelis menten curve
to describe the Richness ~ Studies.

So Richness ~  c * Studies / ( d + Studies) + error

Where "d" would mean, biologicaly in my case, the contribution of the
studies and "c" the maximum species richness.
More studies = more richeness, until a moment that it would stop,
which d tell me when it will stop.
But c is what i expect on Helminths Richness ~ Host Mass, so i could
put this in the place of c
so

 Richness ~  ( exp(a+b*MASS) + error ) * Studies / ( d + Studies) +error

I see some problems with this, like i would expect that Studies
contribut equaly to species richeness, what should not be true, but
still appear better
than use a regression of Richness ~ Studies.

All said, my questions are.

01. Is this wrong? Why?
02. Are there other ways to try to analyse this case? How?

>From here on some examples in R.

#################################################################
#Look on what the real data look like
#################################################################

real.data<-structure(list(tamanho = c(31.5, 95, 79.5, 48.5, 53.5, 45.5,
173.5, 88.5, 140, 80.5, 82, 64, 57.5, 62.5, 21.5, 47.5, 109.5,
95, 34.5, 57.5, 55, 26, 35.5, 39.5, 48.5, 34.5, 72, 50, 83, 49,
24, 53, 71, 46, 46, 145, 41.5, 51.5, 62.5, 134, 37.5, 76, 70,
61, 91, 18), estudos = c(3L, 1L, 28L, 3L, 13L, 2L, 21L, 1L, 22L,
11L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 2L, 1L, 1L, 1L, 7L,
1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 4L, 30L, 1L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 8L), helmintos = c(5L, 1L, 19L, 14L,
5L, 4L, 18L, 1L, 13L, 3L, 3L, 6L, 4L, 2L, 2L, 4L, 1L, 7L, 1L,
1L, 1L, 1L, 1L, 5L, 2L, 1L, 3L, 1L, 2L, 3L, 4L, 1L, 7L, 6L, 13L,
18L, 3L, 8L, 5L, 2L, 17L, 3L, 6L, 1L, 1L, 7L)), .Names = c("Mass",
"Studies", "Helminths"), row.names = c(1L, 2L, 3L, 4L, 5L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L,
47L), class = "data.frame")
real.data

plot(Helminths~Mass,data=real.data,cex=log2(real.data$Studies+1),
xlab="Host species mean Mass",ylab="Helminths
Richness",xlim=c(0,200),ylim=c(0,50))
legend("topleft",legend=c("Point size are log2 of the number of
studies on host species +1"))

#################################################################
#Example Helminths Richness ~ Host Mass
#################################################################
#example data
Mass<- runif(50, 18, 174)
#parameters
a<-2
b<-0.004
#richeness
richeness<-round(exp(a+b*Mass)+rnorm(length(Mass),0,1),0)
#graph
plot(richeness~Mass,xlab="Mass",ylab="Richeness")
curve(exp(a+b*x),0,200,add=T)
#model fit
model.mass<-glm(richeness~Mass,poisson)
summary(model.mass)

points(exp(c(predict(model.mass)))[order(exp(c(predict(model.mass))))]~Mass[order(Mass)],type="l",lty=2,col="red")
legend("topleft",col=c("black","red"),lty=c(1,2),legend=c("True","GLM fit"))

#################################################################
#Example Richness ~ Studies
#################################################################
#example data
Studies<- rpois(50,5)
#parameters c is just ilustrative here, suposse that all species have
#the same Mass
c<-10
d<-2.5
richeness<-round((c*Studies/(d+Studies))+rnorm(length(Studies),0,1),0)

plot(richeness~Studies,ylab="Richeness",xlab="Studies")
curve(c*x/(d+x),0,15,add=T,col="black")

model.menten<- glm(richeness~c(1/Studies),family=gaussian(link = "inverse"))
summary(model.menten)
c1<-1/model.menten$coef[1]
d1<-c1*model.menten$coef[2]
c1
d1
curve(c1*x/(d1+x),0,15,add=T,col="red",lty=2)
legend("topleft",col=c("black","red"),lty=c(1,2),legend=c("True","GLM fit"))

#################################################################
#Example Richness ~ Mass and Studies
#################################################################
Mass<- runif(50, 18, 174)
a<-2
b<-0.004

#Trying to copy what the data look like
Studies<- rpois(50,1)+1
d<-3

richeness<-round((((exp(a+b*Mass)+rnorm(length(Mass),0,1))*Studies)/(d+Studies))+rnorm(length(Studies),0,1),0)

plot(richeness~Mass,cex=log2(Studies+1),xlab="Host species mean
Mass",ylab="Helminths Richness",
xlim=c(0,200),ylim=c(0,15))
legend("topleft",legend=c("Point size are log2 of the number of
studies on host species +1"))
#depending on the set.seed, this plot resemble the real data
#but i dont know if i'm fooling myself
#also the Mass is not uniform, its more commom
#small species than big ones

#trying to recover the parameters with nls
modelo.nls<-nls(richeness~exp(a+b*Mass)*Studies/(d+Studies),start=list(a=1.5,b=0.01,d=2))
summary(modelo.nls)
#most times recover the parameters fairly well

#trying to do the samething with winbugs
library(R2WinBUGS)

#Model
sink("test.model.txt")
cat("
model {

# Priors
alpha ~ dnorm(0,0.001)
beta ~ dnorm(0,0.001)
gamma ~ dnorm(0,0.001)

# Likelihood
 for (i in 1:n) {
    richeness[i] ~ dpois(lambda[i])
    lambda[i] <- exp(alpha + beta*Mass[i])*Studies[i] / ( gamma + Studies[i] )
    }
}
",fill=TRUE)
sink()

win.data <- list(richeness=richeness,Studies=Studies,Mass=Mass,n=length(Mass))
inits <- function(){ list( alpha=rlnorm(1), beta=rlnorm(1), gamma=rlnorm(1)) }
params <- c("alpha","beta","gamma")

# MCMC settings
nc <- 3
ni <- 3000
nb <- 1000
nt <- 2

#run
out <- bugs(data=win.data, inits=inits, parameters.to.save=params,
model.file="test.model.txt", n.thin=nt, n.chains=nc, n.burnin=nb,
n.iter=ni, debug = TRUE)

#look
print(out, dig = 3)
#probably i still dont understand everything i'm doing here
#but how could i put on the model 2 errors, like when i
#assemble the data?
#could not make that neither for bayesian norlikehood way

If you read this far. thanks for your patiance.
Best wishes
Augusto Ribas

-- 
Grato
Augusto C. A. Ribas

Site Pessoal: http://augustoribas.heliohost.org
Lattes: http://lattes.cnpq.br/7355685961127056