[R] glm or transformation of the response?

joris meys jorismeys at gmail.com
Tue Nov 25 16:29:04 CET 2008


I'm not sure on what kind of dataset would be most appropriate, but
following code I used to  create a dataset with a linear response and
an increasing variance (the megaphone type, common in ecological
datasets if I'm right) :

beta0 <- 10
beta1 <- 1
x <- c(1:40)
y <- beta0 + x*beta1 +rnorm(40,0,1)*seq(0.1,10,length.out=40)

Data=data.frame(x,y)

I won't win the price for most elegant programming with this, but it
surely works fine for my simulations.

Kind regards
Joris

On Tue, Nov 25, 2008 at 3:52 PM, Christoph Scherber
<Christoph.Scherber at agr.uni-goettingen.de> wrote:
> Dear all,
>
> For an introductory course on glm?s I would like to create an example to
> show the difference between glm and transformation of the response. For
> this, I tried to create a dataset where the variance increases with the mean
> (as is the case in many ecological datasets):
>
>        poissondata=data.frame(
>        response=rpois(40,1:40),
>        explanatory=1:40)
>
>        attach(poissondata)
>
> However, I have run into a problem because it looks like the lm model (with
> sqrt-transformation) fits the data best:
>
> ##
>
> model1=lm(response~explanatory,poissondata)
> model2=lm(sqrt(response+0.5)~explanatory,poissondata)
> model3=lm(log(response+1)~explanatory,poissondata)
> model4=glm(response~explanatory,poissondata,family=poisson)
> model5=glm(response~explanatory,poissondata,family=quasipoisson)
> model6=glm.nb(response~explanatory,poissondata)
> model7=glm(response~explanatory,quasi(variance="mu",link="identity"))
>
>
> plot(explanatory,response,pch=16)
> lines(explanatory,predict(model1,explanatory=explanatory))
> lines(explanatory,(predict(model2,explanatory=explanatory))^2-0.5,lty=2)
> lines(explanatory,exp(predict(model3,explanatory=explanatory))-1,lty=3)
> lines(explanatory,exp(predict(model5,explanatory=explanatory)),lty=1,col="red")
> lines(explanatory,predict(model6,explanatory=explanatory,type="response"),lty=1,col="blue")
> lines(explanatory,predict(model7,explanatory=explanatory,type="response"),lty=1,col="green")
>
> ##
>
> The only model that performs equally well is model7.
>
> How would you deal with this kind of analysis? What would be your
> recommendation to the students, given the fact that most of the standard glm
> models obviously don?t seem to produce good fits here?
>
> Many thanks and best wishes
> Christoph
>
> (using R 2.8.0 on Windows XP)
>
>
>
>
>
> --
> Dr. rer.nat. Christoph Scherber
> University of Goettingen
> DNPW, Agroecology
> Waldweg 26
> D-37073 Goettingen
> Germany
>
> phone +49 (0)551 39 8807
> fax   +49 (0)551 39 8806
>
> Homepage http://www.gwdg.de/~cscherb1
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list