[R] bigglm() results different from glm()
John Fox
jfox at mcmaster.ca
Tue Mar 17 14:41:59 CET 2009
Dear Francisco,
I was able to duplicate the problem that you reported, and in addition
discovered that the problem seems to be peculiar to the poisson family.
lm(y~ttment, data=dat) and biglm(y~ttment, data=dat) produce identical
results, as do glm(y~ttment, data=dat) and bigglm(y~ttment, data=dat).
Another example, with the binomial family:
> set.seed(12345)
> dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
ttment=gl(2,50000))
> dat$y0 <- ifelse(dat$y > 12, 1, 0)
> m1b <- glm(y0~ttment, data=dat, family=binomial)
> m1bigb <- bigglm(y0~ttment , data=dat, family=binomial())
> coef(m1b)
(Intercept) ttment2
-1.33508 2.34263
> coef(m1bigb)
(Intercept) ttment2
-1.33508 2.34263
> deviance(m1b)
[1] 109244
> deviance(m1bigb)
[1] 109244
I'm copying this message to Thomas Lumley, who's the author and maintainer
of the biglm package.
Regards,
John
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On
> Behalf Of Francisco J. Zagmutt
> Sent: March-16-09 10:26 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] bigglm() results different from glm()
>
> Dear all,
>
> I am using the bigglm package to fit a few GLM's to a large dataset (3
> million rows, 6 columns). While trying to fit a Poisson GLM I noticed
> that the coefficient estimates were very different from what I obtained
> when estimating the model on a smaller dataset using glm(), I wrote a
> very basic toy example to compare the results of bigglm() against a
> glm() call. Consider the following code:
>
>
> > require(biglm)
> > options(digits=6, scipen=3, contrasts = c("contr.treatment",
> "contr.poly"))
> > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)),
> ttment=gl(2,50000))
> > m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
> > m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
> > summary(m1)
>
> <snipped output for this email>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 2.30305 0.00141 1629 <2e-16 ***
> ttment2 0.40429 0.00183 221 <2e-16 ***
>
> Null deviance: 151889 on 99999 degrees of freedom
> Residual deviance: 101848 on 99998 degrees of freedom
> AIC: 533152
>
> > summary(m1big)
> Large data regression model: bigglm(y ~ ttment, data = dat, family =
> poisson(link = "log"))
> Sample size = 100000
> Coef (95% CI) SE p
> (Intercept) 2.651 2.650 2.653 0.001 0
> ttment2 4.346 4.344 4.348 0.001 0
>
> > m1big$deviance
> [1] 287158986
>
>
> Notice that the coefficients and deviance are quite different in the
> model estimated using bigglm(). If I change the chunk to
> seq(1000,10000,1000) the estimates remain the same.
>
> Can someone help me understand what is causing these differences?
>
> Here is my version info:
>
> > version
> _
> platform i386-pc-mingw32
> arch i386
> os mingw32
> system i386, mingw32
> status
> major 2
> minor 8.1
> year 2008
> month 12
> day 22
> svn rev 47281
> language R
> version.string R version 2.8.1 (2008-12-22)
>
>
> Many thanks in advance for your help,
>
> Francisco
>
> --
> Francisco J. Zagmutt
> Vose Consulting
> 2891 20th Street
> Boulder, CO, 80304
> USA
> www.voseconsulting.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list