[R] bigglm() results different from glm()

Francisco J. Zagmutt gerifalte28 at hotmail.com
Tue Mar 17 03:26:29 CET 2009


Dear all,

I am using bigglm() from the biglm package to fit a few GLMs to a large
dataset (3 million rows, 6 columns). While fitting a Poisson GLM, I noticed
that the coefficient estimates were very different from those I obtained
when estimating the model on a smaller dataset with glm(). I therefore wrote
a very basic toy example to compare the results of bigglm() against a
glm() call. Consider the following code:


 > require(biglm)
 > options(digits=6, scipen=3, contrasts = c("contr.treatment", "contr.poly"))
 > dat=data.frame(y =c(rpois(50000, 10),rpois(50000, 15)), ttment=gl(2,50000))
 > m1 <- glm(y~ttment, data=dat, family=poisson(link="log"))
 > m1big <- bigglm(y~ttment , data=dat, family=poisson(link="log"))
 > summary(m1)

<snipped output for this email>
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.30305    0.00141    1629   <2e-16 ***
ttment2      0.40429    0.00183     221   <2e-16 ***

     Null deviance: 151889  on 99999  degrees of freedom
Residual deviance: 101848  on 99998  degrees of freedom
AIC: 533152

 > summary(m1big)
Large data regression model: bigglm(y ~ ttment, data = dat, family = poisson(link = "log"))
Sample size =  100000
              Coef  (95%   CI)    SE p
(Intercept) 2.651 2.650 2.653 0.001 0
ttment2     4.346 4.344 4.348 0.001 0

 > m1big$deviance
[1] 287158986


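Notice that the coefficients and the deviance from bigglm() are far from
the glm() results: the estimates are 2.651 and 4.346 instead of 2.303 and
0.404, and the deviance is 287158986 instead of 101848. If I vary the
chunksize argument over seq(1000,10000,1000), the bigglm() estimates stay
the same.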
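For reference, here is a minimal sketch of the comparison described above. It assumes that "changing the chunk" means looping over bigglm()'s chunksize argument (the post does not show that loop), and it adds set.seed() for reproducibility, which the original example did not use. Because the simulated means are 10 and 15, the coefficients the fit should recover are log(10) ≈ 2.3026 for the intercept and log(15/10) = log(1.5) ≈ 0.4055 for ttment2; the glm() estimates above (2.30305 and 0.40429) agree with these.

```r
# Sketch only: compares glm() and bigglm() on the same simulated data.
# Assumes the biglm package is installed; set.seed() is an addition.
library(biglm)

set.seed(1)
dat <- data.frame(y      = c(rpois(50000, 10), rpois(50000, 15)),
                  ttment = gl(2, 50000))

# Reference fit with glm()
m1 <- glm(y ~ ttment, data = dat, family = poisson(link = "log"))

# Coefficients implied by the simulation: log(10) and log(15/10)
truth <- c("(Intercept)" = log(10), ttment2 = log(1.5))

# Refit with bigglm() for each chunk size (assumed meaning of "chunk")
chunks <- seq(1000, 10000, 1000)
big_coefs <- t(sapply(chunks, function(cs) {
  coef(bigglm(y ~ ttment, data = dat, family = poisson(link = "log"),
              chunksize = cs))
}))
rownames(big_coefs) <- chunks

# One row per source: simulation truth and glm()
print(rbind(truth = truth, glm = coef(m1)))
# One row per chunk size for bigglm()
print(big_coefs)
```

If the bigglm() rows all agree with one another but not with the glm() row, that reproduces the behaviour reported here independently of chunk size.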

Can someone help me understand what is causing these differences?

Here is my version info:

 > version
                _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          8.1
year           2008
month          12
day            22
svn rev        47281
language       R
version.string R version 2.8.1 (2008-12-22)


Many thanks in advance for your help,

Francisco

-- 
Francisco J. Zagmutt
Vose Consulting
2891 20th Street
Boulder, CO, 80304
USA
www.voseconsulting.com