[R] gee

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun May 11 22:46:15 CEST 2003


On Sun, 11 May 2003, Nirmala Ravishankar wrote:

> I am trying to use gee() to calculate the robust standard errors for a 
> logit model.  My dataset (zol) has 195019 observations; winner, racebl, 
> raceas, racehi are all binary variables.   ID is saved as a vector of 
> length 195019 with alternating 0's and 1's.   I get the following error 
> message.  I also tried the same command with corstr set to "independence" 
> and got the same error message.
>  
> 
> > ID <- as.vector(array(0, nrow(zol)))
> > k <- seq(2, nrow(zol), 2)
> > ID[k] <- 1
> 
> 
> > fm <- gee(winner ~ racebl + racehi + raceas, id = ID, data = zol,
> >           family = binomial(logit), corstr = "exchangeable")
> [1] "Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27"
> [1] "running glm to get initial regression estimate"
> [1]  0.4308219 -0.1929547 -0.1741733 -0.1925523
> Error in rep(0, maxclsz * maxclsz) : invalid number of copies in "rep"
> In addition: Warning message: 
> NAs produced by integer overflow in: maxclsz * maxclsz 
> 
> 
> 
> What am I doing wrong?

Using a much larger dataset than the author of gee envisaged: the warning
message is pretty explicit. Not that I think you will get clusters of size
1e5 to work, since rep(0, maxclsz * maxclsz) is a vector of about 80GB,
and on a 32-bit machine the OS can only address 4GB at most per process.
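
The arithmetic behind that estimate can be reproduced with a short sketch.
It assumes, as the error message suggests, that each distinct ID value is
treated as one cluster, so the alternating 0/1 vector yields two clusters of
roughly 97,500 observations each. The variable names other than maxclsz are
illustrative and are not taken from the gee source.

```r
# Rebuild an ID vector of alternating 0s and 1s, as in the original post
n  <- 195019L
id <- rep_len(c(0, 1), n)

# Assumption: one cluster per distinct ID value
clusz   <- table(id)
maxclsz <- max(clusz)                      # size of the largest cluster

# In integer arithmetic the product exceeds .Machine$integer.max (2^31 - 1),
# so R returns NA with an overflow warning (suppressed here)
overflow <- suppressWarnings(as.integer(maxclsz) * as.integer(maxclsz))

# The same product in double precision, and the storage it would need
n_elements <- as.numeric(maxclsz)^2        # number of doubles requested
gigabytes  <- n_elements * 8 / 1e9         # 8 bytes per double

print(c(maxclsz = maxclsz, n_elements = n_elements, gigabytes = gigabytes))
```

The product is about 9.5e9 elements, or roughly 76GB of doubles, which is
in line with the "about 80GB" figure above.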

I cannot imagine a real statistical problem with a homogeneous group of
1e5 observations, but if you have one, a 1% subsample ought to suffice for
all practical purposes.  And any statistical fluctuations (variance)
will be swamped by model inadequacy (bias) for 2e5 binary observations.
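
A minimal sketch of the subsampling idea, under stated assumptions: the data
frame zol_sim is a simulated stand-in for zol (the real data are not
available here), the covariate prevalences are invented, and the
coefficients used to simulate winner are simply the rounded initial
estimates printed in the output above. It fits an ordinary glm to a 1%
simple random subsample; it does not compute robust standard errors.

```r
set.seed(1)                                # for a reproducible illustration
n <- 195019

# Hypothetical stand-in for zol: three binary covariates, invented prevalences
zol_sim <- data.frame(
  racebl = rbinom(n, 1, 0.10),
  racehi = rbinom(n, 1, 0.10),
  raceas = rbinom(n, 1, 0.05)
)

# Simulate the binary response from a logit model (illustrative coefficients)
eta <- 0.43 - 0.19 * zol_sim$racebl - 0.17 * zol_sim$racehi -
  0.19 * zol_sim$raceas
zol_sim$winner <- rbinom(n, 1, plogis(eta))

# Draw a 1% simple random subsample of the rows
keep <- sample(nrow(zol_sim), size = round(0.01 * nrow(zol_sim)))
sub  <- zol_sim[keep, ]

# Fit the same logit model on the subsample
fit <- glm(winner ~ racebl + racehi + raceas, data = sub,
           family = binomial(link = "logit"))
print(summary(fit)$coefficients)
```

With the real data, sub would be drawn from zol in the same way and could
then be passed to whichever fitting function is preferred.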

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



