[R] fixed effects

Tue Mar 28 08:43:02 CEST 2006

There is a very similar example worked through in section 7.2 of `S 
Programming'.

On Mon, 27 Mar 2006, Liaw, Andy wrote:

> Hmm... didn't read the whole post before I hit `send'...
>
> I think you basically will have to fit the model `by hand', which is not
> that hard, given the simple structure of the model(s).  The formulae for the
> quantities of interests are quite straightforward and easy to code in R
> (similar to the rsq function I showed).  Adding a continuous variable to the
> model simply requires regressing the residuals; i.e., ave(y, x, function(x)
> x - mean(x)), on the new variable z (easy for lm()), and take one more
> degree of freedom from SS(total).
>
> Andy
>
> From: Liaw, Andy
>>
>> I guess you meant X has 20,000 levels (and 40 observations
>> each)?  In that case lm() will attempt to create a design
>> matrix that's 8e5 by 2e4, and that's unlikely to fit in the RAM.
>>
>> It is very easy to compute by hand.  I'm using a smaller data
>> size first to check the result against summary(lm()), then
>> the size you specified (hopefully with more correct
>> arithmetics...) to show that it's quite doable even on a
>> modest laptop:
>>
>>> set.seed(1)
>>> y <- rnorm(80)
>>> x <- factor(rep(paste("x", 1:20, sep=""), 4))
>>> summary(lm(y ~ x))$r.squared
>> [1] 0.2555885
>>> rsq <- function(x, y) {
>> +     sse <- sum(ave(y, x, FUN=function(x) x - mean(x))^2)
>> +     sstotal <- var(y) * (length(y) - 1)
>> +     1 - sse / sstotal
>> + }
>>> rsq(x, y)
>> [1] 0.2555885
>>> set.seed(1)
>>> y <- rnorm(8e5)
>>> x <- factor(rep(paste("x", 1:2e4, sep=""), 40))
>> system.time(rsq(x, y))
>> [1] 1.99 0.03 2.06   NA   NA
>>
>> Andy
>>
>>
>> From: ivo welch
>>>
>>> dear R wizards:
>>>
>>> X is factor with 20,000*20=800,000 observations of 20,000 factors.
>>> I.e., each factor has 20 observations.  y is 800,000 normally
>>> distributed data points.  I want to see how much R^2 the X
>>> factors can provide.  Easy, right?
>>>
>>>> lm ( y ~ X)
>>> and
>>>>  aov( y ~ X)
>>> Error: cannot allocate vector of size 3125000 Kb
>>>
>>> is this computationally infeasible?  (I am not an expert, but
>>> off-hand, I thought this can work as long as the X's are just
>>> factors = fitted means.)
>>>
>>>> help.search("fixed.effects");
>>>
>>> fixed.effects(nlme)                 Extract Fixed Effects
>>> fixed.effects.lmList(nlme)          Extract lmList Fixed Effects
>>> lme(nlme)                           Linear Mixed-Effects Models
>>> lmeStruct(nlme)                     Linear Mixed-Effects Structure
>>> nlme(nlme)                          Nonlinear Mixed-Effects Models
>>> nlmeStruct(nlme)                    Nonlinear Mixed-Effects
>> Structure
>>>
>>> Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'.
>>>
>>> ok---I want to read the fixed.effects help.  could the help
>>> system tell me how to inspect these entries?
>>>
>>>> help("fixed.effects")
>>> wrong
>>>> help.search("fixed.effects")
>>> two entries, as above.  nothing new.
>>>> ?fixed.effects
>>> wrong
>>>
>>> eventually, it dawned on me that nlme in parens was not a
>>> function argument, but the name of the package within which
>>> fixed.effects lives.  Suggestion:  maybe a different notation
>>> to name packages would be more intuitive in the help system.
>>> yes, I know it now, but other novices may not.  even a colon
>>> instead of a () may be more intuitive in this context.
>>>
>>>> library(nlme); ?lme
>>> and then
>>>> lme(y ~ X)
>>> Error in getGroups.data.frame(dataMix, groups) :
>>>         Invalid formula for groups
>>>
>>>
>>> now I have to beg for some help.  ok, blatant free-riding.
>>> the lme docs tells me it does the Laird and Ware model, but I
>>> do not know this model.  the only two examples given at the
>>> end of the lme help file seem to be similar to what I just
>>> specified.  so, how do I execute a simple fixed-effects
>>> model?  (later, I may want to add a variable Z that is a
>>> continuous random variable.)  could someone please give me
>>> one quick example?  help is, as always, highly appreciated.
>>>
>>> sincerely,
>>>
>>> /ivo welch
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide!
>>> http://www.R-project.org/posting-guide.html
>>>
>>>
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>>
>> --------------------------------------------------------------
>> ----------------
>> Notice:  This e-mail message, together with any attachments,
>> contains information of Merck & Co., Inc. (One Merck Drive,
>> Whitehouse Station, New Jersey, USA 08889), and/or its
>> affiliates (which may be known outside the United States as
>> Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as
>> Banyu) that may be confidential, proprietary copyrighted
>> and/or legally privileged. It is intended solely for the use
>> of the individual or entity named on this message.  If you
>> are not the intended recipient, and have received this
>> message in error, please notify us immediately by reply
>> e-mail and then delete it from your system.
>> --------------------------------------------------------------
>> ----------------
>>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595