[R] Regression and data types

Sat Sep 27 19:09:57 CEST 2008

On 27-Sep-08 15:18:32, Gavin Simpson wrote:
> On Fri, 2008-09-26 at 13:17 +0100, GRANT Lewis wrote:
>> Dear All
>> I have three data sets, X1, X2 and Y. X1 is data, X2 and Y were
>> generated in (different) R programs. All three vectors have one
>> column of 60 data points.
>> I am using the code lm(Y~X1)$coef and lm(Y~X2)$coef. 
> 
> Others have replied with an answer to your question. I just wanted
> to suggest you don't rummage around in R model objects taking what
> you like using '$'. 9 times out of 10 you'll get what you want, but
> that last remaining time will have you rubbing your head in confusion
> at best, at worst, answers may be just plain wrong.
> 
> Use extractor functions, such as coef() instead:
> 
> coef(lm(Y ~ X1))
> 
> G

Surely, if you read (for example) in ?lm that:

Value:
[...]
     An object of class '"lm"' is a list containing at least the
     following components:

     coefficients: a named vector of coefficients
     [...]

then (subject to using enough of the name to give a unique partial
matching, as is the case here) you should find that

  lm(...)$coef

returns what ?lm says!

Even with more complex model fitting, such as lme (in 'nlme') you
should still get what ?lme --> ?lmeObject says:

coefficients: a list with two components, 'fixed' and 'random', where
          the first is a vector containing the estimated fixed effects
          and the second is a list of matrices with the estimated
          random effects for each level of grouping. For each matrix in
          the 'random' list, the columns refer to the random effects
          and the rows to the groups.

Since coef() is a gneric function, you might expect coef(lme(...))
to give the same; or even perhaps give a more easily interpreted
presentation. But there's nothing wrong with lme(...)$coef provided
you're fully aware of what it is (as explained in the 'help')..

BUT: In the case of lme, if you run the example:

  fm1 <- lme(distance ~ age, data = Orthodont) # random is ~ age

then fm1$coef really does return a list with components:

  $fixed
  (Intercept)         age 
   16.7611111   0.6601852 

  $random
  $random$Subject
      (Intercept)          age
  M16  -0.1877576 -0.068853674
  [...]
  F04   1.0691628 -0.029862143
  F11   1.2176440  0.083191188

whereas if you do coef(fm1) you will get only:

      (Intercept)       age
  M16    16.57335 0.5913315
  [...]
  F04    17.83027 0.6303230
  F11    17.97876 0.7433764

so (a) not only do you NOT get the "fixed effects" part,
but (b) what you might interpret as the "random effects" part
has very different values from the "$random" component of
fm1$coef! Well, there must be some explanation for this, mustn't
there? But, in ?lmeObject, I find ONLY:

     Objects of this class have methods for the generic functions 
     'anova', 'coef', [...]

so I can't find out from there what coef(fm1) does; and if I look
in ?lme again I find only:

     The functions 'resid', 'coef',
     'fitted', 'fixed.effects', and 'random.effects'  can be used to
     extract some of its components.

so that doesn't tell me much (and certainly not why I get the
different results). And ?coef doesn't say anything specific either.

That being the case, I would, myself, stick with fm1$coef.
At least, then I know what I'm getting. With coef(fm1), I don't.

Of course, summary(fm1) is informative in its own way; but that's a
different matter; my point is that coef() doesn't necessarily give
you what you might expect, while (...)$coef does -- since it is
explained in full in ?lmeObject. This is, in fact, the opposite
conclusion to that drawn by Gavin!

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Sep-08                                       Time: 18:09:54
------------------------------ XFMail ------------------------------