[R-sig-ME] poly() and rcs()?

Sun Nov 15 21:20:07 CET 2009

Dear all,

Here is a question regarding the difference between using poly() and rcs().

If you fit a model using poly(), as shown below, the summary returns
statistics for the term "linear" term,
poly(TrialContinuous,2,raw=TRUE)1, and statistics for the quadratic,
poly(TrialContinuous,2,raw=TRUE)2. Here, the summary tells us that the
"linear" and quadratic terms are significant, that the interaction
between the linear term and the factor variable FreqGroup is
significant, but that the interaction between the quadratic and
FreqGroup is not significant.

> m1=lmer(LogRT~poly(TrialContinuous,2,raw=TRUE)*FreqGroup+(1|Subject),data=dat.li)
> print(m1,corr=F)
Linear mixed model fit by REML
Formula: LogRT ~ poly(TrialContinuous, 2, raw = TRUE) * FreqGroup + (1
|      Subject)
   Data: dat.li
   AIC   BIC     logLik deviance REMLdev
 17199 17266  -8592    17056   17183
Random effects:
 Groups   Name        Variance   Std.Dev.
 Subject  (Intercept)   0.014825  0.12176
 Residual                  0.101560  0.31868
Number of obs: 30742, groups: Subject, 25

Fixed effects:

         Estimate    Std. Error    t value
(Intercept)
     6.307e+00  2.456e-02    256.85
poly(TrialContinuous, 2, raw = TRUE)1                       -1.570e-04
  4.639e-06   -33.84
poly(TrialContinuous, 2, raw = TRUE)2                        9.779e-08
  1.120e-08    8.73
FreqGroupLow
3.299e-02   6.085e-03    5.42
poly(TrialContinuous, 2, raw = TRUE)1:FreqGroupLow  1.769e-05
8.736e-06    2.03
poly(TrialContinuous, 2, raw = TRUE)2:FreqGroupLow -1.168e-08
2.123e-08   -0.55

Now, if you use rcs() instead, the summary return statistics for the
first and second splines, as shown below. We see that the 2 splines
are significant, but there is no interaction between any of the
splines and FreqGroup. This reflects what we see in the model fitted
with poly(), where the the linear and quadratic terms were significant
but the quadratic by FreqGroup interaction was not significant. What's
missing in the model fitted with rcs() is the linear term.

> m2=lmer(LogRT~rcs(TrialContinuous,3)*FreqGroup+(1|Subject),data=dat.li)
> print(m2,corr=F)
Linear mixed model fit by REML
Formula: LogRT ~ rcs(TrialContinuous, 3) * FreqGroup + (1 | Subject)
   Data: dat.li
   AIC   BIC     logLik deviance REMLdev
 17180 17247  -8582    17065   17164
Random effects:
 Groups   Name        Variance   Std.Dev.
 Subject  (Intercept)  0.014826   0.12176
 Residual                 0.101592   0.31874
Number of obs: 30742, groups: Subject, 25

Fixed effects:

         Estimate    Std. Error    t value
(Intercept)
     6.286e+00  2.498e-02    251.62
rcs(TrialContinuous, 3)TrialContinuous
-2.538e-04  1.253e-05   -20.26
rcs(TrialContinuous, 3)TrialContinuous'
1.285e-04   1.547e-05    8.31
FreqGroupLow
3.599e-02   1.061e-02    3.39
rcs(TrialContinuous, 3)TrialContinuous:FreqGroupLow   3.018e-05
2.379e-05    1.27
rcs(TrialContinuous, 3)TrialContinuous':FreqGroupLow -1.660e-05
2.936e-05   -0.57

I tried adding a linear term "TrialContinuous" to the model fitted
with rcs(), as shown below, but that doesn't work, the model is not
positive definite.

> m3=lmer(LogRT ~ (TrialContinuous + rcs(TrialContinuous,3)) * FreqGroup + (1|Subject), data = dat.li)
> Error in mer_finalize(ans) : Downdated X'X is not positive definite, 7.

The question is where is the linear term (if there is one)?
Is it hidden somewhere? Or is it simply not a question to ask when
using rcs() to fit models?
Should I use poly() rather than rcs()?
Are there situations where you would want to use poly() over rcs() and
vice versa?

Additionally, how should we interpret the statistics for the splines?
Does their significance mean that the slopes in the splines are
significantly different than 0?
In the case of the second spline, does it's significance mean that it
is significantly different than the first spline?

Thank you very much for your time and efforts. Your help is greatly appreciated.
Sincerely

--
Antoine Tremblay
Department of Neuroscience
Georgetown University
Washington DC