[R] Tip: I() can designate constants in a regression
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Thu Sep 8 19:23:00 CEST 2005
David James <djames at frontierassoc.com> writes:
> Just thought I would share a tip I learned:
> The function I() is useful for specifying constants to formulas and
> regressions.
>
> It will prevent nls (for example) from trying to treat the variable
> inside I() as something it needs to estimate. An example is below.
>
> -David
>
> P.S. This may be obvious to some, but it is not made clear to me by
> the documentation or common books that I reviewed. These books, of
> course, do tend to mention other aspects of I(), which seems to be a
> very diverse function. For example:
> * ISwR by Dalgaard (p. 160, 177)
> * MASwS by Venables and Ripley (p.18)
>
> However, the books I looked at do not mention the specific tip here:
> Wrapping I() around a variable will make it a constant from the
> perspective of a regression.
>
> A humble suggestion to the many authors of the many great R and S
> books out there: I would find it helpful if more R books had the word
> "constants" in the index. Perhaps there could be a brief section
> that explained how to create constants in a regression. These sorts
> of problems, I would guess, occur more commonly with nls models than
> lm models.
First check whether your claim is actually correct:
> x = 1:10
> y = x # perfect fit
> yeps = y + rnorm(length(y), sd = 0.01) # added noise
> nls(yeps ~ a + b*x, start = list(a = 0.12345, b = 0.54321),
+ trace = TRUE)
74.2686 : 0.12345 0.54321
0.0006529895 : -0.002666984 1.000334031
Nonlinear regression model
model: yeps ~ a + b * x
data: parent.frame()
           a            b
-0.002666984  1.000334031
residual sum-of-squares: 0.0006529895
> a <- 0
> nls(yeps ~ a + b*x, start = list(b = 0.54321),trace=TRUE)
80.31713 : 0.54321
0.0006682311 : 0.999953
Nonlinear regression model
model: yeps ~ a + b * x
data: parent.frame()
       b
0.999953
residual sum-of-squares: 0.0006682311
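I.e., turning a into a constant works quite happily without the I():
nls only estimates the parameters named in 'start' and looks up every
other variable in the data or in the environment of the formula.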
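
The same comparison can be scripted with I() added back in. This is only
a minimal sketch: the seed and the names fit_plain and fit_I are
assumptions made for illustration and are not part of the session above.
It fits the model once with a bare 'a' and once with I(a), then compares
which parameters nls reports.

```r
set.seed(1)                                # arbitrary seed, for reproducibility only
x <- 1:10
yeps <- x + rnorm(length(x), sd = 0.01)    # straight line plus small noise

a <- 0                                     # absent from 'start', so used as a fixed value

# Same model, without and with I() around the constant
fit_plain <- nls(yeps ~ a + b * x,    start = list(b = 0.54321))
fit_I     <- nls(yeps ~ I(a) + b * x, start = list(b = 0.54321))

names(coef(fit_plain))                     # parameters estimated without I()
names(coef(fit_I))                         # parameters estimated with I()
all.equal(coef(fit_plain), coef(fit_I))    # compare the two estimates
```

If both fits report only b, the I() wrapper is doing nothing for the
purpose of holding a fixed.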
> Here is the example that motivated my tip:
>
> > weather.df : a data frame, where each row is one hour
> > weather.df$temp : the temperature
> > weather.df$annual : time offset, adjusted so that its period is one year
> > weather.df$daily : time offset, adjusted so that its period is one day
> >
> > # I want a1,a2 to be constants from the point of view of nls
> > a1 <- 66
> > a2 <- -18
> > nls.example <- nls( temp ~ I(a1) + I(a2)*sin( ts.annual ) +
> >     a3*sin( ts.daily ), data=weather.df, start=c(a3=1) )
> > # leaving out the I() will cause nls to estimate values for a1 and a2
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907