[R] Am failing on making lagged residual after regression
Ajay Shah
ajayshah at mayin.org
Mon Mar 8 17:37:15 CET 2004
Folks,
I'm quite confused in trying to do something that (I thought) ought to be
mainstream and straightforward R. :-) Could you please help?
I am doing an ordinary linear regression. My goal is, after the
regression, to extract the residuals and make a new variable holding
the residuals lagged by 1. I will use this variable in a 2nd stage
regression (for an error-correction model).
This sounds simple and reasonable, and should be right up R's alley,
but I am just not able to do this. Can I please show you the steps
which I'm trying and failing in?
I start with:
> m = lm(NNDA ~ NFA + NFA.x.d1 + NFA.x.d2 + IIP.n + CRR, D.f)
> e = residuals(m)
> print(e)
          34           35           36           37           38           39
 -5073.24843  -4210.27886  -8218.01782  -1489.10583  -4426.11738 -11332.56052
(lines deleted)
          64           65           66           67           68           69
  8362.93776   7564.14324   2311.41208   7660.00638  -1271.04645 -10917.29418
(lines deleted)
         160          161          162          163          164          165
  3858.94591 -11783.04370 -21438.33646   1859.49628  -4988.82853 -25172.43241
Here, the residuals only start at the 34th observation owing to
missing data in my data frame. This is correct and sensible. The
dataset has 167 observations, but 166 and 167 also have missing data
and are dropped.
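To make the situation concrete, here is a minimal sketch on made-up data (the data frame `d` and its columns `y` and `x` are hypothetical stand-ins for D.f). It assumes the usual default na.action of na.omit, under which lm() drops incomplete rows and residuals() returns a named vector covering only the rows that were used:

```r
# Hypothetical toy data: rows 1, 2 and 7 have a missing response.
d <- data.frame(y = c(NA, NA, 1.0, 2.1, 2.9, 4.2, NA),
                x = 1:7)

m <- lm(y ~ x, data = d, na.action = na.omit)  # spell out the usual default
e <- residuals(m)

length(e)   # number of complete rows, not nrow(d)
names(e)    # row labels of the rows that were kept
```

If the real fit behaves the same way, e would hold 132 values (observations 34 to 165) rather than 167.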
I tried to use lag(e,1) to make a new vector and failed. I think I am
just not understanding the R concept of lag(). In my notion of a
lagged vector, I want a vector f where f[35] is e[34], i.e. the
first residual above, -5073.24843. That is not what I get from
lag(e,1). I would be very happy if someone could educate me on how
to use lag().
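For what it is worth, lag() in the stats package shifts the time base of a series (its tsp attribute) and leaves the values where they are, which may explain the surprise. A small sketch with a made-up series `x`; the hand-shifted vector `x_prev` is just one possible way to build the kind of vector described above, not the only one:

```r
x <- ts(c(10, 20, 30, 40))   # made-up series, observed at times 1..4

l <- lag(x, 1)
as.numeric(l)   # same values as x, in the same positions
tsp(l)          # start and end moved back by one period

# A hand-made "previous value" vector: NA first, then x shifted down by one.
v <- as.numeric(x)
x_prev <- c(NA, head(v, -1))
x_prev
```

Because only the time stamps move, element i of lag(x, 1) is still element i of x when the result is treated as a plain vector.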
Okay, so I tried a different route:
> print(T)
[1] 167
> f = numeric(T)
> f[1] = NA
> f[2:T] = e[1:(T-1)]
Does this look reasonable? I thought it should do the trick: I
hand-initialise a T-length vector with NA in the 1st element, and I
copy the values of e[] from 1 to 166 into f[2:T]. I thought this
should give me a lagged e. It doesn't --
> print(f)
[1] NA -5073.24843 -4210.27886 -8218.01782 -1489.10583
(lines deleted)
[131] 1859.49628 -4988.82853 -25172.43241 NA NA
(lines deleted)
[166] NA NA
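One thing worth checking at this point is length(e). Indexing a vector past its end does not raise an error in R; it returns NA for the positions that do not exist. A sketch with a made-up named vector `v` standing in for e:

```r
v <- c("34" = -5, "35" = -4, "36" = -8)   # made-up stand-in for e

length(v)   # only three elements
v[1:5]      # positions 4 and 5 do not exist, so they come back as NA

f <- numeric(6)
f[1] <- NA
f[2:6] <- v[1:5]   # same pattern as f[2:T] = e[1:(T-1)]
f
```

Under that reading, the run of NA at the end of f would simply be the part of 1:(T-1) that lies beyond the last element of e.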
I thought "Okay, what seems to be happening is that the e[1] that I
have is 'actually' the e[34] of my thoughts". So I tried:
> f=rep(NA, T) # zap out f
> f[35:T] = e[34:(T-1)] # copy out useful stuff into 35..T
> print(f)
[1] NA NA NA NA NA
(lines deleted)
[31] NA NA NA NA 7660.00638
[36] -1271.04645 -10917.29418 -11111.60144 -1597.98355 -1066.01901
(lines deleted)
[131] 1859.49628 -4988.82853 -25172.43241 NA NA
(lines deleted)
[166] NA NA
This is wrong!!
Recall (from above) that e[34] was -5073.24843. That value seems to
have mysteriously vanished. Instead, the first non-NA in f, which is
f[35], is 7660.00638, which (incidentally) is the residual labelled 67
above. I just don't know how that value got here. And the values in
f[] seem to peter out at 133! After 133, they are all NA until the end.
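A possible reading of this output: an integer subscript such as e[34] picks the 34th element of e, whereas the label 34 shown by print(e) is a name, which is reached with a character subscript. A sketch with a made-up vector `v` whose names start at 34:

```r
v <- c("34" = -5, "35" = -4, "36" = -8, "37" = -1)   # made-up stand-in for e

v[2]      # second element, whose name happens to be "35"
v["35"]   # the same element, looked up by name
v[35]     # there is no 35th element, so this is NA

# Map observation numbers to positions through the names.
obs <- 34:36
v[as.character(obs)]
```

If e starts at observation 34, its 34th element would carry the label 67, which matches the value that turned up in f[35].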
I guess I'm _just_ not understanding what kind of animal is
returned by residuals(lm()). I know I am missing something basic,
because lots of people must be doing what I am trying, i.e. to run a
regression, extract a residual, lag it, and use it for a 2nd stage
regression.
I know that the vector e (returned by residuals(lm())) is different
from a simple vector, for when I say:
> print(f[35])
[1] 7660.006
> print(e[35])
68
-1271.046
the two animals seem to be different. I don't understand e[35]: why
is it not just a number? There seems to be some index tagging along.
How do I get at the pure numbers of the residuals?
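Two things that may help here, sketched on made-up data (`d`, `y`, `x` and `e.lag1` are hypothetical names): unname() drops the labels and leaves a plain numeric vector, and fitting with na.action = na.exclude makes residuals() return one value per row of the data frame, with NA for the dropped rows, so that position and observation number agree before lagging:

```r
# Hypothetical toy data: rows 1, 2 and 7 have a missing response.
d <- data.frame(y = c(NA, NA, 1.0, 2.1, 2.9, 4.2, NA),
                x = 1:7)

m <- lm(y ~ x, data = d, na.action = na.exclude)
e <- unname(residuals(m))     # length nrow(d), NA where rows were dropped

e.lag1 <- c(NA, head(e, -1))  # e.lag1[i] is e[i - 1]
cbind(e, e.lag1)
```

Since e.lag1 has as many elements as the data frame has rows, it could be added as a column (for example d$e.lag1 <- e.lag1) and used in the second-stage lm() call.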
Thanks much,
-ans.
--
Ajay Shah                                        Consultant
ajayshah at mayin.org                            Department of Economic Affairs
http://www.mayin.org/ajayshah                    Ministry of Finance, New Delhi