[R] New variables "remember" how they were created?

Skipper Seabold jsseabold at gmail.com
Thu Oct 29 17:06:09 CET 2009

On Wed, Oct 28, 2009 at 12:40 PM, Adaikalavan Ramasamy
<a.ramasamy at imperial.ac.uk> wrote:
> Your example is too complicated for me. But a few points:
> 1) What do you mean by "instrument"? Do you mean variable?

By instruments, I mean instrumental variables.  Very common in
econometrics: http://en.wikipedia.org/wiki/Instrumental_variable

Doing two-stage least squares in this example requires using
instrumental variables.

> 2) diff(demand) is identical to demand[-1] - demand[-204]

Just trying to be explicit with the use of the lag and then including
it as an instrument.
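
For point 2, a toy vector with made-up values shows that diff() and the explicit lag construction agree. Here y.1 is the one-period lag, built the same way as demand[-204] in the original script:

```r
# Toy series standing in for demand (values are made up)
demand <- c(10, 12, 15, 14, 18)
n <- length(demand)

# One-period lag, built as in the original script (demand[-204] when n = 204)
y.1 <- demand[-n]

# First difference written out explicitly with the lag ...
yd <- demand[-1] - y.1

# ... is identical to the built-in diff()
identical(yd, diff(demand))   # TRUE
```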

> 3) system() is a built-in R function, so avoid using it as variable name

Ok thanks.
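
A short illustration of why point 3 matters, assuming a fresh session: a list assigned to `system` shadows the base function whenever the name is read as a value, although the function stays reachable as base::system. The name eqSystem below is only an illustrative choice:

```r
# system() is a function exported by base R
stopifnot(is.function(base::system))

# Assigning a list to the same name shadows it in the global environment
system <- list(Consumption = y ~ x)
is.list(system)             # TRUE: the list is found first
is.function(base::system)   # TRUE: the base function is still there
rm(system)                  # remove the shadowing copy

# A distinct name avoids the clash altogether
eqSystem <- list(Consumption = y ~ x)
```

R skips non-function bindings when it evaluates a call, so system("...") would still run the base function; the ambiguity only bites when the name is used as a value, which is exactly how the list is passed to systemfit().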

> 4) The variable "yd" is in the eqInvest formula and hence in the
> system formula. The variable "y.1" is in the instruments formula. Both
> formulas are passed to the systemfit() call, so I see no surprises here.
> Try simplifying and rephrasing, please, if you want further help.

Ok, here it goes.

This is an introductory example, so I'm not sure how much more I can
simplify it, and the details of the estimator aren't that important.
Everything here works, and I understand what's going on, but I just
wonder how R knows that yd was created using y.1.
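
One quick check is to inspect yd directly. With a made-up series in place of Greene's data, yd is a plain numeric vector with no attributes, and the formula records only the name yd, not the expression that produced it:

```r
# Made-up stand-in for the demand series
demand <- c(10, 12, 15, 14, 18)
y.1 <- demand[-length(demand)]   # lagged demand
yd  <- demand[-1] - y.1          # change in demand

attributes(yd)                   # NULL: no record of how yd was built

# A formula stores unevaluated symbols; yd appears only by name
eqInvest <- realinvs[-1] ~ tbilrate[-1] + yd
all.vars(eqInvest)               # "realinvs" "tbilrate" "yd"
```

So whatever link exists between yd and y.1 has to come from the numbers themselves, not from metadata attached to yd.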

In the second stage of the fit, the endogenous regressors on the RHS
are replaced by the fitted values from the first stage found from
regressing these endogenous regressors on all of the instrumental
variables (these are exogenous and commonly called instruments) in the
system.
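
The two stages described above can be sketched by hand with lm() on simulated data. Everything here (the data-generating process and the names x, z1, z2) is hypothetical and not Greene's example; the sketch only shows that the stage-two coefficients match the closed-form 2SLS estimator. The standard errors printed by the second lm() are not valid 2SLS standard errors, which is one reason to use a dedicated function such as systemfit():

```r
set.seed(1)
n  <- 200
z1 <- rnorm(n); z2 <- rnorm(n)        # exogenous instruments
u  <- rnorm(n)                        # structural error
x  <- z1 + 0.5 * z2 + u + rnorm(n)    # endogenous regressor: correlated with u
y  <- 1 + 2 * x + u                   # structural equation

# Stage 1: regress the endogenous regressor on all instruments
Z    <- cbind(z1, z2)
xhat <- fitted(lm(x ~ Z))

# Stage 2: replace x by its first-stage fitted values
stage2 <- lm(y ~ xhat)
coef(stage2)

# Same coefficients from (X'PX)^{-1} X'Py with P = W (W'W)^{-1} W'
X    <- cbind(1, x)
W    <- cbind(1, Z)
P    <- W %*% solve(crossprod(W), t(W))
beta <- solve(t(X) %*% P %*% X, t(X) %*% P %*% y)
all.equal(unname(coef(stage2)), as.vector(beta))   # TRUE
```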

So I have yd = y - y.1, which is partly endogenous (y) and partly
exogenous (y.1). In the second stage of the estimation it seems that
yd is replaced by (z - y.1), where z is the fitted value of y from the
first stage (the result of regressing y on all of the instruments).
So how does R know that yd should be replaced by (z - y.1) unless yd
carries some information that it was originally created as (y - y.1)?
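
A possible explanation, sketched on simulated data and assuming systemfit() uses the textbook first stage (projecting every right-hand-side variable onto the full instrument set; not verified against its source here): y.1 is itself one of the instruments, so its projection is just y.1, and projection is linear, so the first-stage fit of yd comes out as z - y.1 with no stored history. The variable g is a hypothetical stand-in for the other instruments:

```r
set.seed(42)
n   <- 100
y.1 <- rnorm(n)                   # lagged value: exogenous, used as an instrument
g   <- rnorm(n)                   # stand-in for the other instruments
y   <- 0.8 * y.1 + g + rnorm(n)   # current value: treated as endogenous
yd  <- y - y.1                    # constructed regressor

W <- cbind(g, y.1)                # full instrument set (lm adds the intercept)

z      <- fitted(lm(y  ~ W))      # first-stage fitted values of y
yd.hat <- fitted(lm(yd ~ W))      # first-stage fitted values of yd itself

# y.1 lies in the instrument space, so it is reproduced exactly ...
all.equal(unname(fitted(lm(y.1 ~ W))), y.1)   # TRUE
# ... and by linearity the fit of yd equals z - y.1
all.equal(yd.hat, z - y.1)                    # TRUE
```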

Maybe this question is best asked on the devel list?



> Regards, Adai
> Skipper Seabold wrote:
>> Hello all,
>> I hope this question is appropriate for this ML.
>> Basically, I am wondering whether, when you create a new variable, the
>> variable holds some information about how it was created.
>> Let me explain. I have the following code to replicate an example in a
>> textbook (Greene's Econometric Analysis), using the systemfit package.
>> dta <-
>> read.table('http://pages.stern.nyu.edu/~wgreene/Text/Edition6/TableF5-1.txt',
>> header = TRUE)
>> attach(dta)
>> library(systemfit)
>> demand <- realcons + realinvs + realgovt
>> c.1 <- realcons[-204]
>> y.1 <- demand[-204]
>> yd <- demand[-1] - y.1
>> eqConsump <- realcons[-1] ~ demand[-1] + c.1
>> eqInvest <- realinvs[-1] ~ tbilrate[-1] + yd
>> system <- list( Consumption = eqConsump, Investment = eqInvest)
>> instruments <- ~ realgovt[-1] + tbilrate[-1] + c.1 + y.1
>> # 2SLS
>> greene2sls <- systemfit( system, "2SLS", inst = instruments,
>> methodResidCov = "noDfCor" )
>> When I do the 2SLS fit, it seems that even though I declared y.1 as an
>> instrument, the estimator "knows" that yd was created using y.1, so
>> it (correctly) transforms yd to use the instrument in the final
>> estimation.
>> So I'm wondering if yd somehow carries knowledge of how it was created.
>> Thanks,
>> Skipper
