[R] model syntax processed --- probably common
R. Michael Weylandt <michael.weylandt@gmail.com>
michael.weylandt at gmail.com
Mon Aug 19 22:28:11 CEST 2013
On Aug 19, 2013, at 16:05, ivo welch <ivo.welch at gmail.com> wrote:
> thank you. but uggh...sorry for my html post. and sorry again for
> having been obscure in my attempt to be brief. here is a working
> program.
>
> fama.macbeth <- function( formula, din ) {
I think most users would expect this argument to be called 'data' rather than 'din'.
> fnames <- terms( formula )
> dnames <- names( din )
> stopifnot( all(dimnames(attr(fnames, "factors"))[[1]] %in% dnames) )
>
> monthly.regressions <- by( din, as.factor(din$month), function(dd)
> coef(lm(model.frame( formula, data=dd ))))
> as.m <- do.call("rbind", monthly.regressions)
> colMeans(as.m)
> }
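One caveat on the check above: the row names of the "factors" attribute are whole terms, so a formula like y ~ log(x) would be compared as "log(x)" and fail even though x is in the data. A rough sketch of an alternative uses all.vars(), which returns the bare variable names. The helper name check.formula.vars is made up for illustration, and it assumes the formula does not use the "." shorthand:

```r
## Hypothetical helper (not from the original post): stop with a readable
## message when the formula refers to variables absent from the data frame.
check.formula.vars <- function(formula, data) {
  missing.vars <- setdiff(all.vars(formula), names(data))
  if (length(missing.vars) > 0)
    stop("variable(s) not found in data: ",
         paste(missing.vars, collapse = ", "), call. = FALSE)
  invisible(TRUE)
}

## made-up test data, similar to the data set in the quoted post
d <- data.frame(month = rep(1:5, 10), y = rnorm(50), x = rnorm(50))

check.formula.vars(y ~ log(x), d)     # passes silently: only y and x are checked

## capture the error message for a variable that does not exist
msg <- tryCatch(check.formula.vars(y ~ x + w, d),
                error = function(e) conditionMessage(e))
print(msg)
```

Calling this at the top of fama.macbeth() instead of the stopifnot() line would give the English error message asked for in the original post.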
>
> ## a test data set
> d <- data.frame( month=rep(1:5,10), y= rnorm(50), x= rnorm(50), z=rnorm(50) )
>
> ## this works beautifully, exactly how I want it. the names are
> ## there, the formula works.
> print(fama.macbeth( y ~ x , din=d ))
>
> ## now I want something like the following statement to work, too
> for (nm in c("x")) print(fama.macbeth( y ~ nm, din=d ))
> or
> for (nm in c("x")) print(fama.macbeth( y ~ d[[nm]], din=d ))
> or whatever.
>
> the output in both cases should be the same, preferably even knowing
> that the name of the variable is really "x" and not nm. is there a
> standard common way to do this?
I don't think so -- most of the core modelling functions expect all the covariates to be passed in through a single data frame.
Perhaps you're looking for programmatic construction of the formula using paste() and as.formula():
fama.macbeth(as.formula(paste("y ~", nm)), din = d)
eval(parse(text = nm)) also works, but you don't get nice names on the resulting object, so I'd suggest as.formula().
If there's something better, I'd love to know -- I always have to use tricks here when doing similar looped regressions.
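For what it's worth, here is a minimal sketch of the loop using reformulate() from the stats package, which builds the same kind of formula without pasting strings by hand. The function body is the one from the quoted post, except that lm() is called on the formula directly; the set.seed() call and the second regressor "z" in the loop are only there to make the example reproducible and are my assumptions, not part of the original question.

```r
## fama.macbeth as in the quoted post, with lm() called directly on the formula
fama.macbeth <- function(formula, din) {
  monthly.regressions <- by(din, as.factor(din$month), function(dd)
    coef(lm(formula, data = dd)))
  as.m <- do.call("rbind", monthly.regressions)
  colMeans(as.m)   # average each coefficient over the months
}

set.seed(1)  # only for reproducibility of the made-up data
d <- data.frame(month = rep(1:5, 10), y = rnorm(50), x = rnorm(50), z = rnorm(50))

for (nm in c("x", "z")) {
  f <- reformulate(nm, response = "y")   # builds y ~ x, then y ~ z
  print(fama.macbeth(f, din = d))
}
```

Because the formula contains the real variable name, the coefficient is labelled "x" (or "z") in the output rather than "nm".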
Michael
>
> regards,
>
> /iaw
>
> ----
> Ivo Welch (ivo.welch at gmail.com)
> http://www.ivo-welch.info/
> J. Fred Weston Professor of Finance
> Anderson School at UCLA, C519
> Director, UCLA Anderson Fink Center for Finance and Investments
> Free Finance Textbook, http://book.ivo-welch.info/
> Editor, Critical Finance Review, http://www.critical-finance-review.org/
>
>
>
> On Mon, Aug 19, 2013 at 12:48 PM, David Winsemius
> <dwinsemius at comcast.net> wrote:
>>
>> On Aug 19, 2013, at 9:45 AM, ivo welch wrote:
>>
>>> dear R experts---I was programming a fama-macbeth panel regression (a
>>> fama-macbeth regression is essentially T cross-sectional regressions, with
>>> statistics then obtained from the time-series of coefficients), partly
>>> because I wanted faster speed than plm, partly because I wanted some
>>> additional features.
>>>
>>> my function starts as
>>>
>>> fama.macbeth <- function( formula, din ) {
>>> names <- terms( formula )
>>> ## omitted : I want an immediate check that the formula refers to
>>> existing variables in the data frame with English error messages
>>>
>>
>> Look at the structure of a terms result from a formula argument with str():
>>
>> fama.macbeth <- function( formula, din ) {
>> fnames <- terms( formula ) ; str(fnames)
>> }
>>
>> > fama.macbeth( x ~ y, data.frame(x=rnorm(10), y=rnorm(10) ) )
>> Classes 'terms', 'formula' length 3 x ~ y
>> ..- attr(*, "variables")= language list(x, y)
>> ..- attr(*, "factors")= int [1:2, 1] 0 1
>> .. ..- attr(*, "dimnames")=List of 2
>> .. .. ..$ : chr [1:2] "x" "y"
>> .. .. ..$ : chr "y"
>> ..- attr(*, "term.labels")= chr "y"
>> ..- attr(*, "order")= int 1
>> ..- attr(*, "intercept")= int 1
>> ..- attr(*, "response")= int 1
>> ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
>>
>> Then extract the dimnames from the "factors" attribute to compare to the names in the data-object:
>>
>> > fama.macbeth <- function( formula, din ) {
>> fnames <- terms( formula ) ; dnames <- names( din)
>> dimnames(attr(fnames, "factors"))[[1]] %in% dnames
>> }
>> #[1] TRUE TRUE
>>
>>
>> I couldn't tell if this was the main thrust of your question. It seems to meander a bit.
>>
>> --
>> David.
>>
>>> monthly.regressions <- by( din, as.factor(din$month), function(dd)
>>> coef(lm(model.frame( formula, data=dd ))))
>>> as.m <- do.call("rbind", monthly.regressions)
>>> colMeans(as.m) ## or something like this.
>>> }
>>> say my data frame mydata has columns named month, r, laggedx and ... . I
>>> can call this function
>>>
>>> fama.macbeth( r ~ laggedx, din=mydata )
>>>
>>> but it fails
>>
>> What fails?
>>
>>
>>> if I want to compute my x variables. for example,
>>>
>>> myx <- d[,"laggedx"]
>>> fama.macbeth( r ~ myx)
>>>
>>> I also wish that the computed myx still remembered that it was really
>>> laggedx. it's almost as if I should not create a vector myx but a data
>>> frame myx to avoid losing the column name.
>>
>> I wouldn't say "almost"... rather that is exactly what you should do. R regression methods almost always work better when formulas are interpreted in the environment of the data argument.
>>
>>> I wonder why such vectors don't
>>> keep a name attribute of some sort.
>>>
>>> there is probably an "R way" of doing this. is there?
>>>
>>> /iaw
>>>
>>> ----
>>> Ivo Welch (ivo.welch at gmail.com)
>>>
>>> [[alternative HTML version deleted]]
>>
>> Still posting HTML?
>>
>>>
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> And do explain what the goal is.
>>
>> --
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.