[R] model syntax processed --- probably common

Mon Aug 19 22:28:11 CEST 2013

On Aug 19, 2013, at 16:05, ivo welch <ivo.welch at gmail.com> wrote:

> thank you.  but uggh...sorry for my html post.  and sorry again for
> having been obscure in my attempt to be brief.  here is a working
> program.
> 
> fama.macbeth <- function( formula, din ) {

I think most users would expect 'din' to be 'data' here

>  fnames <- terms( formula )
>  dnames <- names( din )
>  stopifnot( all(dimnames(attr(fnames, "factors"))[[1]] %in%  dnames) )
> 
>  monthly.regressions <- by( din, as.factor(din$month), function(dd)
> coef(lm(model.frame( formula, data=dd ))))
>  as.m <- do.call("rbind", monthly.regressions)
>  colMeans(as.m)
> }
> 
> ## a test data set
> d <- data.frame( month=rep(1:5,10), y= rnorm(50), x= rnorm(50), z=rnorm(50) )
> 
> ## this works beautifully, exactly how I want it.  the names are
> ## there, the formula works.
> print(fama.macbeth( y ~ x , din=d ))
> 
> ## now I want something like the following statement to work, too
> for (nm in c("x")) print(fama.macbeth( y ~ nm, din=d ))
>   or
> for (nm in c("x")) print(fama.macbeth( y ~ d[[nm]], din=d ))
>  or whatever.
> 
> the output in both cases should be the same, preferably even knowing
> that the name of the variable is really "x" and not nm.  is there a
> standard common way to do this?

I don't think so -- most of the core modelling functions expect all the covariates to be passed in through a single data frame. 
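To illustrate the single-data-frame point, here is a minimal sketch on the test data from above. The names sub and fit are mine, and the set.seed() call is only there to make the run repeatable. It shows that pulling a column out with [[ drops the name, while subsetting the data frame keeps it, so the coefficient is still labelled with the real column name.

```r
set.seed(1)
d <- data.frame(month = rep(1:5, 10), y = rnorm(50), x = rnorm(50), z = rnorm(50))
nm <- "x"

# d[[nm]] is a bare vector with no name attached; d[nm] is a one-column
# data frame whose column is still called "x"
is.null(names(d[[nm]]))
names(d[nm])

# keep everything in one data frame and let "." stand for the remaining columns
sub <- d[, c("month", "y", nm)]
fit <- lm(y ~ . - month, data = sub)
names(coef(fit))
```

Whether the "." shorthand is convenient depends on how many helper columns (like month) have to be subtracted again.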

Perhaps you're looking for programmatic construction of the formula using paste and as.formula:

fama.macbeth(as.formula(paste("y ~", nm)), din = d)

eval(parse(text = nm)) also works, but you don't get nice names on the resulting object, so I'd suggest as.formula().
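A small sketch of the naming difference, using plain lm() on the test data rather than fama.macbeth (the object names fit.formula and fit.eval are made up for the illustration). Both fits estimate the same coefficients; only the label of the slope differs.

```r
set.seed(1)
d <- data.frame(month = rep(1:5, 10), y = rnorm(50), x = rnorm(50), z = rnorm(50))
nm <- "x"

# formula built from the string: the term is labelled with the real column name
fit.formula <- lm(as.formula(paste("y ~", nm)), data = d)
names(coef(fit.formula))

# expression evaluated inside the formula: the term is labelled with the
# text of the expression, not with the column it resolves to
fit.eval <- lm(y ~ eval(parse(text = nm)), data = d)
names(coef(fit.eval))
```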

If there's something better, I'd love to know -- I always have to use tricks here when doing similar looped regressions. 
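One candidate that may be worth a look is reformulate() from the stats package, which builds the formula from character strings without the paste step. The sketch below is only that, a sketch: it rewrites fama.macbeth with an all.vars() check (my substitution for the dimnames test above, not the original code), passes the formula straight to lm(), and collects the output in a list I have called results.

```r
fama.macbeth <- function(formula, din) {
  # fail early, in plain English, if the formula names a variable
  # that is not a column of the data frame
  missing.vars <- setdiff(all.vars(formula), names(din))
  if (length(missing.vars) > 0)
    stop("not in data frame: ", paste(missing.vars, collapse = ", "))

  # one cross-sectional regression per month, then average the coefficients
  monthly <- by(din, as.factor(din$month),
                function(dd) coef(lm(formula, data = dd)))
  colMeans(do.call("rbind", monthly))
}

set.seed(1)
d <- data.frame(month = rep(1:5, 10), y = rnorm(50), x = rnorm(50), z = rnorm(50))

results <- list()
for (nm in c("x", "z")) {
  # builds the same formula as as.formula(paste("y ~", nm))
  f <- reformulate(nm, response = "y")
  results[[nm]] <- fama.macbeth(f, din = d)
  print(results[[nm]])
}
```

The coefficient vectors come back labelled with the actual column names, which is what the loop in the original question was after.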

Michael

> 
> regards,
> 
> /iaw
> 
> ----
> Ivo Welch (ivo.welch at gmail.com)
> http://www.ivo-welch.info/
> J. Fred Weston Professor of Finance
> Anderson School at UCLA, C519
> Director, UCLA Anderson Fink Center for Finance and Investments
> Free Finance Textbook, http://book.ivo-welch.info/
> Editor, Critical Finance Review, http://www.critical-finance-review.org/
> 
> 
> 
> On Mon, Aug 19, 2013 at 12:48 PM, David Winsemius
> <dwinsemius at comcast.net> wrote:
>> 
>> On Aug 19, 2013, at 9:45 AM, ivo welch wrote:
>> 
>>> dear R experts---I was programming a fama-macbeth panel regression (a
>>> fama-macbeth regression is essentially T cross-sectional regressions, with
>>> statistics then obtained from the time-series of coefficients), partly
>>> because I wanted faster speed than plm, partly because I wanted some
>>> additional features.
>>> 
>>> my function starts as
>>> 
>>> fama.macbeth <- function( formula, din ) {
>>>  names <- terms( formula )
>>> ## omitted : I want an immediate check that the formula refers to
>>> ## existing variables in the data frame with English error messages
>>> 
>> 
>> Look at the structure of a terms result from a formula argument with str():
>> 
>> fama.macbeth <- function( formula, din ) {
>>   fnames <- terms( formula ) ; str(fnames)
>> }
>> 
>>> fama.macbeth( x ~ y, data.frame(x=rnorm(10), y=rnorm(10) ) )
>> Classes 'terms', 'formula' length 3 x ~ y
>>  ..- attr(*, "variables")= language list(x, y)
>>  ..- attr(*, "factors")= int [1:2, 1] 0 1
>>  .. ..- attr(*, "dimnames")=List of 2
>>  .. .. ..$ : chr [1:2] "x" "y"
>>  .. .. ..$ : chr "y"
>>  ..- attr(*, "term.labels")= chr "y"
>>  ..- attr(*, "order")= int 1
>>  ..- attr(*, "intercept")= int 1
>>  ..- attr(*, "response")= int 1
>>  ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
>> 
>> Then extract the dimnames from the "factors" attribute to compare to the names in the data object:
>> 
>>> fama.macbeth <- function( formula, din ) {
>>  fnames <- terms( formula ) ;  dnames <- names( din)
>>  dimnames(attr(fnames, "factors"))[[1]] %in%  dnames
>> }
>> #[1] TRUE TRUE
>> 
>> 
>> I couldn't tell if this was the main thrust of your question. It seems to meander a bit.
>> 
>> --
>> David.
>> 
>>> monthly.regressions <- by( din, as.factor(din$month), function(dd)
>>> coef(lm(model.frame( formula, data=dd ))))
>>>  as.m <- do.call("rbind", monthly.regressions)
>>>  colMeans(as.m)  ## or something like this.
>>> }
>>> say my data frame mydata has columns named month, r, laggedx and ... .  I
>>> can call this function
>>> 
>>>  fama.macbeth( r ~ laggedx, din=mydata )
>>> 
>>> but it fails
>> 
>> What fails?
>> 
>> 
>>> if I want to compute my x variables.  for example,
>>> 
>>>  myx <- mydata[,"laggedx"]
>>>  fama.macbeth( r ~ myx)
>>> 
>>> I also wish that the computed myx still remembered that it was really
>>> laggedx.  it's almost as if I should not create a vector myx but a data
>>> frame myx to avoid losing the column name.
>> 
>> I wouldn't say "almost"... rather that is exactly what you should do. R regression methods almost always work better when formulas are interpreted in the environment of the data argument.
>> 
>>> I wonder why such vectors don't
>>> keep a name attribute of some sort.
>>> 
>>> there is probably an "R way" of doing this.  is there?
>>> 
>>> /iaw
>>> 
>>> ----
>>> Ivo Welch (ivo.welch at gmail.com)
>>> 
>>>      [[alternative HTML version deleted]]
>> 
>> Still posting HTML?
>> 
>>> 
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> And do explain what the goal is.
>> 
>> --
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help