[Rd] problem using model.frame()

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Aug 16 19:44:23 CEST 2005


On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck wrote:
> On 8/16/05, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
> > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> > > It can handle data frames like this:
> > >
> > >       model.frame(y1)
> > > or
> > >       model.frame(~., y1)
> > 
> > Thanks Gabor,
> > 
> > Yes, I know that works, but I want the function coca.formula to accept a
> > formula like this y2 ~ y1, with both y1 and y2 being data frames. It is
> 
> The expressions I gave work generally (i.e. lm, glm, ...), not just in 
> model.matrix, so would it be ok if the user just does this?
> 
> yourfunction(y2 ~., y1)

Thanks again Gabor for your comments,

I'd prefer to allow y2 ~ y1 with both as data frames - as this is the
most natural way of doing things. I'd also like (y2 ~ ., y1) and
(y2 ~ spp1 + spp2 + spp3, y1) to work - silently and without any trouble.

> If it really is important to do it the way you describe, are the data 
> frames necessarily numeric? If so you could preprocess your formula 
> by placing as.matrix around all the variables representing data frames 
> using something like this:
> 
> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html

Yes, they are numeric matrices (as data frames). I've looked at this,
but I'd prefer to not have to do too much messing with the formula.
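
For the record, the kind of preprocessing Gabor describes could look
roughly like the sketch below. This is not the code from the linked post;
wrapDataFrames() is a hypothetical name, and it assumes the data frames
can be found from the formula's environment. It only handles simple
formulae (no calls with empty arguments such as x[, 1]).

```r
# Hypothetical helper: wrap every variable in a formula that evaluates to a
# data frame in as.matrix(), leaving everything else untouched.
wrapDataFrames <- function(formula, env = environment(formula)) {
  walk <- function(expr) {
    if (is.name(expr)) {
      nm <- as.character(expr)
      # "." is a placeholder, not a variable, so never look it up
      if (nm != "." && exists(nm, envir = env) &&
          is.data.frame(get(nm, envir = env)))
        return(call("as.matrix", expr))
      return(expr)
    }
    if (is.call(expr)) {
      # keep the function name itself, recurse into the arguments only
      parts <- as.list(expr)
      parts[-1] <- lapply(parts[-1], walk)
      return(as.call(parts))
    }
    expr
  }
  # assign element by element so the formula keeps its class and environment
  for (i in seq_along(formula)[-1])
    formula[[i]] <- walk(formula[[i]])
  formula
}
```

With y1 a data frame, wrapDataFrames(~ y1) gives ~as.matrix(y1), which
model.frame() should then accept in the same way as the matrix example.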

> Of course, if they are necessarily numeric maybe they can be matrices in
> the first place?

Because read.table etc. produce data.frames and this is the natural way
to work with data in R.

Following your suggestions, I altered my code to evaluate the rhs of the
formula and check if it was of class "data.frame". If it is, then I stop
processing and return it as a data.frame at that point. If not, it
eventually gets passed on to model.frame() for it to deal with.
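
A minimal sketch of that check is below. It is not the actual
coca.formula code: rhsData() is a hypothetical stand-alone helper, and it
assumes the data-frame case only arises when the rhs is a single name.

```r
# Hypothetical helper: return the rhs of a formula directly when it is a
# data frame, otherwise hand the rhs to model.frame()/model.matrix().
rhsData <- function(formula, data = environment(formula)) {
  rhs <- formula[[length(formula)]]
  # "." cannot be evaluated on its own, so only try plain names
  if (is.name(rhs) && !identical(rhs, as.name("."))) {
    val <- tryCatch(eval(rhs, data, environment(formula)),
                    error = function(e) NULL)
    if (is.data.frame(val))
      return(val)                      # stop here, no model.frame() needed
  }
  form <- formula
  if (length(form) == 3)
    form[[2]] <- NULL                  # drop the lhs, keep a one-sided formula
  mf <- model.frame(form, data, na.action = na.fail)
  mm <- model.matrix(attr(mf, "terms"), mf)
  # remove the intercept column if present, as in parseFormula()
  mm[, colnames(mm) != "(Intercept)", drop = FALSE]
}
```

Under those assumptions rhsData(y2 ~ y1) returns y1 itself, while
rhsData(y2 ~ spp1 + spp2, data = y1) and rhsData(y2 ~ ., data = y1) go
through model.frame() as usual.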

So far - limited testing - it seems to do what I wanted all along. I'm
sure there's a gotcha in there somewhere but at least the code runs so I
can check for problems against my examples.

Right, back to writing documentation...

G

> > more intuitive, to my mind at least for this particular example and
> > analysis, to specify the formula with a data frame on the rhs.
> > 
> > model.frame doesn't work with the formula "~ y1" if the object y1, in
> > the environment when model.frame evaluates the formula, is a data.frame.
> > It works if y1 is a matrix, however. I'd like to work around this
> > problem, say by creating an environment in which y1 is modified to be a
> > matrix, if possible. Can this be done?
> > 
> > At the moment I have something working by grabbing the bits of the
> > formula and then using get() to grab the named object. Of course, this
> > won't work if someone wants to use R's formula interface with the
> > following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
> > subset argument common to many formula implementations. I'd like to have
> > the function work in as general a manner as possible, so I'm fishing
> > around for potential solutions.
> > 
> > All the best,
> > 
> > Gav
> > 
> > >
> > > On 8/16/05, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
> > > > Hi I'm having a problem with model.frame, encapsulated in this example:
> > > >
> > > > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
> > > >             nrow = 5, byrow = TRUE)
> > > > y1 <- as.data.frame(y1)
> > > > rownames(y1) <- paste("site", 1:5, sep = "")
> > > > colnames(y1) <- paste("spp", 1:4, sep = "")
> > > > y1
> > > >
> > > > model.frame(~ y1)
> > > > Error in model.frame(formula, rownames, variables, varnames, extras, extranames,  :
> > > >        invalid variable type
> > > >
> > > > temp <- as.matrix(y1)
> > > > model.frame(~ temp)
> > > >  temp.spp1 temp.spp2 temp.spp3 temp.spp4
> > > > 1         3         1         0         1
> > > > 2         0         1         1         0
> > > > 3         0         0         1         0
> > > > 4         0         0         1         1
> > > > 5         0         1         1         1
> > > >
> > > > Ideally the above wouldn't have names like temp.spp1, temp.spp2, but one
> > > > could deal with that later.
> > > >
> > > > I have tracked down the source of the error message to line 1330 in
> > > > model.c - here I'm stumped as I don't know any C, but it looks as if the
> > > > code is looping over the variables in the formula and checking if they
> > > > are the right "type". So a matrix of variables gets through, but a
> > > > data.frame doesn't.
> > > >
> > > > It would be good if model.frame could cope with data.frames in formulae,
> > > > but seeing as I am incapable of providing a patch, is there a way around
> > > > this problem?
> > > >
> > > > Below is the head of the function I am currently using, including the
> > > > function for parsing the formula - borrowed and hacked from
> > > > ordiParseFormula() in package vegan.
> > > >
> > > > I can work out the class of the rhs of the formula. Is there a way to
> > > > create a suitable environment for the data argument of parseFormula()
> > > > such that it contains the rhs dataframe coerced to a matrix, which then
> > > > should get through model.frame.default without error? How would I go
> > > > about manipulating/creating such an environment? Any other ideas?
> > > >
> > > > Thanks in advance
> > > >
> > > > Gav
> > > >
> > > > coca.formula <- function(formula, method = c("predictive", "symmetric"),
> > > >                         reg.method = c("simpls", "eigen"), weights = NULL,
> > > >                         n.axes = NULL, symmetric = FALSE, data)
> > > >  {
> > > >    parseFormula <- function (formula, data)
> > > >      {
> > > >        #browser()
> > > >        Terms <- terms(formula, "Condition", data = data)
> > > >        flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
> > > >        specdata <- formula[[2]]
> > > >        X <- eval(specdata, data, parent.frame())
> > > >        X <- as.matrix(X)
> > > >        formula[[2]] <- NULL
> > > >        if (formula[[2]] == "1" || formula[[2]] == "0")
> > > >          Y <- NULL
> > > >        else {
> > > >          mf <- model.frame(formula, data, na.action = na.fail)
> > > >          Y <- model.matrix(formula, mf)
> > > >          if (any(colnames(Y) == "(Intercept)")) {
> > > >            xint <- which(colnames(Y) == "(Intercept)")
> > > >            Y <- Y[, -xint, drop = FALSE]
> > > >          }
> > > >        }
> > > >        list(X = X, Y = Y)
> > > >      }
> > > >    if (missing(data))
> > > >      data <- parent.frame()
> > > >    #browser()
> > > >    dat <- parseFormula(formula, data)
> > > >
> > > > --
> > > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > > Gavin Simpson                     [T] +44 (0)20 7679 5522
> > > > ENSIS Research Fellow             [F] +44 (0)20 7679 7565
> > > > ENSIS Ltd. & ECRC                 [E] gavin.simpsonATNOSPAMucl.ac.uk
> > > > UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
> > > > 26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
> > > > London.  WC1H 0AP.
> > > > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > >
> > > > ______________________________________________
> > > > R-devel at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > > >
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [T] +44 (0)20 7679 5522
ENSIS Research Fellow             [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC                 [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
London.  WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


