[Rd] problem using model.frame()
Gavin Simpson
gavin.simpson at ucl.ac.uk
Tue Aug 16 19:44:23 CEST 2005
On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck wrote:
> On 8/16/05, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
> > On Tue, 2005-08-16 at 11:25 -0400, Gabor Grothendieck wrote:
> > > It can handle data frames like this:
> > >
> > > model.frame(y1)
> > > or
> > > model.frame(~., y1)
> >
> > Thanks Gabor,
> >
> > Yes, I know that works, but I want the function coca.formula to accept a
> > formula like this y2 ~ y1, with both y1 and y2 being data frames. It is
>
> The expressions I gave work generally (i.e. lm, glm, ...), not just in
> model.matrix, so would it be ok if the user just does this?
>
> yourfunction(y2 ~., y1)
Thanks again Gabor for your comments,
I'd prefer the y2 ~ y1 form, with both as data frames, as this is the
most natural way of doing things. I'd also like (y2 ~ ., y1) and
(y2 ~ spp1 + spp2 + spp3, y1) to work, silently and without any trouble.
> If it really is important to do it the way you describe, are the data
> frames necessarily numeric? If so you could preprocess your formula
> by placing as.matrix around all the variables representing data frames
> using something like this:
>
> https://www.stat.math.ethz.ch/pipermail/r-help/2004-December/061485.html
Yes, they are numeric matrices (as data frames). I've looked at this,
but I'd prefer to not have to do too much messing with the formula.
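For reference, the preprocessing suggested above could be sketched roughly
as follows. This is only an illustration and not the code from the linked
post: wrapDataFrames() is a hypothetical helper name, and it assumes the
variables named in the formula can be found from the environment passed in.

```r
# Hypothetical helper (not from the linked post): walk a formula and wrap
# every variable that is a data frame in as.matrix().
wrapDataFrames <- function(expr, env = parent.frame()) {
    if (is.name(expr)) {
        nm <- as.character(expr)
        if (nzchar(nm) && exists(nm, envir = env) &&
            is.data.frame(get(nm, envir = env)))
            return(call("as.matrix", expr))
        return(expr)
    }
    if (is.call(expr)) {
        # skip element 1, the function being called (e.g. `~` or `+`)
        for (i in seq_along(expr)[-1])
            expr[[i]] <- wrapDataFrames(expr[[i]], env)
    }
    expr
}

# Small made-up example data
y1 <- data.frame(spp1 = c(3, 0, 0), spp2 = c(1, 1, 0))
y2 <- data.frame(a = 1:3, b = 4:6)
f  <- wrapDataFrames(y2 ~ y1)   # as.matrix(y2) ~ as.matrix(y1)
mf <- model.frame(f)            # both columns of mf are matrices
```

The downside is the one mentioned above: the formula the user typed is no
longer the formula that gets evaluated, so term labels change too.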
> Of course, if they are necessarily numeric maybe they can be matrices in
> the first place?
Because read.table etc. produce data.frames and this is the natural way
to work with data in R.
Following your suggestions, I altered my code to evaluate the rhs of the
formula and check if it was of class "data.frame". If it is then I stop
processing and return it as a data.frame at this point. If not, it
eventually gets passed on to model.frame() for it to deal with it.
So far - limited testing - it seems to do what I wanted all along. I'm
sure there's a gotcha in there somewhere but at least the code runs so I
can check for problems against my examples.
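The check described above might look roughly like this. It is a minimal
sketch rather than the actual coca.formula code: rhsData() is a
hypothetical name, and the na.fail setting is simply carried over from the
parseFormula() code quoted below.

```r
# Hypothetical sketch: return the rhs directly if it is a data frame,
# otherwise hand the rhs-only formula on to model.frame().
rhsData <- function(formula, data = environment(formula)) {
    rhs <- formula[[length(formula)]]
    # Evaluating the rhs fails for things like `.`; treat any failure
    # as "not a data frame" and let model.frame() deal with it.
    val <- tryCatch(eval(rhs, data, environment(formula)),
                    error = function(e) NULL)
    if (is.data.frame(val))
        return(val)
    if (length(formula) == 3)
        formula[[2]] <- NULL   # drop the lhs, keep only the rhs
    model.frame(formula, data, na.action = na.fail)
}

# Small made-up example data
y1 <- data.frame(spp1 = c(3, 0, 0), spp2 = c(1, 1, 0), spp3 = c(0, 1, 1))
rhsData(~ y1)                # y1 itself, untouched
rhsData(~ spp1 + spp2, y1)   # model frame with columns spp1 and spp2
rhsData(~ ., y1)             # model frame with all three columns
```

Arguments such as subset are not handled here; they would have to be
applied separately in the data frame branch.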
Right, back to writing documentation...
G
> > more intuitive, to my mind at least for this particular example and
> > analysis, to specify the formula with a data frame on the rhs.
> >
> > model.frame doesn't work with the formula "~ y1" if the object y1, in
> > the environment when model.frame evaluates the formula, is a data.frame.
> > It works if y1 is a matrix, however. I'd like to work around this
> > problem, say by creating an environment in which y1 is modified to be a
> > matrix, if possible. Can this be done?
> >
> > At the moment I have something working by grabbing the bits of the
> > formula and then using get() to grab the named object. Of course, this
> > won't work if someone wants to use R's formula interface with the
> > following formula y2 ~ var1 + var2 + var3, data = y1, or to use the
> > subset argument common to many formula implementations. I'd like to have
> > the function work in as general a manner as possible, so I'm fishing
> > around for potential solutions.
> >
> > All the best,
> >
> > Gav
> >
> > >
> > > On 8/16/05, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
> > > > Hi I'm having a problem with model.frame, encapsulated in this example:
> > > >
> > > > y1 <- matrix(c(3,1,0,1,0,1,1,0,0,0,1,0,0,0,1,1,0,1,1,1),
> > > > nrow = 5, byrow = TRUE)
> > > > y1 <- as.data.frame(y1)
> > > > rownames(y1) <- paste("site", 1:5, sep = "")
> > > > colnames(y1) <- paste("spp", 1:4, sep = "")
> > > > y1
> > > >
> > > > model.frame(~ y1)
> > > > Error in model.frame(formula, rownames, variables, varnames, extras, extranames, :
> > > > invalid variable type
> > > >
> > > > temp <- as.matrix(y1)
> > > > model.frame(~ temp)
> > > >   temp.spp1 temp.spp2 temp.spp3 temp.spp4
> > > > 1         3         1         0         1
> > > > 2         0         1         1         0
> > > > 3         0         0         1         0
> > > > 4         0         0         1         1
> > > > 5         0         1         1         1
> > > >
> > > > Ideally the above wouldn't have names like temp.spp1, temp.spp2, but one
> > > > could deal with that later.
> > > >
> > > > I have tracked down the source of the error message to line 1330 in
> > > > model.c - here I'm stumped as I don't know any C, but it looks as if the
> > > > code is looping over the variables in the formula and checking if they
> > > > are the right "type". So a matrix of variables gets through, but a
> > > > data.frame doesn't.
> > > >
> > > > It would be good if model.frame could cope with data.frames in formulae,
> > > > but seeing as I am incapable of providing a patch, is there a way around
> > > > this problem?
> > > >
> > > > Below is the head of the function I am currently using, including the
> > > > function for parsing the formula - borrowed and hacked from
> > > > ordiParseFormula() in package vegan.
> > > >
> > > > I can work out the class of the rhs of the formula. Is there a way to
> > > > create a suitable environment for the data argument of parseFormula()
> > > > such that it contains the rhs dataframe coerced to a matrix, which then
> > > > should get through model.frame.default without error? How would I go
> > > > about manipulating/creating such an environment? Any other ideas?
> > > >
> > > > Thanks in advance
> > > >
> > > > Gav
> > > >
> > > > coca.formula <- function(formula, method = c("predictive", "symmetric"),
> > > >                          reg.method = c("simpls", "eigen"), weights = NULL,
> > > >                          n.axes = NULL, symmetric = FALSE, data)
> > > > {
> > > >     parseFormula <- function (formula, data)
> > > >     {
> > > >         browser()
> > > >         Terms <- terms(formula, "Condition", data = data)
> > > >         flapart <- fla <- formula <- formula(Terms, width.cutoff = 500)
> > > >         specdata <- formula[[2]]
> > > >         X <- eval(specdata, data, parent.frame())
> > > >         X <- as.matrix(X)
> > > >         formula[[2]] <- NULL
> > > >         if (formula[[2]] == "1" || formula[[2]] == "0")
> > > >             Y <- NULL
> > > >         else {
> > > >             mf <- model.frame(formula, data, na.action = na.fail)
> > > >             Y <- model.matrix(formula, mf)
> > > >             if (any(colnames(Y) == "(Intercept)")) {
> > > >                 xint <- which(colnames(Y) == "(Intercept)")
> > > >                 Y <- Y[, -xint, drop = FALSE]
> > > >             }
> > > >         }
> > > >         list(X = X, Y = Y)
> > > >     }
> > > >     if (missing(data))
> > > >         data <- parent.frame()
> > > >     #browser()
> > > >     dat <- parseFormula(formula, data)
> > > >
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [T] +44 (0)20 7679 5522
ENSIS Research Fellow [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/
London. WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%