[R-pkg-devel] Determine subset from glm object
Heather Turner
ht @ending from he@therturner@net
Mon Jul 9 17:20:15 CEST 2018
On Sun, Jul 8, 2018, at 8:25 PM, Charles Geyer wrote:
> I spoke too soon. The problem isn't that I don't know how to get the
> subset argument. I am just calling glm (via eval) with (mostly) the
> same arguments as the call to my function, so subset is (if not
> missing) an argument to my function too. So I can just use it.
>
> The problem is that I then want to call glm again fitting a subset of
> the original subset (if there was one). And when I do that glm will
> refer to the original data wherever it is, and I don't have that.
>
> if this isn't clear, here is the code as it stands now
> https://github.com/cjgeyer/glmdr/blob/master/package/glmdr/R/glmdr.R.
>
> The issue is with the lines (very near the end)
>
> subset.lcm <- as.integer(rownames(modmat))
> subset.lcm <- subset.lcm[linearity]
> # call glm again
> call.glm$subset <- subset.lcm
> gout.lcm <- eval(call.glm, parent.frame())
>
> I can see from what Duncan said that I really don't want the
> as.integer around rownames. But it is not clear what would be better.
>
> I just had another thought that I could get the original data with
> another call to glm with subset removed from the call and method =
> "model.frame" added. And I think (maybe, have to try it) that it
> would have NA's removed or whatever na.action says to do.
> But that seems redundant.
>
>
As you are calling stats::glm, you can use `model.frame` to get the data used to fit the model after applying subset and na.action. So then you can do:
call.glm$subset <- linearity
call.glm$data <- model.frame(gout)
I think this is what you are after?
Heather
>
> On Sun, Jul 8, 2018, 1:04 PM Charles Geyer <charlie using stat.umn.edu> wrote:
> >
> > I think your second option sounds better because this is all happening inside one function I'm writing so users won't be able mess with the glm object. Many thanks.
> >
> > On Sun, Jul 8, 2018, 12:10 PM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
> >>
> >> On 08/07/2018 11:48 AM, Charles Geyer wrote:
> >> > I need to find out from an object returned by R function glm with argument
> >> > x = TRUE
> >> > what the subsetting was. It appears that if gout is that object, then
> >> >
> >> > as.integer(rownames(gout$x))
> >> >
> >> > is a subset vector equivalent to the one actually used.
> >>
> >> You don't want the "as.integer". If the dataframe had rownames to start
> >> with, the x component of the fit will have row labels consisting of
> >> those labels, so as.integer may fail. Even if it doesn't, the rownames
> >> aren't necessarily sequential integers. You can index the dataframe by
> >> the character versions of the default numbers, so simply
> >> rownames(gout$x) should always work.
> >>
> >> More generally, I'm not sure your question is well posed. What do you
> >> mean by "the subsetting"? If you have something like
> >>
> >> df <- data.frame(letters, x = 1:26, y = rbinom(26, 1, 0.5))
> >>
> >> df1 <- subset(df, letters > "b" & letters < "y")
> >>
> >> gout <- glm(y ~ x, data = df1, subset = letters < "q", x = TRUE)
> >>
> >> the rownames(gout$x) are going to be numbers for rows of df, because df1
> >> will get a subset of those as row labels.
> >>
> >>
> >> > I do also have the call to glm (as a call object) so can determine the
> >> > actual subset argument, but this seems to be not so useful because I don't
> >> > know the length of the original variables before subsetting.
> >>
> >> You should be able to evaluate the subset expression in the environment
> >> of the formula, i.e.
> >>
> >> eval(gout$call$subset, envir = environment(gout$formula))
> >>
> >> This may give incorrect results if the variables used in subsetting
> >> aren't in the dataframe and have changed since glm() was called.
> >>
> >>
> >> > So now my questions. Is this idea above (using rownames) OK even though I
> >> > cannot find where (if anywhere) it is documented? Is there a better way?
> >> > One more guaranteed to be correct in the future?
> >> >
> >>
> >> I would trust evaluating the subset more than grabbing row labels from
> >> gout$x, but I don't know for sure it is likely to be more robust.
> >>
> >> Duncan Murdoch
>
> ______________________________________________
> R-package-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
More information about the R-package-devel
mailing list