[R-pkg-devel] Determine subset from glm object

Sun Jul 8 20:08:28 CEST 2018

If there might be NAs in the response or predictors, so that na.exclude
or na.omit would remove some rows as well, then using the row names
might be an easier way to match up rows in the original data with rows
in gout$x.
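
A minimal sketch of that matching, assuming a made-up data frame `dat`
with character row names and one missing response value (none of these
names or values come from the thread):

```r
# Hypothetical data: character row names, and row "c" has a missing response.
dat <- data.frame(
  x = 1:6,
  y = c(0, 1, NA, 0, 1, 1),
  row.names = c("a", "b", "c", "d", "e", "f")
)

# The subset argument drops row "f"; na.omit then drops row "c".
gout <- glm(y ~ x, data = dat, subset = x < 6, na.action = na.omit, x = TRUE)

# Row names of the model matrix label the rows actually used in the fit.
used <- rownames(gout$x)

# Index the original data by those row names, kept as character strings.
dat_used <- dat[used, ]
```

Here `used` reflects both the subset and the NA removal, so there is no
need to reconstruct either step separately.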

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sun, Jul 8, 2018 at 11:04 AM, Charles Geyer <charlie using stat.umn.edu> wrote:

> I think your second option sounds better because this is all happening
> inside one function I'm writing, so users won't be able to mess with the
> glm object. Many thanks.
>
> On Sun, Jul 8, 2018, 12:10 PM Duncan Murdoch <murdoch.duncan using gmail.com>
> wrote:
>
> > On 08/07/2018 11:48 AM, Charles Geyer wrote:
> > > I need to find out from an object returned by R function glm with
> > > argument x = TRUE what the subsetting was.  It appears that if gout
> > > is that object, then
> > >
> > > as.integer(rownames(gout$x))
> > >
> > > is a subset vector equivalent to the one actually used.
> >
> > You don't want the "as.integer".  If the dataframe had rownames to start
> > with, the x component of the fit will have row labels consisting of
> > those labels, so as.integer may fail.  Even if it doesn't, the rownames
> > aren't necessarily sequential integers.   You can index the dataframe by
> > the character versions of the default numbers, so simply
> > rownames(gout$x) should always work.
> >
> > More generally, I'm not sure your question is well posed.  What do you
> > mean by "the subsetting"?  If you have something like
> >
> > df <- data.frame(letters, x = 1:26, y = rbinom(26, 1, 0.5))
> >
> > df1 <- subset(df, letters > "b" & letters < "y")
> >
> > gout <- glm(y ~ x, data = df1, subset = letters < "q", x = TRUE)
> >
> > the rownames(gout$x) are going to be numbers for rows of df, because df1
> > will get a subset of those as row labels.
> >
> >
> > > I do also have the call to glm (as a call object) so can determine the
> > > actual subset argument, but this seems to be not so useful because I
> > > don't know the length of the original variables before subsetting.
> >
> > You should be able to evaluate the subset expression in the environment
> > of the formula, i.e.
> >
> > eval(gout$call$subset, envir = environment(gout$formula))
> >
> > This may give incorrect results if the variables used in subsetting
> > aren't in the dataframe and have changed since glm() was called.
> >
> >
> > > So now my questions.  Is this idea above (using rownames) OK even
> > > though I cannot find where (if anywhere) it is documented?  Is there
> > > a better way?  One more guaranteed to be correct in the future?
> > >
> >
> > I would trust evaluating the subset more than grabbing row labels from
> > gout$x, but I don't know for sure it is likely to be more robust.
> >
> > Duncan Murdoch
> >
>
> ______________________________________________
> R-package-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>