[Rd] delete.response leaves response in attribute dataClasses

William Dunlap wdunlap at tibco.com
Fri Jan 6 21:23:38 CET 2012


> -----Original Message-----
> From: Paul Johnson [mailto:pauljohn32 at gmail.com]
> Sent: Friday, January 06, 2012 11:17 AM
> To: William Dunlap
> Cc: R Devel List
> Subject: Re: [Rd] delete.response leaves response in attribute dataClasses
> 
> Thanks, Bill
> 
> Counter-arguments at the end
> 
> On Thu, Jan 5, 2012 at 3:15 PM, William Dunlap <wdunlap at tibco.com> wrote:
> > My feeling that everyone would index dataClasses by name was
> > wrong.  I looked through the packages that used dataClasses
> > and saw code that would break if the first (response) entry
> > were omitted.  (I didn't check to see if passing the output
> > of delete.response to these functions would be appropriate.)
> > E.g.,
> > file: AICcmodavg/R/predictSE.mer.r
> >  ##matrix with info on factors
> >  fact.frame <- attr(attr(orig.frame, "terms"), "dataClasses")[-1]
> >
> >  ##continue if factors
> >  if(any(fact.frame == "factor")) {
> >    id.factors <- which(fact.frame == "factor")
> >    fact.name <- names(fact.frame)[id.factors] #identify the rows for factors
> >
> > Some packages create a dataClass attribute for a model.frame
> > (not its terms attribute) that does not have any names:
> > file: caper/R/macrocaic.R
> >   attr(mf, "dataClasses") <- rep("numeric", dim(termFactors)[2])
> > .checkMFClasses() does not throw an error for that, but it
> > doesn't do any real checking either.
> >
> > Most users of dataClasses do pass it to .checkMFClasses() to
> > compare it with newdata and that doesn't care if you have extra
> > entries in dataClasses.
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> 
> I can't understand what your point is.  I agree we can work around the
> problem, but why should we have to?

My point was that it would make sense for delete.response to drop
the response element from dataClasses, since that element serves no
purpose once the response is gone.  That it is not dropped was almost
certainly an oversight, as most terms objects don't have the
dataClasses attribute at all.
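
Until that happens, the response entry can be dropped by hand.  The
following is only a sketch: drop_response_class() is a hypothetical
helper, not a function in stats, and it assumes that dataClasses is
named and that the response's name matches the deparsed response
variable.  mtcars is the built-in data set.

```r
# Hypothetical helper (not part of stats): delete the response from a terms
# object and also remove its entry from the dataClasses attribute.
drop_response_class <- function(termobj) {
  dc   <- attr(termobj, "dataClasses")
  resp <- attr(termobj, "response")
  out  <- delete.response(termobj)
  if (!is.null(names(dc)) && !is.null(resp) && resp > 0) {
    # The response is element resp + 1 of the "variables" call; deparse it
    # to get the label assumed to be its name in dataClasses.
    resp_var  <- attr(termobj, "variables")[[resp + 1L]]
    resp_name <- paste(deparse(resp_var, width.cutoff = 500L), collapse = " ")
    attr(out, "dataClasses") <- dc[names(dc) != resp_name]
  }
  out
}

fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)
tt  <- drop_response_class(terms(fit))
names(attr(tt, "dataClasses"))   # predictor entries only
```

The helper matches the response by name rather than by position, so it
gives the same result whether or not delete.response itself has already
removed the entry.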

Properly written code, which subscripts dataClasses only by name
(not by position), would not be affected by the change, but improperly
written code (e.g., AICcmodavg's predictSE, which assumes the response
is in position 1) would be adversely affected in the unlikely case that
someone passed it the output of delete.response.
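
To illustrate the difference, here is a small sketch using made-up
dataClasses vectors for a model y ~ x + g (the variable names and
classes are assumptions for the example, not taken from any package):

```r
# dataClasses as it might look for y ~ x + g, with and without the response
# entry (made-up values for illustration).
dc_with    <- c(y = "numeric", x = "numeric", g = "factor")
dc_without <- dc_with[names(dc_with) != "y"]

# Positional subscripting assumes the response is the first element.
pos_with    <- dc_with[-1]
pos_without <- dc_without[-1]   # silently drops the predictor x as well

# Subscripting by name does not depend on whether the response is present.
predictors   <- c("x", "g")
name_with    <- dc_with[predictors]
name_without <- dc_without[predictors]

identical(name_with, name_without)
names(pos_without)
```

Only the positional version changes its answer when the response entry
is missing.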

I don't know how much you want to cater to "errors" by package writers.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 



> 
> If you confine yourself to the output of "delete.response" applied to
> a terms object from a regression, can you point to any package or
> usage that depends on leaving the response variable in the dataClasses
> attribute?  I can't find one.  In R base, these are all the references
> to delete.response:
> 
> stats/R/models.R:delete.response <- function (termobj)
> stats/R/lm.R:        Terms <- delete.response(tt)
> stats/R/lm.R:        Terms <- delete.response(tt)
> stats/R/ppr.R:        Terms <- delete.response(object$terms)
> stats/R/loess.R:        as.matrix(model.frame(delete.response(terms(object)), newdata,
> stats/R/dummy.coef.R:    Terms <- delete.response(Terms)
> 
> I've looked it over carefully and predict.lm (in lm.R) would not be
> affected by the change I propose. I can't find any usage in loess.R of
> the dataClasses attribute.
> 
> Furthermore, I can't see how a person would use the dataClasses
> attribute at all, after the other markers of the response are
> eliminated. How is a method to find which variable is the response,
> after response=0?
> 
> I'm not disagreeing with you that I can work around the peculiarity
> that the response is left in the dataClasses attribute of the output
> object from delete.response.  I'm just saying it is a complication
> that programmers should not have to put up with, because I think
> delete.response should delete the response from all attributes of a
> terms object.
> 
> pj
> 
> 
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas


