[R] can predict ignore rows with insufficient info
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Sep 17 08:24:22 CEST 2003
On Tue, 16 Sep 2003, Peter Whiting wrote:
> On Tue, Sep 16, 2003 at 04:31:29PM -0400, Thomas W Blackwell wrote:
> > Corrected and re-named version of function:
> >
> > unsupported <- function(i, y, d) {
> >   result <- rep(FALSE, nrow(d))   # default return value when d[[i]] is not a factor
> >   if (is.factor(d[[i]]))          # flag rows whose level never occurs with a non-missing y
> >     result <- !(d[[i]] %in% unique(d[[i]][ !is.na(d[[y]]) ]))
> >   result
> > }
> >
> > tmp.1 <- lapply(seq(along=const), unsupported, "days", const)
> > tmp.2 <- matrix(unlist(tmp.1[ names(const) != "days" ]), nrow=dim(const)[1])
> > tmp.3 <- as.logical(as.vector(tmp.2 %*% rep(1, dim(tmp.2)[2])))
> >
> > x <- predict(g, const[ is.na(const$days) & !tmp.3, ])
>
> Here is an approach I came up with that appears to work:
(One I sent privately to Peter.)
> predict2 <- function(g, data, ...)
> {
>   for (nm in names(g$xlevels)) {
>     cat(paste(nm, "\n"))
>     data[[nm]] <- factor(data[[nm]], levels = g$xlevels[[nm]])
>   }
>   predict(g, data, ...)
> }
>
> It works by re-creating each factor predictor with factor()'s
> "levels=" argument set to the levels stored in g$xlevels. Any element
> whose level is not in g$xlevels ends up as an NA, which predict
> correctly handles.
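
The NA conversion described above can be seen in isolation. The following is only a minimal sketch: the level names and the variables seen_levels, new_values and refactored are made up for illustration and are not part of the original thread.

```r
# Hypothetical levels that were present when the model was fitted.
seen_levels <- c("a", "b")

# Hypothetical new data containing a level ("c") the model never saw.
new_values <- c("a", "c", "b")

# Re-create the factor against the fitted levels only;
# values outside seen_levels cannot be represented and become NA.
refactored <- factor(new_values, levels = seen_levels)

print(refactored)
print(is.na(refactored))
```

Only the second element is NA here, so a row carrying the unseen level is marked as missing rather than being assigned to some other level.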
>
> I'm not sure why predict doesn't do something like this by
> default, but I am just a newbie.
Because it is thought more common for additional levels to be a mistake
that the user would want to be alerted to. Note also that here you are
talking about the "lm" method of predict(), and by no means all methods
handle NAs in the model matrix (and for those that do, the support is
rather recent).
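
To make the contrast concrete, here is a small sketch of both behaviours with the "lm" method. It assumes a version of R whose predict() method for "lm" passes NAs through to the result (other methods may not), and the data, the level names and the variables train, fit, newdata, default_failed and recoded_result are all invented for illustration.

```r
# Hypothetical training data with one factor predictor f.
train <- data.frame(
  y = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  f = factor(c("a", "a", "b", "b", "c", "c"))
)
fit <- lm(y ~ f, data = train)

# New data containing a level ("d") that was not in the fit.
newdata <- data.frame(f = c("a", "d", "c"), stringsAsFactors = FALSE)

# Default behaviour: the additional level is reported as an error.
default_try <- try(predict(fit, newdata), silent = TRUE)
default_failed <- inherits(default_try, "try-error")

# Recode against the levels stored in the fit, as predict2 does,
# so that the unseen level becomes NA before predicting.
newdata$f <- factor(newdata$f, levels = fit$xlevels[["f"]])
recoded_result <- predict(fit, newdata)

print(default_failed)
print(is.na(recoded_result))
```

The recoding trades the alert for a silent NA in the affected row, which is why it is left to the user rather than done by default.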
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595