[R] predict.loess and NA/NaN values

Philipp Pagel p.pagel at wzw.tum.de
Mon Aug 30 21:55:35 CEST 2010


On Mon, Aug 30, 2010 at 01:50:03PM +0100, Prof Brian Ripley wrote:
> The underlying problem is your expectations.
> 
> R (unlike S) was set up many years ago to use na.omit as the
> default, and when fitting both lm() and loess() silently omit cases
> with missing values.  So why should prediction from 'newdata' be
> different unless documented to be so (which it is nowadays for
> predict.lm, even though you are adding to the evidence that was a
> mistake)?

Thanks for your insights into the underlying philosophy. I agree that
na.omit is a sensible default for model fitting. But I am not so sure
that quietly omitting unpredictable values is such a good idea -
especially if predict methods for different types of model implement
inconsistent approaches. I see no disadvantage in returning NA where
no prediction/computation is possible -- the value is 'Not Available',
after all. (And the length of the result vector would match
nrow(newdata), which would be handy for most practical purposes.)

> loess() is somewhat different from lm() in that it does not in
> general allow extrapolation, and the prediction for Inf and NaN is
> simply undefined.

Of course this is correct, but I still think that predict.loess not
only acts in a way that will surprise most users but is also
inconsistent with itself (Inf vs. NA/NaN). If extrapolation is the
problem, Inf should not yield anything, but it does (and the same
applies to values outside of the original x-range):

x <- rnorm(15)
y <- rnorm(15)
model.loess <- loess(y~x)
predict(model.loess, data.frame(x=c(0.5, Inf)))
# [1] -0.02508801          NA
predict(model.loess, data.frame(x=min(x)-10))
# [1] NA


Actually, while tracking down my problem I did consider that
extrapolation could be the cause and, following the last example
in ?loess, tried setting control = loess.control(surface = "direct").
To my surprise, now even Inf fails with an error, although I am much
happier with getting an error message than with silent omission.
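
For reference, a minimal sketch of what surface = "direct" changes for
finite values outside the fitted x-range. The data and the object names
(fit.interp, fit.direct, outside) are made up for illustration, and the
non-finite inputs discussed above are deliberately left out:

```r
# Made-up data: 30 evenly spaced points on [-2, 2].
x <- seq(-2, 2, length.out = 30)
y <- sin(x)

# Default fit (surface = "interpolate") and a fit with direct evaluation.
fit.interp <- loess(y ~ x)
fit.direct <- loess(y ~ x, control = loess.control(surface = "direct"))

# One finite point just outside the fitted x-range.
outside <- data.frame(x = max(x) + 1)

p.interp <- predict(fit.interp, outside)  # no extrapolation: NA expected
p.direct <- predict(fit.direct, outside)  # extrapolated value expected
```
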

Anyway, writing a little wrapper that puts NAs back into the results
is not a big deal, and in that respect my problem is solved.
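
Such a wrapper could look roughly like the sketch below. The name
predict_keep_na is hypothetical (it is not part of R), and the sketch
assumes that newdata is a data frame containing only the predictor
columns, so that a missing value in any column makes the row
unpredictable:

```r
# Hypothetical helper, not part of R: predict only for complete rows and
# return NA for the rest, so the result has length nrow(newdata).
predict_keep_na <- function(model, newdata, ...) {
  # complete.cases() treats both NA and NaN as missing.
  ok <- stats::complete.cases(newdata)
  out <- rep(NA_real_, nrow(newdata))
  if (any(ok)) {
    out[ok] <- as.numeric(predict(model, newdata[ok, , drop = FALSE], ...))
  }
  out
}

# Usage on made-up data.
x <- seq(-2, 2, length.out = 30)
y <- sin(x)
model.loess <- loess(y ~ x)
res <- predict_keep_na(model.loess, data.frame(x = c(0.5, NA, NaN, 0)))
```

The result has one entry per row of newdata, with NA in the positions
where x was NA or NaN.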

> Nevertheless, take a look at the version in R-devel (pre-2.12.0)
> which give you more options.

Thanks for that information - I will definitely have a look at that.

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


