[R] anomalies with the loess() function

Gavin Simpson gavin.simpson at ucl.ac.uk
Wed Oct 27 09:39:02 CEST 2010


On Tue, 2010-10-26 at 20:04 +0200, Federico Bonofiglio wrote:
> Hello Masters,
<snip />
> #I generate 2 fictious vectors
> 
> a<-sample(c(sample(1:1000,100),rep(NA,50)))
> 
> b<-sample(c(sample(1:1000,100),rep(NA,50)))
<snip />
> a<-na.omit(a)
> b<-na.omit(b)
> 
> #check out the evidence.....something's wrong with loess()???

No, something is wrong with your assumptions. lowess() and loess() do
not return anything like the same object. lowess() returns ordered $x
and $y components. The key here is "ordered".

loess() returns a complex list:

> str(fit)
List of 18
 $ n        : int 66
 $ fitted   : num [1:66] 390 447 617 494 283 ...
 $ residuals: Named num [1:66] 270 188 161 -288 104 ...
  ..- attr(*, "names")= chr [1:66] "1" "2" "4" "5" ...
 $ enp      : num 7.87
.... etc.

The fitted values etc are in the order of the data you supplied in a and
b. Hence when you plot these values you will get a cats cradle of lines
because you are asking R to join points in sequence that are *not*
ordered across the plot.

> par(mfrow=c(1,2))
> plot(a,b)
> lines(lowess(a,b),col="red")#if NAs are excluded lowess() runs regularly
> plot(a,b)
> lines(loess(b~a),col="red")#.....but loess() keeps messing all over...!!???

Not sure what you hoped that to do. You are assuming far too much here.
You need to extract the information you want from a loess() object. That
lowess() works here is because it returns components $x and $y and the
underlying plot infrastructure in R will look for this in object it is
supplied. loess() doesn't supply these $x and $y so what you are seeing
is just gibberish because you supplied garbage.

Two alternatives:

a<-sample(c(sample(1:1000,100),rep(NA,50)))
b<-sample(c(sample(1:1000,100),rep(NA,50)))

mod <- loess(b~a, na.action = na.exclude)
fit <- fitted(mod)
ord <- order(a)
plot(a, b)
lines(a[ord], fit[ord], col = "red")

The above respects your fitted values and the locations of missing
values. The next solution will plot the fitted smooth for a set of
regularly spaced values over the range of a

pdata <- data.frame(a = seq(min(a, na.rm = TRUE), 
                            max(a, na.rm = TRUE), length = 100))
pred <- predict(mod, pdata)
plot(a, b)
lines(pdata$a, pred, col = "red")

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list