[Rd] loess prediction algorithm
apjaworski at mmm.com
apjaworski at mmm.com
Thu Jul 26 00:56:54 CEST 2007
Hello,
I need help with the details of loess prediction algorithm. I would like
to get it implemented as a part of a measurement system programmed in
LabView. My job is provide a detailed description of the algorithm. This
is a simple one-dimensional problem - smoothing an (x, y) data set.
I found quite a detailed description of the fitting procedure in the "white
book". It is also described in great detail at the NIST site in the
Engineering Statistics Handbook. It provides an example of Loess
computations. I managed to reproduce their example exactly in R. At each
data point I compute a weighted local linear fit using the number of points
based of span. Then I predict the values from these local fits. This
matches R "loess" predictions exactly.
The problem starts when I try to predict at x values not in the data set.
The "white book" does not talk about predictions at all. In the NIST
handbook in the "Final note on Loess Computations" they mention this type
of predictions but just say that the same steps are used for predictions as
for fitting.
When I try to use "the same steps" I get predictions that are quite
different that the predictions obtained by fitting R loess model to a data
set and running predict(<model object>, newdata=<grid of x values>). They
match quite well at the lowest and highest ends of the x grid but in the
middle are different and much less smooth. I can provide details but
basically what I do to create the predictions at x0 is this:
1. I append c(x0, NA) to the data frame of (x, y) data.
2. I calculate abs(xi-x0), i.e., absolute deviations of the x values in
the data set and a given x0 value.
3. I sort the data set according to these deviations. This way the first
row has the (x0, NA) value.
4. I drop the first row.
5. I divide all the deviations by the m-th one, where m is the number of
points used in local fitting - m = floor(n*span) where n is the number of
points.
6. I calculate the "tricube" weights and assign 0's to the negative ones.
This eliminates all the points except the m points of interest.
7. I fit a linear weighted regression using lm.
8. I predict y(x0) from this linear model.
This is basically the same procedure I use to predict at the x values from
the data set, except for point 4.
I got the R sources for loess but it looks to me like most of the work is
done in a bunch of Fortran modules. They are very difficult to read and
understand, especially since they handle multiple x values. A couple of
things that worry me are parameters in loess.control such as surface and
cell. They seem to have something to do with predictions but I do not
account for them in my simple procedure above.
Could anyone shed a light on this problem? Any comment will be
appreciated.
I apologize in advance if this should have been posted in r-help. I
figured that I have a better chance asking people who read the r-devel
group, since they are likely to know more details about inner workings of
R.
Thanks in advance,
Andy
__________________________________
Andy Jaworski
518-1-01
Process Laboratory
3M Corporate Research Laboratory
-----
E-mail: apjaworski at mmm.com
Tel: (651) 733-6092
Fax: (651) 736-3122
More information about the R-devel
mailing list