[R] Re: R-help digest, Vol 1 #89 - 53 msgs

Wed Feb 26 18:29:03 CET 2003

I couldn't resist tossing in my two-cents' worth on this, because R has
some language features that allow you to use efficient optimization
routines in a straightforward, elegant way.  I'm particularly enthusiastic
about this because I have suffered through other languages, which made this
approach either painful or impossible, depending on the problem.  Thanks to
the R development team for their great work!

Jose has offered a "brute force" approach, evaluating distance on a fine
grid of points.  This certainly is very robust.  However, using an
optimization routine can lead to fewer evaluations than evaluating on a
fine grid of points, and can give resolution that is finer than the grid.
You may wish to combine the two, and start the optimization from a grid of
starting points.  This grid need not be very fine, though.  And in the
one-dimensional case, the "optimize" function appears to be very robust
(perhaps it already incorporates this strategy).

For any search to work, You need to have the curve be "parameterized" in
some sense, allowing you to identify an infinite number of points lying on
the curve as you vary some underlying parameter.  For instance, perhaps
you've fitted a nonparametric regression model allowing you to make
predictions for any point x.  Or you have some means to calculate a y
value, given any x (such that x is on the curve).

The overall strategy I'm enthusiastic about is to create a loss function
that's specific to the point in question, and then use an optimization
routine to, well, optimize.  And because you'll want to do this again and
again, for convenience you can write a function that makes point-specific
loss functions (this is the part that is so straightforward to R, relative
to many other languages).

Here's an example:

example.data <- data.frame(x = seq(-1, 1, length = 200))
set.seed(123)
example.data$y <- example.data$x^2 + rnorm(200, sd = 0.2)
library(modreg)
example.spline <- smooth.spline(example.data)
plot(example.data)
lines(example.spline)
X1 <- c(0.25, 0.75)
points(X1[1], X1[2], pch = "x")

# What's the point on the curve defined by example.spline
# that's closest to X1?

# Euclidean distance
MyDistance <- function(x1, x2)
  {
    sqrt(sum((x2 - x1)^2))
  }

# For convenient re-use, a function that makes point-specific loss
functions.
# (For some model objects, the predict method requires something
# like "predict(CurveObject, newdata = data.frame(x = s))".)
MakeLoss <- function(x, CurveObject)
  {
    function(s) MyDistance(x, c(s, predict(CurveObject, s)$y))
  }

# Now make the loss function for X1...
MyLoss <- MakeLoss(X1, example.spline)

# ...and optimize.  For a higher-dimensional problem, use "optim"
# rather than "optimize".
closest.x <- optimize(MyLoss, interval = c(-1, 1))$minimum
lines(c(X1[1], closest.x),
      c(X1[2], predict(example.spline, closest.x)$y))

Good luck!

-Jim Garrett
Becton Dickinson Diagnostic Systems
Baltimore, Maryland, USA

**********************************************************************
This message is intended only for the designated recipient(s).   ... [[dropped]]