[R] Problem with Princurve

Gavin Simpson gavin.simpson at ucl.ac.uk
Mon May 23 20:28:16 CEST 2011


On Mon, 2011-05-23 at 09:58 -0400, Ravi Varadhan wrote:
> Why does this not find a better solution?
> 
> > x <- seq(0,2*pi, length=1000)
> > x <- cbind(x/(2*pi), sin(x))
> 
> > fit1 <- principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100,
> + start = cbind(sort(x[,1]), rep(1, nrow(x))))
> Starting curve---distance^2: 1499.5
> Iteration 1---distance^2: 3.114789
> Iteration 2---distance^2: 10.04492
> Iteration 3---distance^2: 11.89215
> Iteration 4---distance^2: 12.43235
> Iteration 5---distance^2: 12.68524
> Iteration 6---distance^2: 12.84443
> Iteration 7---distance^2: 12.93624
> Iteration 8---distance^2: 12.99118
> Iteration 9---distance^2: 13.01280
> Iteration 10---distance^2: 13.02867
> Iteration 11---distance^2: 13.03867
> >
> 
> You see that the projection distance is minimal at iteration 1, but
> the algorithm settles for an inferior projection (i..e. a greater
> projection distance).

I hadn't quite noticed that - I was too busy watching the algorithm
converge to some solution instead.

Hastie and Stuetzle make no claims about the algorithm converging or
indeed even that each step leads to a decrease in the criterion (sum of
squared distances to the curve).

It would appear that a smoother curve than the one identified in iter 1
is preferable when one considers the self-consistency check; smoothing
in each dimension using the distance along the current curve as a
predictor performs local averaging and the new principal curve
identified starts cutting the corners off the sine wave. gradually the
new curve derived by local averaging and the current curve are
sufficiently similar that the curve is declared self-consistent and the
algorithm stops.

I can achieve a better fit using my own wrappers around
principal.curve() and the smooth.spline() function used in the local
averaging, but only if I supply some good starting values. For my
wrapper and the starting configuration I suggested, I get a very close
solution to the sine wave original but it is a bit smoother - the rapid
change around the peak/troughs is smoothed out a little bit. GCV is
being used at each iteration in my wrappers to find the optimal
smoothness for smooths in both dimensions that describe the curve; the
solution fits the data almost as well as the best curve, but uses fewer
degrees of freedom in doing so.

As Henrik suggests, playing with the parameters of the smooth.spline
fitting engine helps - as that is the main thing my wrapper function
makes easier.

These data seem quite difficult for the algorithm to get good answers -
starting from the first or second principal component gives poor results
and a highly complex fitted curve that traverses back and forth across
the gaps between the peaks/troughs in the sine wave several times.

G

> If I do not provide any starting values, I get this:
> 
> > fit1 <- principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100)
> Starting curve---distance^2: 29692.03
> Iteration 1---distance^2: 20.31220
> Iteration 2---distance^2: 19.45939
> Iteration 3---distance^2: 19.26387
> Iteration 4---distance^2: 19.20626
> Iteration 5---distance^2: 19.18666
> Iteration 6---distance^2: 19.18059
> >
> 
> This is even worse.  It seems like the algorithm is quite sensitive to starting values.  Is this behavior expected or is there some flaw in the algorithm?
> 
> Ravi. 
> 
> ________________________________________
> From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Gavin Simpson [gavin.simpson at ucl.ac.uk]
> Sent: Monday, May 23, 2011 8:27 AM
> To: guy33
> Cc: r-help at r-project.org
> Subject: Re: [R] Problem with Princurve
> 
> On Thu, 2011-05-19 at 06:43 -0700, guy33 wrote:
> > Hey all,
> >
> > I can't seem to get the princurve package to produce correct results, even
> > in the simplest cases.  For example, if you just generate a 1 period
> > noiseless sine wave, and ask for the principal curve and plot, the returned
> > curve is clearly wrong (doesn't follow the sine wave).  Here's my code:
> >
> > library(princurve)
> > x <- runif(1000,0,2*pi); x <- cbind(x/(2*pi), sin(x))
> > fit1 <- principal.curve(x, plot = TRUE)
> >
> > Anyone have any suggestions?  If you run this code, do you get the correct
> > principal curve?
> 
> How about specifying some useful starting points?
> 
> fit1 <- principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100,
>                         start = cbind(sort(x[,1]), rep(1, nrow(x))))
> 
> And we need a few more iterations before convergence here. Starting from
> the first principal component for example might give useful starting
> points.
> 
> HTH
> 
> G
> 
> > Any help would be really appreciated!
> > -guy33
> >
> > --
> > View this message in context: http://r.789695.n4.nabble.com/Problem-with-Princurve-tp3535721p3535721.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list