[R] princomp

ripley at stats.ox.ac.uk
Tue May 14 23:02:50 CEST 2002


On Tue, 14 May 2002, Kolling Alfons, F+E wrote:

>
> Hello experts,
>
> As a newcomer to PCA, I have a question concerning the princomp algorithm.
> With a dataset "r" containing 18 "input" parameters and 1 "output" parameter,
> r[19], I got with the following fit
>
> 	ls <- lsfit(r[1:18],r[19]); lsdiag <- ls.diag(ls); lsdiag$std.dev
>
> a prediction error of:
> 	[1] 8.879561
>
> which is quite reasonable. If I take only two significant, important inputs,
>
> 	ls <- lsfit(r[1:2],r[19]); lsdiag <- ls.diag(ls); lsdiag$std.dev
>
> I get a prediction error of:
> 	[1] 20.18148
>
> which is not so bad for only two of 18 input parameters. If I do an lsfit
> with the scores of:
>
> 	p <- princomp(r[1:18],cor=TRUE)
> 	ls <- lsfit(p$scores[,1:18],r[19]); lsdiag <- ls.diag(ls); lsdiag$std.dev
>
> I get the reasonable error of:
> 	[1] 8.879561
> (see above the first fit)
> But (and here comes the question) if I take the two most important principal
> components for the lsfit
>
> 	ls <- lsfit(p$scores[,1:2],r[19]); lsdiag <- ls.diag(ls); lsdiag$std.dev
>
> I get a prediction error of:
> 	[1] 33.22741
>
> which is a good deal worse, compared to the 20.18148 from above. So what is
> wrong? I thought, that the first principle components are the "most
> important"?

Your understanding.  The first two PCs account for more of the variance in X
than any other two orthogonal directions, but they are chosen without
reference to y, so they need not explain much of the variation in y.
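
A minimal sketch of this effect on simulated data.  The data, the names
f1, f2, load1, load2 and the helper resid_sd are all made up for
illustration; this is not the poster's dataset "r".  Two shared factors
drive most of the variance in 18 inputs, while y depends only on the small
difference between the first two inputs:

```r
# Hypothetical simulated data, NOT the poster's "r".
set.seed(1)
n  <- 200
f1 <- rnorm(n)                             # shared latent factor 1
f2 <- rnorm(n)                             # shared latent factor 2
E  <- matrix(rnorm(n * 18, sd = 0.3), n, 18)   # small input-specific noise

# x1 and x2 load identically on the factors, so x1 - x2 is pure small noise.
load1 <- c(1, 1, runif(16, -1, 1))
load2 <- c(0, 0, runif(16, -1, 1))
X <- outer(f1, load1) + outer(f2, load2) + E
colnames(X) <- paste0("x", 1:18)

# The response depends only on the low-variance direction x1 - x2.
y <- 5 * (X[, 1] - X[, 2]) + rnorm(n)

p <- princomp(X, cor = TRUE)

# Residual standard deviation of a least-squares fit of y on Z.
resid_sd <- function(Z) ls.diag(lsfit(Z, y))$std.dev

sd_pc2    <- resid_sd(p$scores[, 1:2])     # first two principal components
sd_x2     <- resid_sd(X[, 1:2])            # two original inputs
sd_pc_all <- resid_sd(p$scores)            # all 18 components
sd_x_all  <- resid_sd(X)                   # all 18 original inputs

print(c(pc2 = sd_pc2, x2 = sd_x2, pc_all = sd_pc_all, x_all = sd_x_all))

# Correlation of each component's scores with y; the useful components
# need not be the first ones.
print(round(cor(p$scores, y)[, 1], 2))
```

In this construction the two-PC fit should come out clearly worse than the
fit on x1 and x2, while the fits on all 18 scores and on all 18 inputs
agree, since the scores are just a nonsingular linear transformation of the
centred and scaled inputs.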

BTW, it is `principal' not `principle'.

Principal components regression is a big topic.  Ridge regression is
almost always preferable (and there is code for it in package MASS).
See

@Article{Frank.Friedman.93,
  author =       "I. E. Frank and J. H. Friedman",
  title =        "A statistical view of some chemometrics regression
                 tools (with discussion)",
  journal =      "Technometrics",
  volume =       "35",
  pages =        "109--148",
  year =         "1993",
}
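
As a rough sketch of the ridge regression alternative mentioned above,
lm.ridge() in package MASS can be tried along these lines.  The simulated
data, the lambda grid and the names dat, best and coef_best are assumptions
for illustration only, and picking lambda by the smallest GCV value is just
one possible choice:

```r
library(MASS)

# Hypothetical simulated data with 18 strongly collinear inputs.
set.seed(1)
n <- 100
f <- rnorm(n)                                   # common factor
X <- sapply(1:18, function(j) f + rnorm(n, sd = 0.3))
colnames(X) <- paste0("x", 1:18)
y <- rowSums(X[, 1:3]) + rnorm(n)
dat <- data.frame(y = y, X)

# Fit ridge regression over a grid of penalties (the grid is arbitrary).
fit <- lm.ridge(y ~ ., data = dat, lambda = seq(0, 20, by = 0.5))

# Penalty with the smallest generalized cross-validation score.
best <- fit$lambda[which.min(fit$GCV)]
print(best)

# Coefficients on the original scale (intercept first) at that penalty.
coef_best <- coef(fit)[which.min(fit$GCV), ]
print(round(coef_best, 3))
```

Unlike regression on the first few PCs, this keeps all 18 inputs and
shrinks the coefficients, most strongly along the low-variance directions
of X, rather than discarding those directions outright.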



-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
