[R] Question about PCA with prcomp
Mark Difford
mark_difford at yahoo.co.uk
Mon Jul 2 23:18:07 CEST 2007
To all ...,
Bill's "lateral" wisdom is almost certainly a better solution. So thanks
for the advice (and everything else that went before it [Bill: apropos of
termplot, what happened to tplot ?]). And I will [almost] desist from
asking the obvious: and if there were 10 000 observations ?
BestR,
Mark.
Bill.Venables wrote:
>
> ...but with 500 variables and only 20 'entities' (observations) you will
> have 481 PCs with dead zero eigenvalues. How small is 'smaller' and how
> many is "a few"?
>
> Everyone who has responded to this seems to accept the idea that PCA is
> the way to go here, but that is not clear to me at all. There is a
> 2-sample structure in the 20 observations that you have. If you simply
> ignore that in doing your PCA you are making strong assumptions about
> sampling that would seem to me unlikely to be met. If you allow for the
> structure and project orthogonal to it then you are probably throwing
> the baby out with the bathwater - you want to choose variables which
> maximise separation between the 2 samples (and now you are up to 482
> zero principal variances, if that matters...).
>
> I think this problem probably needs a bit of a re-think. Some variant
> on singular LDA, for example, may be a more useful way to think about
> it.
>
> Bill Venables.
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ravi Varadhan
> Sent: Monday, 2 July 2007 1:29 PM
> To: 'Patrick Connolly'
> Cc: r-help at stat.math.ethz.ch; 'Mark Difford'
> Subject: Re: [R] Question about PCA with prcomp
>
> The PCs that are associated with the smaller eigenvalues.
>
> ------------------------------------------------------------------------
> ----
> -------
>
> Ravi Varadhan, Ph.D.
>
> Assistant Professor, The Center on Aging and Health
>
> Division of Geriatric Medicine and Gerontology
>
> Johns Hopkins University
>
> Ph: (410) 502-2619
>
> Fax: (410) 614-9625
>
> Email: rvaradhan at jhmi.edu
>
> Webpage:
> http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
>
>
>
> ------------------------------------------------------------------------
> ----
> --------
>
> -----Original Message-----
> From: Patrick Connolly [mailto:p_connolly at ihug.co.nz]
> Sent: Monday, July 02, 2007 4:23 PM
> To: Ravi Varadhan
> Cc: 'Mark Difford'; r-help at stat.math.ethz.ch
> Subject: Re: [R] Question about PCA with prcomp
>
> On Mon, 02-Jul-2007 at 03:16PM -0400, Ravi Varadhan wrote:
>
> |> Mark,
> |>
> |> What you are referring to deals with the selection of covariates,
> |> since
> PC
> |> doesn't do dimensionality reduction in the sense of covariate
> selection.
> |> But what Mark is asking for is to identify how much each data point
> |> contributes to individual PCs. I don't think that Mark's query makes
> much
> |> sense, unless he meant to ask: which individuals have high/low scores
>
> |> on PC1/PC2. Here are some comments that may be tangentially related
> |> to
> Mark's
> |> question:
> |>
> |> 1. If one is worried about a few data points contributing heavily to
>
> |> the estimation of PCs, then one can use robust PCA, for example,
> |> using robust covariance matrices. MASS has some tools for this.
> |> 2. The "biplot" for the first 2 PCs can give some insights 3. PCs,
> |> especially, the last few PCs, can be used to identify "outliers".
>
> What is meant by "last few PCs"?
>
> --
> ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
>
> ___ Patrick Connolly
> {~._.~} Great minds discuss ideas
> _( Y )_ Middle minds discuss events
> (:_~*~_:) Small minds discuss people
> (_)-(_) ..... Anon
>
> ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
View this message in context: http://www.nabble.com/Question-about-PCA-with-prcomp-tf4012919.html#a11402204
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list