[R] Question about PCA with prcomp

Mark Difford mark_difford at yahoo.co.uk
Mon Jul 2 23:18:07 CEST 2007


To all ...,

Bill's "lateral" wisdom is almost certainly a better solution.  So thanks
for the advice (and everything else that went before it [Bill: apropos of
termplot, what happened to tplot?]).  And I will [almost] desist from
asking the obvious: and if there were 10 000 observations?

BestR,
Mark.


Bill.Venables wrote:
> 
> ...but with 500 variables and only 20 'entities' (observations) you will
> have 481 PCs with dead zero eigenvalues.  How small is 'smaller' and how
> many is "a few"?
> 
> Everyone who has responded to this seems to accept the idea that PCA is
> the way to go here, but that is not clear to me at all.  There is a
> 2-sample structure in the 20 observations that you have.  If you simply
> ignore that in doing your PCA you are making strong assumptions about
> sampling that would seem to me unlikely to be met.  If you allow for the
> structure and project orthogonal to it then you are probably throwing
> the baby out with the bathwater - you want to choose variables which
> maximise separation between the 2 samples (and now you are up to 482
> zero principal variances, if that matters...).
> 
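The eigenvalue counts quoted above can be checked numerically. What follows
is only a minimal sketch on simulated data: the matrix X, the two group
labels, the helper count_zero() and its tolerance are all assumptions made
for illustration, not anything taken from the original poster's data.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n <- 20; p <- 500                  # 20 observations, 500 variables, as in the thread
X <- matrix(rnorm(n * p), nrow = n, ncol = p)    # simulated data (assumption)
group <- factor(rep(c("A", "B"), each = n / 2))  # hypothetical 2-sample labels

# Count eigenvalues of a covariance matrix that are numerically zero.
# count_zero() and rel_tol are illustrative, not part of any package.
count_zero <- function(S, rel_tol = 1e-8) {
  ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  sum(ev < rel_tol * max(ev))
}

# Ordinary PCA: centring costs one degree of freedom, so rank <= n - 1.
n_zero <- count_zero(cov(X))

# Remove the two group means instead of the grand mean: rank <= n - 2.
group_means <- apply(X, 2, function(v) ave(v, group))
n_zero_within <- count_zero(cov(X - group_means))

# prcomp() itself reports only min(n, p) = 20 components here,
# and the last of those has (numerically) zero variance.
pc <- prcomp(X)
n_nonzero <- sum(pc$sdev^2 > 1e-8 * max(pc$sdev^2))

c(zero = n_zero, zero_within = n_zero_within, nonzero_prcomp = n_nonzero)
```

For a full-rank simulated matrix like this one the three counts should be
481, 482 and 19, which is the arithmetic behind the figures in the quoted
message: 500 - (20 - 1) and 500 - (20 - 2).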
> I think this problem probably needs a bit of a re-think.  Some variant
> on singular LDA, for example, may be a more useful way to think about
> it.
> 
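To make "choose variables which maximise separation between the 2 samples"
concrete, here is a rough sketch that ranks variables by a pooled-variance
two-sample t statistic. This is a plain univariate screen, a much cruder
stand-in for the singular LDA variant suggested above and not that method
itself. The data, the group labels and the five shifted variables are
simulated assumptions.

```r
set.seed(2)                        # arbitrary seed, only for reproducibility
n <- 20; p <- 500
group <- factor(rep(c("A", "B"), each = n / 2))  # hypothetical 2-sample labels
X <- matrix(rnorm(n * p), nrow = n, ncol = p)    # simulated data (assumption)
shifted <- 1:5                                   # variables given a real group effect
X[group == "B", shifted] <- X[group == "B", shifted] + 5

A <- X[group == "A", , drop = FALSE]
B <- X[group == "B", , drop = FALSE]
nA <- nrow(A); nB <- nrow(B)

# Within-group sum of squares for every column at once.
ss <- function(M) colSums(sweep(M, 2, colMeans(M))^2)

# Pooled variance and two-sample t statistic per variable.
s2 <- (ss(A) + ss(B)) / (nA + nB - 2)
tstat <- (colMeans(B) - colMeans(A)) / sqrt(s2 * (1 / nA + 1 / nB))

# Indices of the ten variables with the largest absolute separation.
top <- head(order(abs(tstat), decreasing = TRUE), 10)
top
```

Unlike PCA, this uses the group labels directly, so variables are ranked by
how well they separate the two samples rather than by how much they vary
overall. With 500 variables and 20 observations some null variables will
still rank highly by chance, so the ranking is a starting point only.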
> Bill Venables.  
> 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ravi Varadhan
> Sent: Monday, 2 July 2007 1:29 PM
> To: 'Patrick Connolly'
> Cc: r-help at stat.math.ethz.ch; 'Mark Difford'
> Subject: Re: [R] Question about PCA with prcomp
> 
> The PCs that are associated with the smaller eigenvalues. 
> 
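As an illustration of how the PCs with the smaller eigenvalues can flag
unusual observations, here is a minimal sketch on simulated data. The three
correlated variables, the planted outlier in row 1 and the choice q = 2 are
all assumptions made for the example.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
n <- 200
z <- rnorm(n)                      # one common factor drives all three variables
X <- cbind(x1 = z + rnorm(n, sd = 0.1),
           x2 = z + rnorm(n, sd = 0.1),
           x3 = z + rnorm(n, sd = 0.1))

# Planted outlier: each value is unremarkable on its own scale, but the
# combination violates the strong positive correlation in the data.
X[1, ] <- c(1.5, -1.5, 0)

pc <- prcomp(X, center = TRUE, scale. = FALSE)

# Standardised scores on the last q PCs (those with the smallest eigenvalues).
q <- 2
idx <- (ncol(pc$x) - q + 1):ncol(pc$x)
std_scores <- sweep(pc$x[, idx, drop = FALSE], 2, pc$sdev[idx], "/")

# Sum of squared standardised scores: large values mean the observation
# breaks the correlation structure that the leading PC describes.
d2 <- rowSums(std_scores^2)
head(order(d2, decreasing = TRUE), 3)   # most suspicious observations first
```

The leading PC captures the direction in which the bulk of the data varies,
so an observation can score near zero there and still be far from the rest
along a direction in which the data barely vary at all. Using the last few
PCs together, not only the very last one, matters because a single outlier
can inflate the variance of its own direction and change the ordering.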
> ------------------------------------------------------------------------
> Ravi Varadhan, Ph.D.
> Assistant Professor, The Center on Aging and Health
> Division of Geriatric Medicine and Gerontology
> Johns Hopkins University
> Ph: (410) 502-2619
> Fax: (410) 614-9625
> Email: rvaradhan at jhmi.edu
> Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
> ------------------------------------------------------------------------
> 
> -----Original Message-----
> From: Patrick Connolly [mailto:p_connolly at ihug.co.nz]
> Sent: Monday, July 02, 2007 4:23 PM
> To: Ravi Varadhan
> Cc: 'Mark Difford'; r-help at stat.math.ethz.ch
> Subject: Re: [R] Question about PCA with prcomp
> 
> On Mon, 02-Jul-2007 at 03:16PM -0400, Ravi Varadhan wrote:
> 
> |> Mark,
> |>
> |> What you are referring to deals with the selection of covariates,
> |> since PC doesn't do dimensionality reduction in the sense of covariate
> |> selection.  But what Mark is asking for is to identify how much each
> |> data point contributes to individual PCs.  I don't think that Mark's
> |> query makes much sense, unless he meant to ask: which individuals have
> |> high/low scores on PC1/PC2.  Here are some comments that may be
> |> tangentially related to Mark's question:
> |>
> |> 1.  If one is worried about a few data points contributing heavily to
> |>     the estimation of PCs, then one can use robust PCA, for example,
> |>     using robust covariance matrices.  MASS has some tools for this.
> |> 2.  The "biplot" for the first 2 PCs can give some insights.
> |> 3.  PCs, especially the last few PCs, can be used to identify "outliers".
> 
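The robust PCA mentioned in the first comment above can be sketched with
cov.rob() from MASS, passing its result to princomp() through the covmat
argument. The two-variable data set and the cluster of ten contaminating
points are simulated assumptions, chosen only so that the classical and
robust answers differ visibly.

```r
library(MASS)                      # for cov.rob()

set.seed(3)                        # arbitrary seed; cov.rob() also uses random subsets
n_clean <- 90; n_out <- 10
# Clean data vary mostly along x1; the contaminating points sit far out on x2.
clean <- cbind(x1 = rnorm(n_clean, sd = 3),
               x2 = rnorm(n_clean, sd = 0.5))
outl  <- cbind(x1 = rnorm(n_out, sd = 0.5),
               x2 = rnorm(n_out, mean = 15, sd = 0.5))
X <- rbind(clean, outl)

# Classical PCA: the contaminating cluster drags PC1 towards x2.
pc_classic <- princomp(X)

# Robust PCA: eigen-decompose a minimum covariance determinant estimate.
rob <- cov.rob(X, method = "mcd")
pc_robust <- princomp(X, covmat = rob)

# Absolute loadings of PC1 on (x1, x2) under each fit.
load_classic <- abs(unclass(pc_classic$loadings)[, 1])
load_robust  <- abs(unclass(pc_robust$loadings)[, 1])
rbind(classic = load_classic, robust = load_robust)

# biplot(pc_robust)   # the biplot from the second comment, on the robust fit
```

With this contamination the classical first PC points almost entirely along
x2, the direction of the contaminating cluster, while the robust fit
recovers x1, the direction in which the bulk of the data vary.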
> What is meant by "last few PCs"?
> 
> -- 
> ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
> 
>    ___    Patrick Connolly   
>  {~._.~}          		 Great minds discuss ideas    
>  _( Y )_  	  	        Middle minds discuss events 
> (:_~*~_:) 	       		 Small minds discuss people  
>  (_)-(_)  	                           ..... Anon
> 	  
> ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



