[R] Question about PCA with prcomp

Mon Jul 2 21:16:26 CEST 2007

Mark,

What you are referring to deals with the selection of covariates, but PCA
doesn't do dimensionality reduction in the sense of covariate selection.
What James is asking for is to identify how much each data point
contributes to individual PCs.  I don't think that James's query makes much
sense, unless he meant to ask: which individuals have high/low scores on
PC1/PC2.  Here are some comments that may be tangentially related to James's
question:

1.  If one is worried about a few data points contributing heavily to the
estimation of PCs, then one can use robust PCA, for example, by using a
robust covariance matrix.  MASS has some tools for this.
2.  The "biplot" for the first 2 PCs can give some insights.
3.  PCs, especially the last few PCs, can be used to identify "outliers".
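
As a rough illustration of the three points above, and of the loadings that
James's original question asks about, here is a minimal sketch in R.  The
data are simulated stand-ins (20 rows by 50 columns rather than ~500), and
the object names, the choice of a 5-column subset, and the 3-SD cutoff are
all assumptions made for the example, not anything from the thread.  The
robust step uses only 5 columns because the MCD estimator needs more rows
than columns, and because with more columns than rows the trailing PCs
carry essentially no variance.

```r
library(MASS)  # provides cov.rob()

# Simulated stand-in data: 20 entities (rows) x 50 measurements (columns).
set.seed(1)
X <- matrix(rnorm(20 * 50), nrow = 20,
            dimnames = list(paste0("entity", 1:20), paste0("m", 1:50)))

pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Loadings: how each original measurement enters PC1 and PC2.
# Each column of $rotation has unit length, so the squared loadings
# of a PC sum to 1 and can be read as shares.
load12    <- pca$rotation[, 1:2]
contrib12 <- load12^2
top_pc1   <- head(sort(contrib12[, "PC1"], decreasing = TRUE), 10)

# Scores: which individuals sit high or low on PC1/PC2.
scores12 <- pca$x[, 1:2]
by_pc1   <- scores12[order(scores12[, "PC1"]), ]

# Point 2: biplot of the first two PCs (drawn only in interactive use).
if (interactive()) biplot(pca, choices = 1:2)

# Point 1: PCA from a robust (MCD) covariance estimate,
# on a hypothetical subset of 5 measurements.
X5         <- X[, 1:5]
rob        <- cov.rob(X5, method = "mcd")
rob_eig    <- eigen(rob$cov, symmetric = TRUE)
rob_scores <- sweep(X5, 2, rob$center) %*% rob_eig$vectors

# Point 3: unusually large scores on the last PC as possible outliers.
# The 3-SD cutoff is an arbitrary choice for illustration.
last_pc <- rob_scores[, 5]
flagged <- which(abs(last_pc) > 3 * sqrt(rob_eig$values[5]))
```

Here top_pc1 lists the ten measurements with the largest squared loadings on
PC1, and by_pc1 orders the individuals by their PC1 score.  Whether the
columns should be scaled (scale. = TRUE) depends on the measurement units,
which the thread does not specify.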

Hope this is helpful,
Ravi.

-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
-----------------------------------------------------------------------------------

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Mark Difford
Sent: Monday, July 02, 2007 1:55 PM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] Question about PCA with prcomp

Hi James,

Have a look at Cadima et al.'s subselect package [Cadima worked with/was a
student of Prof Jolliffe, one of _the_ experts on PCA; Jolliffe devotes part
of a Chapter to this question in his text (Principal Component Analysis,
pub. Springer)].  Then you should look at psychometric stuff: a good place
to start would be Professor Revelle's psych package.

BestR,
Mark.

James R. Graham wrote:
> 
> Hello All,
> 
> The basic premise of what I want to do is the following:
> 
> I have 20 "entities" for which I have ~500 measurements each. So, I  
> have a matrix of 20 rows by ~500 columns.
> 
> The 20 entities fall into two classes: "good" and "bad."
> 
> I eventually would like to derive a model that would then be able to  
> classify new entities as being in "good territory" or "bad territory"  
> based upon my existing data set.
> 
> I know that not all ~500 measurements are meaningful, so I thought  
> the best place to begin would be to do a PCA in order to reduce the  
> amount of data with which I have to work.
> 
> I did this using the prcomp function and found that nearly 90% of the  
> variance in the data is explained by PC1 and 2.
> 
> So far, so good.
> 
> I would now like to find out which of the original ~500 measurements  
> contribute to PC1 and 2 and by how much.
> 
> Any tips would be greatly appreciated! And apologies in advance if  
> this turns out to be an idiotic question.
> 
> 
> james
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context:
http://www.nabble.com/Question-about-PCA-with-prcomp-tf4012919.html#a11398608
Sent from the R help mailing list archive at Nabble.com.
