[R] Several PCA questions...

Jonathan Baron baron at psych.upenn.edu
Tue Jun 29 12:33:29 CEST 2004


On 06/29/04 11:04, Dan Bolser wrote:
>
>Hi, I am doing PCA on several columns of data in a data.frame.
>
>I am interested in particular rows of data which may have a particular
>combination of 'types' of column values (without any pre-conception of
>what they may be).
>
>I do the following...
>
># My data table.
>allDat <- read.table("big_select_thresh_5", header=TRUE)
>
># Where some rows look like this...
># PDB     SUNID1  SUNID2  AA      CH      IPCA    PCA     IBB     BB
># 3sdh    14984   14985   6       10      24      24      93      116
># 3hbi    14986   14987   6       10      20      22      94      117
># 4sdh    14988   14989   6       10      20      20      104     122
>
># NB First three columns = row ID, last 6 = variables
>
>attach(allDat)
>
># My columns of interest (variables).
>part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)
>
>pc <- princomp(part)
>
>plot(pc)
>
>The above plot shows that 95% of the variance is due to the first
>'Component' (which I assume is AA).

No.  It is the first principal component, which is some linear
combination of all the variables.  Try loadings(pc).  It sounds
like you need to read up on principal component analysis.
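
As a rough sketch of what loadings(pc) reports, here is the same kind
of analysis on the built-in USArrests data. That data set is only a
stand-in for your `part` (an assumption on my side), so the numbers
will not match yours:

```r
# Illustrative data only: USArrests ships with R and stands in for `part`.
pc <- princomp(USArrests)

# Each column holds the weights of one component on the original variables.
L <- unclass(loadings(pc))
print(round(L, 3))

# Proportion of the total variance carried by each component.
prop_var <- pc$sdev^2 / sum(pc$sdev^2)
print(round(prop_var, 3))

# The first component's scores are the centred data multiplied by the
# first column of loadings, i.e. a linear combination of all variables.
centred <- scale(USArrests, center = pc$center, scale = FALSE)
score1 <- as.vector(centred %*% L[, 1])
```

If one variable has a much larger variance than the others, it will
tend to dominate the first component; princomp(part, cor=TRUE) works
on the correlation matrix instead and may be worth comparing.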

>i.e. All the variables behave in much the same way.
>
>I then did ...
>
>
>biplot(pc)
>
>Which showed some outliers with a numeric ID - How do I get back my old 3
>part ID used in allDat?

The numeric ID is taken from the row names of the data, which pc
keeps for its scores.  So, if the IDs in question are 3 and 5, then
allDat[c(3,5),] should work.
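
A minimal sketch of both ways to get the three-part ID back.  The
data frame below is made up (random values and invented PDB codes in
the same column layout as your example), so treat it as an assumption
rather than your real data:

```r
# Hypothetical stand-in for allDat: random values, same column layout.
set.seed(1)
n <- 20
allDat <- data.frame(
  PDB    = sprintf("p%03d", seq_len(n)),   # made-up identifiers
  SUNID1 = 14984 + 2 * seq_len(n),
  SUNID2 = 14985 + 2 * seq_len(n),
  AA   = rpois(n, 6),  CH  = rpois(n, 10),
  IPCA = rpois(n, 20), PCA = rpois(n, 22),
  IBB  = rpois(n, 95), BB  = rpois(n, 118)
)
vars <- c("AA", "CH", "IPCA", "PCA", "IBB", "BB")

# Option 1: map the numeric labels seen in the biplot back to the IDs.
ids <- allDat[c(3, 5), c("PDB", "SUNID1", "SUNID2")]
print(ids)

# Option 2: use the three-part ID as row names before fitting, so that
# biplot() labels the points with it directly.
part <- allDat[, vars]
rownames(part) <- paste(allDat$PDB, allDat$SUNID1, allDat$SUNID2, sep = "_")
pc2 <- princomp(part)
# biplot(pc2)   # uncomment to draw; labels are now the three-part IDs
```

Option 2 also avoids attach(), since the variables are selected by
name from the data frame.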

>In the above plot I saw all the variables (correctly named) pointing in
>more or less the same direction (as shown by the variance). I then did the
>following...
>
>postscript(file="test.ps",paper="a4")
>
>biplot(pc)
>
>dev.off()
>
>However, looking at test.ps shows that the arrows are missing (using
>ggv)... Hmmm, they come back when I pstoimg then xv... never mind.

I get red arrows for the components in both the original graph
and the ps output (R 1.9.1, Fedora Core 2).  This may be a
platform-specific problem or one specific to ggv.  I have neither
ggv nor pstoimg.  (But xv and gv both work.)

>Finally, I would like to make a contour plot of the above biplot. Is this
>possible (or even a good way to present the data)?

No idea how to do this or why you would want it.

Jon
-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page:            http://www.sas.upenn.edu/~baron
R search page:        http://finzi.psych.upenn.edu/



