[R] Several PCA questions...

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Tue Jun 29 12:04:18 CEST 2004


Hi, I am doing PCA on several columns of data in a data.frame.

I am interested in particular rows of data that may share a particular
combination of 'types' of column values (without any preconception of
what those combinations may be).

I do the following...

# My data table.
allDat <- read.table("big_select_thresh_5", header=TRUE)

# Where some rows look like this...
# PDB     SUNID1  SUNID2  AA      CH      IPCA    PCA     IBB     BB
# 3sdh    14984   14985   6       10      24      24      93      116
# 3hbi    14986   14987   6       10      20      22      94      117
# 4sdh    14988   14989   6       10      20      20      104     122

# NB First three columns = row ID, last 6 = variables

attach(allDat)

# My columns of interest (variables).
part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)

pc <- princomp(part)

plot(pc)

The above plot shows that 95% of the variance is due to the first
'Component' (which I assume is AA).

i.e. all the variables behave in much the same way.
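
One way to check the assumption about the first component might be to look at the loadings. In general each principal component is a weighted combination of all the input variables rather than a single column. The sketch below uses made-up data as a stand-in for allDat (the real file is not reproduced here); the names size, prop_var and pc_cor are only illustrative.

```r
# Made-up stand-in for allDat: six correlated variables driven by one
# shared "size" factor, plus three ID columns. Purely hypothetical data.
set.seed(1)
n <- 40
size <- rnorm(n, 100, 20)
allDat <- data.frame(
  PDB = sprintf("x%03d", 1:n), SUNID1 = 2 * (1:n), SUNID2 = 2 * (1:n) + 1,
  AA = size * 0.06 + rnorm(n), CH = size * 0.1 + rnorm(n),
  IPCA = size * 0.2 + rnorm(n), PCA = size * 0.22 + rnorm(n),
  IBB = size + rnorm(n), BB = size * 1.2 + rnorm(n)
)
vars <- c("AA", "CH", "IPCA", "PCA", "IBB", "BB")

pc <- princomp(allDat[, vars])

# Proportion of the total variance carried by each component.
prop_var <- pc$sdev^2 / sum(pc$sdev^2)
print(round(prop_var, 3))

# Weight of every variable in every component; Comp.1 is a mix of all six.
print(unclass(loadings(pc)))

# princomp() works on the covariance matrix by default, so the variables
# with the largest variance dominate. cor = TRUE standardises them first.
pc_cor <- princomp(allDat[, vars], cor = TRUE)
print(unclass(loadings(pc_cor))[, 1])
```

If the Comp.1 column of the loadings has sizeable entries for several variables, then the first component is not simply AA.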

I then did ...


biplot(pc)

This showed some outliers labelled with a numeric ID. How do I get back
the original three-part ID (PDB, SUNID1, SUNID2) used in allDat?
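
A possible approach, sketched here on made-up data rather than the real file, is to give the data frame row names built from the three ID columns before calling princomp(). The scores keep those row names, and biplot() uses them as point labels. The "_" separator and the example row numbers 3 and 17 are arbitrary choices for illustration.

```r
# Hypothetical stand-in for allDat with the same column layout.
set.seed(1)
n <- 40
size <- rnorm(n, 100, 20)
allDat <- data.frame(
  PDB = sprintf("x%03d", 1:n), SUNID1 = 2 * (1:n), SUNID2 = 2 * (1:n) + 1,
  AA = size * 0.06 + rnorm(n), CH = size * 0.1 + rnorm(n),
  IPCA = size * 0.2 + rnorm(n), PCA = size * 0.22 + rnorm(n),
  IBB = size + rnorm(n), BB = size * 1.2 + rnorm(n)
)

# Select the variables by name instead of attach()ing the data frame.
part <- allDat[, c("AA", "CH", "IPCA", "PCA", "IBB", "BB")]

# Build one label per row from the three ID columns (must be unique).
rownames(part) <- paste(allDat$PDB, allDat$SUNID1, allDat$SUNID2, sep = "_")

pc <- princomp(part)

# Points are now labelled with the combined ID instead of a row number.
biplot(pc, cex = 0.6)

# Alternatively, numeric labels from an earlier plot are row indices,
# so they can be looked up directly in the original table.
ids <- allDat[c(3, 17), c("PDB", "SUNID1", "SUNID2")]
print(ids)
```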

In the above plot I saw all the variables (correctly named) pointing in
more or less the same direction (as shown by the variance). I then did the
following...

postscript(file="test.ps",paper="a4")

biplot(pc)

dev.off()

However, looking at test.ps shows that the arrows are missing (using
ggv)... Hmmm, they come back when I pstoimg then xv... never mind.


Finally, I would like to make a contour plot of the above biplot. Is this
possible (or even a good way to present the data)?
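
One way this might be done is to estimate a 2D density of the first two component scores and draw its contours. The sketch below assumes the MASS package (for kde2d()) and uses made-up data in place of allDat. It works on the raw scores, which are not rescaled the way biplot() rescales them, so the axes will differ from the biplot.

```r
library(MASS)  # kde2d(); MASS is a recommended package shipped with R

# Hypothetical stand-in for the six variables of interest.
set.seed(1)
n <- 200
size <- rnorm(n, 100, 20)
part <- data.frame(
  AA = size * 0.06 + rnorm(n), CH = size * 0.1 + rnorm(n),
  IPCA = size * 0.2 + rnorm(n), PCA = size * 0.22 + rnorm(n),
  IBB = size + rnorm(n), BB = size * 1.2 + rnorm(n)
)

pc <- princomp(part)
sc <- pc$scores[, 1:2]  # scores on the first two components

# Kernel density estimate on a 50 x 50 grid over the score plane.
dens <- kde2d(sc[, 1], sc[, 2], n = 50)

# Scatter of the scores with density contours drawn on top.
plot(sc, pch = ".", xlab = "Comp.1", ylab = "Comp.2")
contour(dens, add = TRUE)
```

Whether this reads better than the plain biplot probably depends on how many rows there are; with many overlapping points the contours may show the bulk of the data more clearly, while outliers remain visible as isolated points.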

Thanks very much for any feedback, 

Dan.



