[R] Several PCA questions...
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Jun 29 12:36:01 CEST 2004
On Tue, 29 Jun 2004, Dan Bolser wrote:
> Hi, I am doing PCA on several columns of data in a data.frame.
>
> I am interested in particular rows of data which may have a particular
> combination of 'types' of column values (without any pre-conception of
> what they may be).
>
> I do the following...
>
> # My data table.
> allDat <- read.table("big_select_thresh_5", header=1)
>
> # Where some rows look like this...
> # PDB SUNID1 SUNID2 AA CH IPCA PCA IBB BB
> # 3sdh 14984 14985 6 10 24 24 93 116
> # 3hbi 14986 14987 6 10 20 22 94 117
> # 4sdh 14988 14989 6 10 20 20 104 122
>
> # NB First three columns = row ID, last 6 = variables
>
> attach(allDat)
>
> # My columns of interest (variables).
> part <- data.frame(AA,CH,IPCA,PCA,IBB,BB)
>
> pc <- princomp(part)
Do you really want an unscaled PCA on that data set? That looks unlikely
to be sensible (but then two of the columns are constant in the sample
shown, which is also worrying).
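
A minimal sketch of one way to check for constant columns and then run a scaled (correlation-based) PCA. The data frame below is a made-up stand-in for `part` (the real file is not available here), and the choice to drop the constant columns is an assumption, made because `princomp(..., cor = TRUE)` refuses constant variables.

```r
# Hypothetical stand-in for 'part': all values are invented for illustration.
set.seed(1)
part <- data.frame(
  AA   = rep(6, 20),        # constant, as in the three rows shown
  CH   = rep(10, 20),       # constant, as in the three rows shown
  IPCA = rpois(20, 22),
  PCA  = rpois(20, 22),
  IBB  = rpois(20, 100),
  BB   = rpois(20, 120)
)

# A column is constant if its sample variance is zero.
is_const <- vapply(part, function(x) var(x) == 0, logical(1))
names(part)[is_const]

# PCA on the correlation matrix, i.e. each variable scaled to unit variance.
pc_scaled <- princomp(part[, !is_const], cor = TRUE)
summary(pc_scaled)
```

Whether the constant columns should be dropped or investigated further depends on the full data set, not just the rows quoted.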
> plot(pc)
>
> The above plot shows that 95% of the variance is due to the first
> 'Component' (which I assume is AA).
No, it is the first (principal) component. You did ask for P>C<A!
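
To illustrate that a principal component is a weighted combination of all the variables rather than any single column, here is a small sketch using R's built-in `USArrests` data (an assumption: it stands in for the poster's data, which is not available here).

```r
# Built-in example data; not the poster's data.
pc <- princomp(USArrests, cor = TRUE)

# Loadings of the first component: one weight per original variable.
w <- unclass(loadings(pc))[, 1]
w

# The first-component scores are the centred and scaled data times those weights.
z <- scale(as.matrix(USArrests), center = pc$center, scale = pc$scale)
pc1_by_hand <- as.numeric(z %*% w)
all.equal(pc1_by_hand, as.numeric(pc$scores[, 1]))
```

Every variable receives a non-zero weight, so the first bar in `plot(pc)` cannot be read as "the variance of AA".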
> i.e. All the variables behave in quite much the same way.
Or you failed to scale the data so one dominates.
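
The effect of scaling can be seen by comparing the proportion of variance per component with and without `cor = TRUE`. This is only a sketch on the built-in `USArrests` data, used as a stand-in; `prop_var` is a helper name introduced here for illustration.

```r
# Built-in example data; not the poster's data.
pc_raw <- princomp(USArrests)              # covariance matrix (unscaled)
pc_cor <- princomp(USArrests, cor = TRUE)  # correlation matrix (scaled)

# Hypothetical helper: proportion of total variance carried by each component.
prop_var <- function(pc) pc$sdev^2 / sum(pc$sdev^2)

round(prop_var(pc_raw), 3)
round(prop_var(pc_cor), 3)

# The variable with the largest raw variance dominates the unscaled first component.
sapply(USArrests, var)
raw_pc1 <- unclass(loadings(pc_raw))[, 1]
names(which.max(abs(raw_pc1)))
```

In the unscaled fit the first component takes nearly all of the variance simply because one variable is measured on a much larger scale than the others.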
> I then did ...
>
>
> biplot(pc)
>
> Which showed some outliers with a numeric ID - How do I get back my old 3
> part ID used in allDat?
Set row names on your data frame. As almost everywhere in R, it is the
row names of a data frame that are used for labelling, and you did not
give any, so you got numbers.
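
A minimal sketch of setting the row names from the three ID columns, using the three rows quoted above. The `_` separator is an arbitrary choice, and the combined IDs are assumed to be unique across the full table.

```r
# The three rows quoted in the question, read inline for illustration.
allDat <- read.table(text = "
PDB SUNID1 SUNID2 AA CH IPCA PCA IBB BB
3sdh 14984 14985 6 10 24 24 93 116
3hbi 14986 14987 6 10 20 22 94 117
4sdh 14988 14989 6 10 20 20 104 122
", header = TRUE)

# Build one label per row from the three ID columns (must be unique).
rownames(allDat) <- with(allDat, paste(PDB, SUNID1, SUNID2, sep = "_"))

# Subsetting by column name keeps the row names, unlike
# data.frame(AA, CH, ...) built from attached vectors.
part <- allDat[, c("AA", "CH", "IPCA", "PCA", "IBB", "BB")]
rownames(part)
```

With the full data set, `biplot(princomp(part))` would then label the points with these IDs instead of row numbers.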
> In the above plot I saw all the variables (correctly named) pointing in
> more or less the same direction (as shown by the variance). I then did the
> following...
>
> postscript(file="test.ps",paper="a4")
>
> biplot(pc)
>
> dev.off()
>
> However, looking at test.ps shows that the arrows are missing (using
> ggv)... Hmmm, they come back when I pstoimg then xv... never mind.
So ggv is unreliable; perhaps it cannot cope with colours?
> Finally, I would like to make a contour plot of the above biplot, is this
> possible? (or even a good way to present the data?
What do you propose to represent by the contours? Biplots have a
well-defined interpretation in terms of distances and angles.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595