[R] highly biased PCA data?
gunter.berton at gene.com
Thu Nov 4 19:08:38 CET 2004
1) There is no guarantee that PCA will show separate groups, of course, as
that is not its purpose, although it is frequently a side effect.
2) If you were to use a classification method of some sort (discriminant
analysis, neural nets, SVMs, model-based classification, ...), my
understanding is that yes, severely unbalanced group membership would
indeed affect results. My guess is that Bayesian or other methods
that can explicitly model the prior membership probabilities would do
better. To see why, suppose that there was a 99.9% preference for
"dog" and 0.05% for each of the others. Then your dataset would have almost
no information on how covariates could distinguish the classes, and the best
classifier would be to call everything a "dog" no matter what values the
covariates took.
I presume experts will have more and better to say about this.
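As a rough illustration of the "call everything a dog" point, here is a minimal Python sketch. It is not from the original thread: the sample size of 10,000 and the helper names `majority_baseline` and `per_class_recall` are assumptions chosen only to give round numbers. It shows that always predicting the majority class scores very high raw accuracy while recovering none of the minority classes.

```python
from collections import Counter

# Hypothetical label counts matching the 99.9% / 0.05% / 0.05% split,
# scaled to 10,000 respondents (an assumption made for round numbers).
labels = ["dog"] * 9990 + ["cat"] * 5 + ["fish"] * 5


def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    return top_label, top_count / len(labels)


def per_class_recall(labels, predicted_label):
    """Recall of each class when every case receives the same predicted label."""
    return {cls: (1.0 if cls == predicted_label else 0.0) for cls in set(labels)}


label, accuracy = majority_baseline(labels)
recall = per_class_recall(labels, label)

# Balanced accuracy is the unweighted mean of the per-class recalls.
balanced_accuracy = sum(recall.values()) / len(recall)

print("always predict:", label)
print("raw accuracy:", accuracy)
print("balanced accuracy:", balanced_accuracy)
```

Raw accuracy rewards the constant "dog" rule, whereas the per-class view shows that cat and fish are never identified. That is one reason a method that handles the class proportions explicitly may be worth trying.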
-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
"The business of the statistician is to catalyze the scientific learning
process." - George E. P. Box
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dan Bolser
> Sent: Thursday, November 04, 2004 9:41 AM
> To: R mailing list
> Subject: [R] highly biased PCA data?
> Hello, supposing that I have two or three clear categories for my data,
> let's say pet preference across fish, cat, dog. Let's say most people rate
> their preference as being mostly one of the categories.
>
> I want to do PCA on the data to see three 'groups' of people, one group
> for fish, one for cat and one for dog. I would like to see the odd person
> who likes both or all three in the (appropriate) middle of the other main
> groups.
>
> Will my data be affected by the fact that I have interviewed 1000 dog
> owners, 100 cat owners and 10 fish owners? (assuming that each scale of
> preference has an equal range).