[R] highly biased PCA data?

Berton Gunter gunter.berton at gene.com
Thu Nov 4 19:08:38 CET 2004


Dan:


1) There is no guarantee that PCA will show separate groups, of course, as
that is not its purpose, although it is frequently a side effect.

2) If you were to use a classification method of some sort (discriminant
analysis, neural nets, SVMs, model-based classification, ...), my
understanding is that yes, severely unbalanced group membership would
indeed affect results. A guess is that Bayesian or other methods that can
explicitly model the prior membership probabilities would do better. To
make it clear why, suppose that there was a 99.9% preference for "dog" and
0.05% each for the others. Then your dataset would have almost no
information on how covariates could distinguish the classes, and the best
classifier would be to call everything a "dog" no matter what values the
covariates had.
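
A minimal sketch of the arithmetic behind point 2, written in Python purely
for illustration (the thread itself contains no code). The function name
majority_baseline and the label lists are hypothetical; the class counts
come from the 99.9% / 0.05% / 0.05% example above and from the 1000 / 100 /
10 owners in the original question. It shows how a rule that ignores the
covariates entirely still scores a high overall accuracy while never
identifying a single cat or fish owner.

```python
from collections import Counter


def majority_baseline(labels):
    """Score the rule 'always predict the most common class'.

    Returns the majority class, the overall accuracy of that rule, and the
    per-class recall (1.0 for the majority class, 0.0 for every other).
    """
    counts = Counter(labels)
    majority, n_majority = counts.most_common(1)[0]
    accuracy = n_majority / len(labels)
    recall = {cls: (1.0 if cls == majority else 0.0) for cls in counts}
    return majority, accuracy, recall


# Hypothetical sample matching the 99.9% / 0.05% / 0.05% example.
bert_labels = ["dog"] * 1998 + ["cat"] + ["fish"]

# Group sizes from the original question: 1000 dog, 100 cat, 10 fish.
dan_labels = ["dog"] * 1000 + ["cat"] * 100 + ["fish"] * 10

for name, labels in [("99.9% example", bert_labels), ("survey", dan_labels)]:
    majority, accuracy, recall = majority_baseline(labels)
    print(name, majority, round(accuracy, 4), recall)
```

Overall accuracy alone therefore says little here; per-class recall (or
explicitly modelled priors) is what reveals whether the minority groups are
being distinguished at all.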

I presume experts will have more and better to say about this.
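
On point 1, a small illustrative sketch (again in Python, with made-up data
and hypothetical helper names first_pc_2d and scores) of why PCA need not
show groups: the first principal component follows the direction of largest
variance, which may have nothing to do with group membership. Here two
groups differ only in y, but x has far more spread, so the first component
lies along x and the groups overlap completely on it. The closed-form angle
used below applies only to the two-variable case.

```python
import math


def first_pc_2d(points):
    """Return (center, unit direction) of the first principal component
    of 2-D data, using the closed form for a 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Angle of the major axis of the covariance ellipse.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def scores(points, center, direction):
    """Project centered points onto a unit direction."""
    return [
        (p[0] - center[0]) * direction[0] + (p[1] - center[1]) * direction[1]
        for p in points
    ]


# Made-up data: the groups differ only in y (0 versus 1); x spans -10..10.
group_a = [(float(x), 0.0) for x in range(-10, 11)]
group_b = [(float(x), 1.0) for x in range(-10, 11)]

center, pc1 = first_pc_2d(group_a + group_b)
scores_a = scores(group_a, center, pc1)
scores_b = scores(group_b, center, pc1)

print("PC1 direction:", pc1)
print("PC1 score range, group A:", min(scores_a), max(scores_a))
print("PC1 score range, group B:", min(scores_b), max(scores_b))
```

In this constructed case the separation lives entirely in the second
component, which is why inspecting only the leading component(s) can hide
group structure.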

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dan Bolser
> Sent: Thursday, November 04, 2004 9:41 AM
> To: R mailing list
> Subject: [R] highly biased PCA data?
> 
> 
> Hello, supposing that I have two or three clear categories for my data,
> let's say pet preference across fish, cat, dog. Let's say most people rate
> their preference as being mostly one of the categories.
> 
> I want to do PCA on the data to see three 'groups' of people, one group
> for fish, one for cat and one for dog. I would like to see the odd person
> who likes both or all three in the (appropriate) middle of the other main
> groups.
> 
> Will my data be affected by the fact that I have interviewed 1000 dog
> owners, 100 cat owners and 10 fish owners? (assuming that each scale of
> preference has an equal range).
> 
> Cheers,
> dan.
> 



