[R] highly biased PCA data?

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Thu Nov 4 19:26:54 CET 2004


On Thu, 4 Nov 2004, Berton Gunter wrote:

>
>Dan:
>
>
>1) There is no guarantee that PCA will show separate groups, of course, as
>that is not its purpose, although it is frequently a side effect.
>
>2) If you were to use a classification method of some sort (discriminant
>analysis, neural nets, SVMs, model-based classification, ...), my
>understanding is that yes, severely unbalanced group membership would
>indeed affect results. My guess is that Bayesian or other methods that can
>explicitly model the prior membership probabilities would do better. To
>make it clear why, suppose that there was a 99.9% preference for "dog" and
>0.05% each for the others. Then your dataset would have almost no
>information on how covariates could distinguish the classes, and the best
>classifier would be to call everything a "dog" no matter what values the
>covariates had.
>
>I presume experts will have more and better to say about this.
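
To put rough numbers on the point in 2), here is a minimal Python sketch. It is not from the thread: the class shares are the hypothetical 99.9% / 0.05% / 0.05% split above, and the "cat" and "fish" labels are borrowed from the original question. It shows that a rule which ignores the covariates and always answers "dog" already reaches 99.9% raw accuracy, while its balanced accuracy (the mean of the per-class recalls) is only one third.

```python
# Hypothetical class shares from the example above.
priors = {"dog": 0.999, "cat": 0.0005, "fish": 0.0005}

def majority_baseline(priors):
    """Return the most common class and the accuracy of always predicting it."""
    label = max(priors, key=priors.get)
    return label, priors[label]

def balanced_accuracy_of_constant(priors, label):
    """Mean per-class recall of a rule that always predicts `label`."""
    recalls = [1.0 if cls == label else 0.0 for cls in priors]
    return sum(recalls) / len(recalls)

label, accuracy = majority_baseline(priors)
balanced = balanced_accuracy_of_constant(priors, label)
print(f"always predict {label!r}: accuracy={accuracy:.4f}, balanced={balanced:.4f}")

# The survey counts from the original question give a milder imbalance.
counts = {"dog": 1000, "cat": 100, "fish": 10}
total = sum(counts.values())
survey_label, survey_accuracy = majority_baseline(
    {cls: n / total for cls, n in counts.items()}
)
print(f"survey: always predict {survey_label!r}: accuracy={survey_accuracy:.4f}")
```

Any classifier fitted to data like this has to beat that baseline on raw accuracy, which is why per-class measures tend to be more informative when the groups are this unbalanced.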

Sounds interesting. Thanks very much for the input. Just out of curiosity,
given that I can make my data more uniform (less biased), how could I best
generate a 2D plot that captures the clusters (and the inter-cluster
relationships)?

Actually, I am thinking of a 2D density plot.
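
One way to get such a picture, sketched in Python under stated assumptions: the points are taken to be 2D scores already (for example the first two principal components), the three equal-sized clusters are synthetic, and the bandwidth `h` and the grid limits are arbitrary choices, not recommendations. The function evaluates a plain Gaussian kernel density estimate on a regular grid.

```python
import math
import random

def kde2d(points, xs, ys, h):
    """Gaussian kernel density estimate on the grid ys x xs, bandwidth h."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * h * h)
    grid = []
    for y in ys:
        row = []
        for x in xs:
            total = sum(
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * h * h))
                for px, py in points
            )
            row.append(norm * total)
        grid.append(row)
    return grid

# Synthetic stand-in for 2D scores: three equal-sized clusters (hypothetical).
random.seed(0)
centres = {"dog": (-2.0, 0.0), "cat": (2.0, 0.0), "fish": (0.0, 2.0)}
points = [
    (random.gauss(cx, 0.3), random.gauss(cy, 0.3))
    for cx, cy in centres.values()
    for _ in range(50)
]

step = 0.25
xs = [-5.0 + step * i for i in range(41)]  # grid from -5.0 to 5.0
ys = [-5.0 + step * j for j in range(41)]
density = kde2d(points, xs, ys, h=0.4)

# Sanity check: the density should integrate to roughly 1 over the grid.
mass = sum(sum(row) for row in density) * step * step
print(f"total mass on grid: {mass:.3f}")
```

The `density` grid can be handed to any contour or image plotting routine. In R, `MASS::kde2d()` followed by `contour()` or `image()` covers the same ground.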


>
>-- Bert Gunter
>Genentech Non-Clinical Statistics
>South San Francisco, CA
> 
>"The business of the statistician is to catalyze the scientific learning
>process."  - George E. P. Box
> 
> 
>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch 
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dan Bolser
>> Sent: Thursday, November 04, 2004 9:41 AM
>> To: R mailing list
>> Subject: [R] highly biased PCA data?
>> 
>> 
>> Hello, supposing that I have two or three clear categories for my data,
>> let's say pet preference across fish, cat, dog. Let's say most people rate
>> their preference as being mostly one of the categories.
>> 
>> I want to do PCA on the data to see three 'groups' of people, one group
>> for fish, one for cat and one for dog. I would like to see the odd person
>> who likes both or all three in the (appropriate) middle of the other main
>> groups.
>> 
>> Will my data be affected by the fact that I have interviewed 1000 dog
>> owners, 100 cat owners and 10 fish owners? (assuming that each scale of
>> preference has an equal range). 
>> 
>> Cheers,
>> dan.
>> 



