[R] highly biased PCA data?

Gabor Grothendieck ggrothendieck at myway.com
Thu Nov 4 19:33:40 CET 2004


Dan Bolser <dmb <at> mrc-dunn.cam.ac.uk> writes:

: 
: Hello, suppose that I have two or three clear categories for my data,
: let's say pet preference across fish, cat and dog. Let's say most people rate
: their preference as being mostly for one of the categories.
: 
: I want to do PCA on the data to see three 'groups' of people, one group
: for fish, one for cat and one for dog. I would like to see the odd person
: who likes both or all three in the (appropriate) middle of the other main
: groups.
: 
: Will my data be affected by the fact that I have interviewed 1000 dog
: owners, 100 cat owners and 10 fish owners? (assuming that each scale of
: preference has an equal range). 

This is not PCA, but randomForest has facilities for handling
classification problems where the number of points per class varies
widely. See the help for randomForest and, in particular, the sampsize=
argument. Also see R News 2/3 (Liaw and Wiener's randomForest article)
and http://www.stat.berkeley.edu/users/chenchao/666.pdf
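A minimal sketch of the sampsize= idea, assuming the randomForest package is installed. The data are simulated to match the group sizes in the question (1000 dog, 100 cat, 10 fish owners); the score ranges and the column names dog, cat and fish are hypothetical. Passing strata= together with a vector sampsize= makes every tree draw the same number of owners from each class, so the 10 fish owners are not swamped by the dog owners in each bootstrap sample.

```r
library(randomForest)

set.seed(1)

# Hypothetical data shaped like the question: 1000 dog, 100 cat, 10 fish owners
n   <- c(dog = 1000, cat = 100, fish = 10)
pet <- factor(rep(names(n), n), levels = names(n))

# Hypothetical 0-10 preference scores: lowish for every pet,
# higher (with some overlap) for the owner's own pet
scores <- matrix(runif(length(pet) * 3, 0, 6), ncol = 3,
                 dimnames = list(NULL, levels(pet)))
scores[cbind(seq_along(pet), as.integer(pet))] <- runif(length(pet), 4, 10)
d <- data.frame(scores, pet = pet)

# Default fit: bootstrap samples are drawn from all rows, so dog owners dominate
rf_default <- randomForest(pet ~ dog + cat + fish, data = d)

# Stratified fit: each tree draws 10 owners per class
# (sampsize elements follow the order of levels(d$pet): dog, cat, fish)
rf_balanced <- randomForest(pet ~ dog + cat + fish, data = d,
                            strata = d$pet, sampsize = c(10, 10, 10))

# Compare the per-class out-of-bag error rates of the two fits
print(rf_default$confusion[, "class.error"])
print(rf_balanced$confusion[, "class.error"])
```

The interesting comparison is the class.error entry for the fish row; with the default fit the rare classes tend to be sacrificed to get the majority class right, which is what the stratified sampling is meant to counter.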
