[R] Newbie question regarding graphing of Princomp object
Tobias Verbeke
tobias.verbeke at telenet.be
Sat Jan 15 09:47:06 CET 2005
On Sat, 15 Jan 2005 05:39:00 +0100
List account <lists at norvelle.org> wrote:
> Greetings,
>
> I am working on a stylometric analysis of some Latin texts; one of the
> latest stylometric techniques involves using principal components
> analysis. Not being a statistician, I can't really fully rely on PCA
> as my primary tool, since I don't really understand the statistics
> behind the PCA technique. Nevertheless, the ability to use PCA and
> graph the results has been marvelously helpful as a preliminary
> technique to determine what kinds of stylometric variables are worth
> pursuing as indicators of authorship.
>
> For instance, I'm doing the following... I have a set of data for
> approximately 120 different Latin works, about half of which are by St.
> Thomas Aquinas, and the other half are by various other authors in the
> Thomistic tradition, some known and some anonymous. My data for
> frequencies of prepositions looks like the following:
>
> A,AD,CIRCA,CUM,DE, .... (total of 10 variables)
> 1,0.00967667222531036,0.0208124884194923,0.00142671854734112,0.00486381322957198,0.00758291643505651 ...
> 2,0.00874917700292081,0.0217315416668508,0.00133005165549453,0.00437900727772451,0.00537323193714733 ....
> 3,0.0064258378627327,0.0280901956627422,0.00178739176045295,0.00430582309573329,0.00821688482105979 ....
> 4,0.00706850368364528,0.027446604903448,0.000821141574836712,0.00461761547172807,0.00812783899774761 ....
> 5,0.010214039424891,0.015409971157808,0.000745993537614122,0.00584650749246416,0.00475787738815518 ....
> 6,0.00952534711010655,0.0180981595092025,0.00125928317726832,0.00515014530190507,0.00447206974491443 ...
> .... (and so on for the rest of the 120 works)
>
> The works are numbered such that works 100 and below are by St. Thomas,
> those from 101 to 117 are of dubious authenticity, and those from 118
> to 179 are by other authors.
>
> When I perform a biplot on the results of the princomp() function, I
> get a nice graph that plots the 120 works on the two principal
> component axes (I've figured out how to get rid of the red arrows
> already). Given that the data points tend to jumble together, I'd like
> some way to color the different categories of works in the biplot, so
> that data points for works 1-100 are red, those from 101-117 are blue,
> and those from 118 to 179 are green (for instance).
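The `col' argument of biplot() will not do this: it only takes two
colours, one for the observations and one for the variables. In this
case, I would plot the component scores directly, something like
plot(pc$scores[, 1:2], col = cols)
where pc is the object returned by princomp() and cols is a character
vector with one colour per row of your data (e.g. "red" for works 1-100,
"blue" for 101-117 and "green" for 118-179).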
For a list of built-in color names, you can type colors() at the R prompt.
For more information on biplot, type ?biplot
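As a minimal sketch of colouring the works by group: since biplot() uses
at most two colours from `col' (one per set of points), the example below
plots the princomp() scores directly. The data are randomly generated
stand-ins, the names work_id, freqs, grp and cols are made up for
illustration, and it assumes the work numbers are available as a vector
alongside the frequency matrix.

```r
# Made-up stand-in data: 120 works numbered between 1 and 179 (with gaps),
# each with 10 preposition frequencies. Replace with your own data.
set.seed(1)
work_id <- sort(sample(1:179, 120))
freqs <- matrix(runif(120 * 10, 0, 0.03), nrow = 120,
                dimnames = list(as.character(work_id),
                                paste("V", 1:10, sep = "")))

pc <- princomp(freqs)

# Derive one colour per work from its number:
# 1-100 -> red, 101-117 -> blue, 118-179 -> green.
grp <- cut(work_id, breaks = c(0, 100, 117, 179),
           labels = c("Aquinas", "dubious", "other"))
cols <- c("red", "blue", "green")[as.integer(grp)]

# Empty plot of the first two components, then the work numbers as labels.
plot(pc$scores[, 1:2], type = "n", xlab = "Comp.1", ylab = "Comp.2")
text(pc$scores[, 1:2], labels = work_id, col = cols, cex = 0.7)
legend("topright", legend = levels(grp),
       text.col = c("red", "blue", "green"), bty = "n")
```

Deriving the colours from the work numbers, rather than from fixed counts
per group, keeps the colour vector the same length as the data even when
the numbering has gaps.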
VaRiis modis bene fit.
HTH,
Tobias
> I've included a sample of the output that I'm currently getting, in
> case it's helpful to anybody. BTW, I am running RAqua (for the Mac),
> version 1.8.1.
>
> Thanks in advance for any help!
>
> -Erik Norvelle
> erik (at) norvelle (dot) org
> Facultad de Filosofía y Letras
> Universidad de Navarra
> Pamplona, Navarra, España
>
>