[R] Defining plot colors based on a variable

hadley wickham h.wickham at gmail.com
Mon Feb 2 16:10:42 CET 2009


On Mon, Feb 2, 2009 at 8:56 AM, Andrew Singleton <singleta at mail.nih.gov> wrote:
> Hi, I have been trying unsuccessfully to plot data using different colors
> based on a variable within a subset of an imported file. The file I am
> reading is about 20000 lines long and has a column (in the example called
> FILE) that contains approximately 100 unique entries. I would like to plot a
> subset of the data from the file and key the color from the FILE column.
> This is what my file looks like:
>
> CHR         SNP        BP  NMISS      BETA     SE         R2         T       P REGION     FILE       RANDOM
>   1  rs17035189  10519610    135    0.3518  1.928  0.0002501    0.1824  0.8555   TCTX  4730341  0.284627081
>   6   rs3763311  32484154    109     -2.05  1.624    0.01467    -1.262  0.2096   TCTX   670603  0.083147673
>   6   rs3892710  32790839    106    0.5695  4.743  0.0001386    0.1201  0.9047   TCTX  7150403  0.549192815
>   6   rs3864300  32379785    102     9.208  6.416    0.02018     1.435  0.1544   TCTX  7210017  0.837265988
>   6   rs6912002  32873245     13    -1.295  5.043   0.005963   -0.2569   0.802   TCTX  2710441  0.170566699
>   5   rs4024109  35955374      9     26.19  31.01    0.09245    0.8444  0.4263   TCTX  2650653  0.298573497
>   6   rs3129719  32769757     16     10.35   7.44     0.1215     1.391  0.1859   TCTX  2900504  0.378538235
>   6    rs476885  32402690    109  -0.09378  1.552  3.411e-05  -0.06041  0.9519   TCTX   670603  0.017970964
>  10  rs12570766   5602540    139    0.6182   6.66  6.289e-05   0.09283  0.9262   TCTX  4560767  0.004973939
> etc
>
>
> And this is the code that I have:
>
> assoc_data <- read.table("master.out", header =TRUE)
> par(fig=c(0, 10, 0,  10 )/10, mar=c(10,8,2,8),xpd=NA, cex.axis=2)
> attach(assoc_data)
> #these criteria change based on input from another file
> curr_assoc <- assoc_data[CHR == 1 & BP > 500000 & BP < 1000000, ]
>
> #count the number of transcripts
> transcripts <- length(unique(curr_assoc$FILE))
>
> #generate one colour for each unique "FILE" entry in my subset of data
> my_colors <- rainbow(transcripts)
>
> plot(curr_assoc$BP, log10(curr_assoc$P)*-1, pch=20,
> col=c(my_colors)[curr_assoc$FILE], ylim=c(-15, 15),xaxs="i", xlab=NA,
> cex=0.7, cex.lab=2)
> detach(assoc_data)
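A likely reason the base-graphics version gives no colours: assuming read.table() reads FILE as an integer column (as the sample IDs suggest), my_colors[curr_assoc$FILE] indexes far past the end of the colour vector and returns NA. A minimal sketch of the usual fix is to convert FILE to a factor, whose integer codes run from 1 to the number of unique values. The small curr_assoc below is a hypothetical stand-in built from sample rows in the question.

```r
# Hypothetical stand-in for curr_assoc, built from five of the sample rows
curr_assoc <- data.frame(
  BP   = c(10519610, 32484154, 32790839, 32379785, 32402690),
  P    = c(0.8555, 0.2096, 0.9047, 0.1544, 0.9519),
  FILE = c(4730341L, 670603L, 7150403L, 7210017L, 670603L)
)

# Indexing by the raw IDs runs past the end of the colour vector,
# so every element comes back NA (and NA colours are not drawn)
length(unique(curr_assoc$FILE))  # 4 transcripts
rainbow(4)[curr_assoc$FILE]      # all NA

# factor() recodes the IDs as 1..n, one code per unique FILE value,
# which are valid indices into a vector of n colours
file_f    <- factor(curr_assoc$FILE)
my_colors <- rainbow(nlevels(file_f))
point_col <- my_colors[as.integer(file_f)]

plot(curr_assoc$BP, -log10(curr_assoc$P), pch = 20, col = point_col,
     xlab = "BP", ylab = "-log10(P)")
legend("topright", legend = levels(file_f), col = my_colors, pch = 20)
```

The legend uses levels(file_f) so that each label lines up with the colour assigned to that factor code.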

You might find it easier to use ggplot2:

install.packages("ggplot2")
library(ggplot2)

qplot(BP, P, data = curr_assoc, colour = factor(FILE), log="y")
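In current ggplot2 releases qplot() is deprecated in favour of ggplot(). The following is a sketch of an equivalent call, assuming ggplot2 is installed and using a hypothetical stand-in for curr_assoc. Wrapping FILE in factor() matters if read.table() has read it as integers; otherwise ggplot2 maps it to a continuous colour gradient rather than one hue per FILE.

```r
library(ggplot2)

# Hypothetical stand-in for curr_assoc, built from five of the sample rows
curr_assoc <- data.frame(
  BP   = c(10519610, 32484154, 32790839, 32379785, 32402690),
  P    = c(0.8555, 0.2096, 0.9047, 0.1544, 0.9519),
  FILE = c(4730341L, 670603L, 7150403L, 7210017L, 670603L)
)

# factor(FILE) gives a discrete colour scale (one hue per FILE);
# scale_y_log10() plays the role of qplot()'s log = "y"
p <- ggplot(curr_assoc, aes(x = BP, y = P, colour = factor(FILE))) +
  geom_point() +
  scale_y_log10() +
  labs(colour = "FILE")
print(p)
```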

To make sure each FILE value always gets the same colour, whichever
subset you plot, set the limits of the colour scale to the full set of
FILE values (analogous to setting the limits of the x axis):

qplot(BP, P, data = curr_assoc, colour = factor(FILE), log="y") +
  scale_colour_hue(limits = c(2, 7, 12, 34, 60, 64, 65, 70, 71))
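The same idea carries over to base graphics. A minimal sketch, assuming the colours should be fixed across the whole file: build one palette from every FILE value in the full table, name each colour by its FILE ID, and have each subset look its colours up by name. The assoc_data below is a hypothetical stand-in built from sample rows in the question, and the CHR == 6 subset is only an illustrative selection.

```r
# Hypothetical stand-in for the full assoc_data table (six sample rows)
assoc_data <- data.frame(
  CHR  = c(1L, 6L, 6L, 6L, 6L, 10L),
  BP   = c(10519610, 32484154, 32790839, 32379785, 32402690, 5602540),
  P    = c(0.8555, 0.2096, 0.9047, 0.1544, 0.9519, 0.9262),
  FILE = c(4730341L, 670603L, 7150403L, 7210017L, 670603L, 4560767L)
)

# Build the palette once from *all* FILE values in the full table and
# name each colour by its FILE ID (this plays the role of the limits)
all_files <- sort(unique(assoc_data$FILE))
file_pal  <- setNames(rainbow(length(all_files)), all_files)

# Each subset looks colours up by name, so a given FILE keeps the same
# colour no matter which rows the subset criteria select
curr_assoc <- assoc_data[assoc_data$CHR == 6, ]
point_col  <- file_pal[as.character(curr_assoc$FILE)]

plot(curr_assoc$BP, -log10(curr_assoc$P), pch = 20, col = point_col,
     xlab = "BP", ylab = "-log10(P)")
```

Looking colours up by name rather than by position is what keeps the mapping stable when a subset contains only some of the FILE values.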

Hadley

-- 
http://had.co.nz/




More information about the R-help mailing list