[BioC] How to plot gene on their chromosome?
Martin Morgan
mtmorgan at fhcrc.org
Wed May 27 18:50:51 CEST 2009
Simon Noël <simon.noel.2 at ulaval.ca> writes:
> Whell, now I got the result I wanted. In R, I use
>
>>library(org.Hs.eg.db)
>
>>mysymbols <- scan("./Gene_list.csv", what = 'character', skip =1)
>>myEgIDs <- unlist(mget(mysymbols, org.Hs.egSYMBOL2EG))
>>write.table(myEgIDs,file="myEgIDs.xls")
>
> After, I use my .xls with the program stripe.
Hi Simon --
You say the summer has just begun, so... Here's something that might
be fun to work with...
I use the lattice package to create the plot. I load the package,
define two 'helper' functions, and then a wrapper to do what I want...
library(lattice)
prepanel.idio <-
function(chrs, ..., xlim)
{
if (missing(xlim))
list(xlim=c(1, max(chrs[["Length"]])))
else
list(xlim=xlim)
}
panel.idio <-
function(x, y, chrs, ..., gtick=.2, colp="blue", colm="red")
{
## helpers: y coordinates at integer values
yy <- unique(y)
o <- order(yy)
yi <- match(y, yy[o])
## draw 'chromosomes'
with(chrs, {
idx <- match(yy, Chromosome)
panel.segments(1, seq_along(yy), Length[idx][o], seq_along(yy))
})
## plot genes
panel.segments(abs(x), yi, abs(x),
ifelse(x<0, yi-gtick, yi+gtick),
..., col=ifelse(x<0, colm, colp))
}
idioplot <-
function(chrs, ..., prepanel=prepanel.idio, panel=panel.idio)
{
xyplot(..., prepanel=prepanel, panel=panel, chrs=chrs)
}
Then I get the data needed to do the plots
library(org.Hs.eg.db)
sym <- c("ACSL1", "ACTG2", "ADAMTSL2", "ADH1C",
"ADORA1", "ADRA2A", "AHNAK", "AIM2", "ALAS2")
id <- unlist(mget(sym, org.Hs.egSYMBOL2EG))
chrloc <- mget(id, org.Hs.egCHRLOC)
chr <- unlist(lapply(chrloc, names), use.names=FALSE)
and form it into the data structures needed to use my functions
genes <- data.frame(Id=rep(id, sapply(chrloc, length)),
Chromosome=factor(chr,
levels=c(1:22, "X", "Y", "M")),
Position=unlist(chrloc, use.names=FALSE))
chrs <- data.frame(Chromosome=factor(
names(org.Hs.egCHRLENGTHS),
levels=c(1:22, "X", "Y", "M")),
Length=org.Hs.egCHRLENGTHS)
Finally I draw the plot
idio <- idioplot(chrs, Chromosome~Position, genes)
show(idio)
I should have access to all the regular plot formating parameters (see
?xyplot, ?panel.xyplot, ?segments, ?panel.segments), in addition to
changing how long the gene markings are, and the colors of the
segments for each gene. You even have 'zoom' capabilities
idioplot(chrs, Chromosome~Position, genes[genes$Chromosome=="11", ],
xlim=c(61000000, 63000000))
or, a little more hack-y (I don't know how to update the y limits when
y is a factor)
update(idio, ylim=c(5.5,6.5), xlim=c(61000000, 63000000))
More generally you might look at rtracklayer (for exporting to the
UCSC genome browser), GenomeGraph (for plotting gene-level
information, for instance), geneplotter, idiogram, ...
Martin
> Thans's you for your help.
>
> Does anyone have an idea about my problem with qc()?
>
> Selon Simon Noël <simon.noel.2 at ulaval.ca>, 23.05.2009:
>
>> I have try your procedure and every thing is loking fine for now. What's the
>> netx step to plot my genes on their chromosomes?
>>
>> Selon Simon Noël <simon.noel.2 at ulaval.ca>, 21.05.2009:
>>
>> > I am sorry. Has I say, I am new to R and bioconductor. I am still
>> learning.
>> >
>> > I will try this tomorow and I will write you back to tell you the result.
>> > Thank's you for your help. Il it's working, I will only have to wory about
>> > my
>> > problem with qc()... Or at least for now. The summer is really young;)
>> > Selon Hervé Pagès <hpages at fhcrc.org>, 21.05.2009:
>> >
>> > > Hi Simon,
>> > >
>> > > Not a good idea to start a new thread by replying to a different thread
>> > > you started previously. Then it shows up under the previous thread even
>> > > if you changed the subject.
>> > >
>> > > more below...
>> > >
>> > > Simon Noël wrote:
>> > > > Hello every one. I have a question. I have a gene list in a .xls like
>> > > >
>> > > > probeID Symbol
>> > > > 1030431 ACSL1
>> > > > 4610431 ACTG2
>> > > > 4810575 ADAMTSL2
>> > > > 1510750 ADH1C
>> > > > 4060519 ADORA1
>> > > > 5720523 ADRA2A
>> > > > 2810482 AHNAK
>> > > > 1260270 AIM2
>> > > > 4180768 ALAS2
>> > > > ... ...
>> > > >
>> > > > I want to plote all of those genes on their chromosome. How can I do
>> > this?
>> > >
>> > > So first you need to map each gene to its chromosome location.
>> > >
>> > > You can use one of the org.*.eg.db annotation packages for
>> > > this (pick up the one for your organism):
>> > >
>> > > http://bioconductor.org/packages/release/data/annotation/
>> > >
>> > > and use the SYMBOL2EG map to map your gene symbols to their corresponding
>> > > Entrez IDs and then the CHRLOC map to map your Entrez IDs to their
>> > chromosome
>> > > locations.
>> > >
>> > > Example:
>> > >
>> > > library(org.Hs.eg.db)
>> > > mysymbols <- c("ACSL1", "ACTG2", "ADAMTSL2", "ADH1C",
>> > > "ADORA1", "ADRA2A", "AHNAK", "AIM2", "ALAS2")
>> > > myEgIDs <- unlist(mget(mysymbols, org.Hs.egSYMBOL2EG))
>> > > mylocs <- unname(unlist(mget(myEgIDs, org.Hs.egCHRLOC)))
>> > >
>> > > One thing to be aware of is that those mappings are not necessarily
>> > > one-to-one e.g. the same symbol can be associated with different genes:
>> > >
>> > > > flat <- toTable(org.Hs.egSYMBOL2EG)
>> > > > names(flat)
>> > > [1] "gene_id" "symbol"
>> > > > any(duplicated(flat$gene_id))
>> > > [1] FALSE
>> > > > any(duplicated(flat$symbol))
>> > > [1] TRUE
>> > >
>> > > The same thing happens with the org.Hs.egCHRLOC map (I'm not sure
>> > > why we have this though, may be others on the list can explain).
>> > >
>> > > Anyway this explains why 'mylocs' can have more elements than
>> 'mysymbols'.
>> > >
>> > > Cheers,
>> > > H.
>> > >
>> > > >
>> > > > Simon Noël
>> > > > VP Externe CADEUL
>> > > > Association des étudiants et étudiantes en Biochimie, Bio-
>> > > > informatique et Microbiologie de l'Université Laval
>> > > > CdeC
>> > > >
>> > > > _______________________________________________
>> > > > Bioconductor mailing list
>> > > > Bioconductor at stat.math.ethz.ch
>> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > > > Search the archives:
>> > > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> > >
>> > > --
>> > > Hervé Pagès
>> > >
>> > > Program in Computational Biology
>> > > Division of Public Health Sciences
>> > > Fred Hutchinson Cancer Research Center
>> > > 1100 Fairview Ave. N, M2-B876
>> > > P.O. Box 19024
>> > > Seattle, WA 98109-1024
>> > >
>> > > E-mail: hpages at fhcrc.org
>> > > Phone: (206) 667-5791
>> > > Fax: (206) 667-1319
>> > >
>> > >
>> >
>> >
>> > Simon Noël
>> > VP Externe CADEUL
>> > Association des étudiants et étudiantes en Biochimie, Bio-
>> > informatique et Microbiologie de l'Université Laval
>> > CdeC
>>
>>
>> Simon Noël
>> VP Externe CADEUL
>> Association des étudiants et étudiantes en Biochimie, Bio-
>> informatique et Microbiologie de l'Université Laval
>> CdeC
>
>
> Simon Noël
> VP Externe CADEUL
> Association des étudiants et étudiantes en Biochimie, Bio-
> informatique et Microbiologie de l'Université Laval
> CdeC
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list