[BioC] Building the tomato annotation library(Affy)
James W. MacDonald
jmacdon at uw.edu
Tue Dec 11 17:11:34 CET 2012
Hi Jorge,
On 12/10/2012 12:00 PM, Jorge Mena-Ali wrote:
> I'm trying to obtain the annotation file for the Affy tomato chip. Any
> suggestions on specific code to append this file to the eset will be
> appreciated.
There are two general ways to handle this situation (that I know of).
1.) Just use the Affy annotation file directly.
2.) Build an org package and then use, say, UniGene or Entrez Gene IDs
from the annotation file to map things.
For #1, you can download the csv file from Affy and do something like
> dat <- read.csv("Tomato.na33.annot.csv", header = TRUE, skip = 13,
    na.strings = "---")
> names(dat)
[1] "Probe.Set.ID" "GeneChip.Array"
[3] "Species.Scientific.Name" "Annotation.Date"
[5] "Sequence.Type" "Sequence.Source"
[7] "Transcript.ID.Array.Design." "Target.Description"
[9] "Representative.Public.ID" "Archival.UniGene.Cluster"
[11] "UniGene.ID" "Genome.Version"
[13] "Alignments" "Gene.Title"
[15] "Gene.Symbol" "Chromosomal.Location"
[17] "Unigene.Cluster.Type" "Ensembl"
[19] "Entrez.Gene" "SwissProt"
[21] "EC" "OMIM"
[23] "RefSeq.Protein.ID" "RefSeq.Transcript.ID"
[25] "FlyBase" "AGI"
[27] "WormBase" "MGI.Name"
[29] "RGD.Name" "SGD.accession.number"
[31] "Gene.Ontology.Biological.Process" "Gene.Ontology.Cellular.Component"
[33] "Gene.Ontology.Molecular.Function" "Pathway"
[35] "InterPro" "Trans.Membrane"
[37] "QTL" "Annotation.Description"
[39] "Annotation.Transcript.Cluster" "Transcript.Assignments"
[41] "Annotation.Notes"
and then you can use R's existing merge() function (and that name is a
hint right there) to combine the set of significant (or not) probesets
with whichever annotation columns you need.
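The merge() step might look something like the sketch below. It is only an illustration: the probeset names, the 'sig' table and its 'ID' and 'logFC' columns are made up, and 'dat' here is a small stand-in for the table read from the Affy csv file, keeping three of its columns.

```r
# Toy stand-in for the annotation table read from the Affy csv file
# (three of its columns only; the probeset names are made up).
dat <- data.frame(
  Probe.Set.ID = c("ps_1", "ps_2", "ps_3", "ps_4"),
  Gene.Symbol  = c("SNF1", NA, "MKP1", NA),
  Gene.Title   = c("SNF1 protein", NA, "MAP kinase phosphatase", NA),
  stringsAsFactors = FALSE
)

# Hypothetical table of significant probesets, e.g. from a model fit.
sig <- data.frame(
  ID    = c("ps_3", "ps_1", "ps_9"),
  logFC = c(1.8, -2.1, 0.9),
  stringsAsFactors = FALSE
)

# Left join: keep every row of 'sig' and add annotation where the
# probeset ID is found in 'dat'; unmatched probesets get NA.
annotated <- merge(sig, dat, by.x = "ID", by.y = "Probe.Set.ID",
                   all.x = TRUE)
print(annotated)
```

Using all.x = TRUE keeps probesets that have no annotation row, so nothing silently drops out of the results table.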
However, the Affy annotations are static as to the build date, and may
be pretty stale by the time you get to them. You can always go to NCBI
and build your own organism-level package, and use that to do the
annotations.
> library(AnnotationForge)
> makeOrgPackageFromNCBI(version = "0.0.1", author = "me",
    maintainer = "me <me@mine.org>", outputDir = ".", tax_id = 4081,
    genus = "Solanum", species = "lycopersicum")
Loading required package: GO.db
Getting data for gene2pubmed.gz
Loading required package: RCurl
Loading required package: bitops
Populating gene2pubmed table:
table gene2pubmed filled
Getting data for gene2accession.gz
<other blahblahblah snipped>
Creating package in ./org.Slycopersicum.eg.db
[1] TRUE
So after waiting a while, I get this message telling me a package has
been made. Now I need to install it.
> install.packages("org.Slycopersicum.eg.db", repos = NULL, type = "source")
* installing *source* package org.Slycopersicum.eg.db ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (org.Slycopersicum.eg.db)
Now you can load this package and use it to annotate things:
> library(org.Slycopersicum.eg.db)
> x <- as.character(sample(dat$UniGene.ID[!is.na(dat$UniGene.ID)], 25))
> select(org.Slycopersicum.eg.db, x, c("SYMBOL","GENENAME"), "UNIGENE")
UNIGENE SYMBOL GENENAME
1 Les.20210 <NA> <NA>
2 Les.11435 <NA> <NA>
3 Les.12414 <NA> <NA>
4 Les.17835 SNF1 SNF1 protein
5 Les.1796 <NA> <NA>
6 --- <NA> <NA>
7 Les.1268 MKP1 MAP kinase phosphatase
8 Les.7575 <NA> <NA>
9 Les.7326 <NA> <NA>
<snip>
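Note that a "---" placeholder slipped through as a key in the output above. One way to tidy that up and get the results back onto the probesets is sketched below. This is an assumption-laden illustration: the probeset names are made up, 'dat' is a toy stand-in for the annotation table, and 'res' is a hand-made stand-in for what select() returns, so that the example runs without the org package installed.

```r
# Toy stand-in for the UniGene column of the annotation table; the IDs
# are borrowed from the output above, the probeset names are made up.
dat <- data.frame(
  Probe.Set.ID = c("ps_1", "ps_2", "ps_3", "ps_4"),
  UniGene.ID   = c("Les.17835", "---", NA, "Les.1268"),
  stringsAsFactors = FALSE
)

# Keep only usable keys: drop NA and the "---" placeholder, deduplicate.
ok   <- !is.na(dat$UniGene.ID) & dat$UniGene.ID != "---"
keys <- unique(dat$UniGene.ID[ok])

# With the org package installed and loaded, the lookup would be:
#   res <- select(org.Slycopersicum.eg.db, keys,
#                 c("SYMBOL", "GENENAME"), "UNIGENE")
# Here 'res' is a hand-made stand-in with the same column layout.
res <- data.frame(
  UNIGENE  = c("Les.17835", "Les.1268"),
  SYMBOL   = c("SNF1", "MKP1"),
  GENENAME = c("SNF1 protein", "MAP kinase phosphatase"),
  stringsAsFactors = FALSE
)

# Attach the looked-up annotation to every probeset, matched or not.
annotated <- merge(dat, res, by.x = "UniGene.ID", by.y = "UNIGENE",
                   all.x = TRUE)
print(annotated)
```

Merging on the UniGene ID rather than relying on row order keeps things aligned even if select() returns more than one row for a key.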
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099