[BioC] Oligo package annotation
James W. MacDonald
jmacdon at uw.edu
Fri Dec 14 15:58:48 CET 2012
Please don't take conversations off-list. We like to think of the list
archives as a repository of information.
On 12/14/2012 5:45 AM, Bruno Giotti wrote:
> OK, thanks, but how do I query the pd.hugene.1.1.st.v1
> annotation package to retrieve some useful IDs? I could use the
> package you suggested, but I'd first like to understand how to use
> this one (pd.hugene.1.1.st.v1).
The pd.hugene.1.1.st.v1 package is NOT an annotation package. Instead,
it maps the locations of probes on the array to different probesets.
This package is used by oligo to decide which probes go into which
probeset, so you can summarize at different levels (e.g., for the HuGene
arrays, at the 'probeset' level, which is roughly exon-level, or at the
transcript level).
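To make the two summarization levels concrete, here is a minimal sketch. The helper names (target_for_level, summarize_hugene) and the cel_dir argument are placeholders of mine, not part of oligo; the only oligo-specific assumption is that rma() takes target = "core" for transcript-level summaries and target = "probeset" for probeset-level ones on Gene ST arrays.

```r
# Hypothetical helper: translate a readable level name into oligo's 'target'.
target_for_level <- function(level = c("transcript", "probeset")) {
  level <- match.arg(level)
  switch(level, transcript = "core", probeset = "probeset")
}

# Hypothetical wrapper; 'cel_dir' is a placeholder for a directory of CEL files.
summarize_hugene <- function(cel_dir, level = "transcript") {
  stopifnot(requireNamespace("oligo", quietly = TRUE),
            requireNamespace("oligoClasses", quietly = TRUE))
  cel_files <- oligoClasses::list.celfiles(cel_dir, full.names = TRUE)
  raw <- oligo::read.celfiles(cel_files)  # needs the matching pd package installed
  oligo::rma(raw, target = target_for_level(level))
}

# Usage (not run here, since it needs CEL files on disk):
# eset_tc <- summarize_hugene("path/to/celfiles", level = "transcript")
# eset_ps <- summarize_hugene("path/to/celfiles", level = "probeset")
```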
Unless you have a real need to know where things are on the chip, the
pd.hugene.1.1.st.v1 package is not of much use. Well, let me take that
back. I have found that the intronic background controls have a really
bad habit of popping up in lists of differentially expressed genes.
There are any number of hypotheses that I can come up with that would
explain why this is so, but in the end I haven't found any end users who
really care. So I use the pd.hugene.1.1.st.v1 package to figure out
which probesets are controls, and exclude those prior to selecting
differentially expressed genes. The getMainProbes() function in
affycoretools is useful in this respect.
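The filtering step itself is just a row subset. The toy sketch below uses invented IDs, values, and type labels to show the idea; with real data I would expect the commented getMainProbes() call at the end to do the lookup against the pd package for you, assuming its input is the ExpressionSet returned by rma().

```r
# Toy stand-in for a summarized expression matrix (rows = probesets).
# IDs, values, and type labels are invented for illustration.
expr <- matrix(c(7.1, 7.3,
                 4.2, 4.0,
                 9.8, 9.5,
                 5.5, 5.6),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("ps1", "ps2", "ps3", "ps4"), c("ctl", "trt")))
probe_type <- c(ps1 = "main", ps2 = "normgene->intron",
                ps3 = "main", ps4 = "control->affx")

# Keep only the rows whose probeset type is 'main'.
expr_main <- expr[probe_type[rownames(expr)] == "main", , drop = FALSE]

# With real data (not run here):
# library(affycoretools)
# eset_main <- getMainProbes(eset)  # 'eset' is the output of oligo's rma()
```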
So back to the story at hand. Since the pd.hugene.1.1.st.v1 package
doesn't do annotations, you need to use the
hugene11sttranscriptcluster.db package. It does use a SQLite database as
its backend, but unless you like to do SQL queries this is of no relevance.
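As a rough sketch of what avoiding SQL looks like in practice: AnnotationDbi's select() returns a data.frame of IDs that you can line up against the rows of your expression matrix. The annotate_matrix() helper below is a hypothetical name of mine, and it assumes the annotation table has PROBEID, ENTREZID and SYMBOL columns, which is what the commented select() call should return. Keeping only the first match per probeset is a simplification; some probesets map to several genes.

```r
# Hypothetical helper: bolt Entrez Gene IDs and symbols onto an expression matrix.
# 'expr' has probeset IDs as rownames; 'annot' has PROBEID, ENTREZID, SYMBOL columns.
annotate_matrix <- function(expr, annot) {
  # select() can return several rows per probeset; keep the first for a 1:1 table
  annot <- annot[!duplicated(annot$PROBEID), , drop = FALSE]
  idx <- match(rownames(expr), annot$PROBEID)
  data.frame(PROBEID  = rownames(expr),
             ENTREZID = annot$ENTREZID[idx],
             SYMBOL   = annot$SYMBOL[idx],
             expr,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# With the real annotation package (not run here; assumes it is installed
# and that 'eset' was summarized at the transcript level):
# library(hugene11sttranscriptcluster.db)
# expr  <- exprs(eset)
# annot <- select(hugene11sttranscriptcluster.db, keys = rownames(expr),
#                 columns = c("ENTREZID", "SYMBOL"), keytype = "PROBEID")
# annotated <- annotate_matrix(expr, annot)
```

Probesets with no annotation end up with NA in the ENTREZID and SYMBOL columns rather than being dropped, so the output keeps the same row order as the input matrix.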
The canonical reference for using these annotation packages is the Intro
to Annotation Packages, which can be accessed by

    library(hugene11sttranscriptcluster.db)
    openVignette()

and then choosing

    AnnotationDbi - AnnotationDbi: Introduction To Bioconductor Annotation Packages

If you care about the internals, you can read

    AnnotationDbi - How to use bimaps from the ".db" annotation packages

And if you just want to create annotated output, take a look at the
annaffy package, which automates these things.
Best,
Jim
> Thanks again
>
> > Date: Thu, 13 Dec 2012 11:53:52 -0500
> > From: jmacdon at uw.edu
> > To: guest at bioconductor.org
> > CC: bioconductor at r-project.org; latini18 at hotmail.com;
> Benilton.Carvalho at cancer.org.uk
> > Subject: Re: [BioC] Oligo package annotation
> >
> >
> >
> > On 12/13/2012 11:48 AM, Bruno [guest] wrote:
> > > Hi all,
> > > My question is quite straightforward: how do I retrieve EntrezId
> > > or geneSymbol for pd.hugene.1.1.st.v1 to merge into my gene expression
> > > matrix? I haven't found any vignettes explaining this. I know that the
> > > annotation file is a SQLite DB which I have to query. However, I'm
> > > failing to find the tables I need. Sorry if I persist in not
> > > explaining myself enough.
> >
> > It depends on what level you used for summarization. Assuming that you
> > used transcript-level summarization (which I would highly recommend),
> > you want to use the hugene11sttranscriptcluster.db package. If you did
> > something like
> >
> > rma(<filename>, target="probeset")
> >
> > then you want the hugene11stprobeset.db package.
> >
> > Best,
> >
> > Jim
> >
> >
> > >
> > >
> > > -- output of sessionInfo():
> > >
> > > R version 2.15.1 (2012-06-22)
> > > Platform: x86_64-pc-mingw32/x64 (64-bit)
> > >
> > > locale:
> > > [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United
> Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
> > > [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
> > >
> > > attached base packages:
> > > [1] stats graphics grDevices utils datasets methods base
> > >
> > > other attached packages:
> > > [1] pd.hugene.1.1.st.v1_3.8.0 oligo_1.22.0 affyPLM_1.34.0
> preprocessCore_1.20.0 latticeExtra_0.6-24
> > > [6] lattice_0.20-10 RColorBrewer_1.0-5 BiocInstaller_1.8.3
> simpleaffy_2.34.0 gcrma_2.30.0
> > > [11] genefilter_1.40.0 affy_1.36.0 limma_3.14.3 RSQLite_0.11.2
> DBI_0.2-5
> > > [16] Biobase_2.18.0 oligoClasses_1.20.0 BiocGenerics_0.4.0
> > >
> > > loaded via a namespace (and not attached):
> > > Error in x[["Version"]] : subscript out of bounds
> > > In addition: Warning message:
> > > In FUN(c("affxparser", "affyio", "annotate", "AnnotationDbi",
> "Biostrings", :
> > > DESCRIPTION file of package 'survival' is missing or broken
> > >
> > > --
> > > Sent via the guest posting facility at bioconductor.org.
> > >
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> >
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099