[BioC] Help with annotation packages
Amy Mikhail
a.mikhail at abdn.ac.uk
Tue Mar 28 17:34:09 CEST 2006
Dear list,
I have a mosquito microarray that I would like to annotate, but am having
some trouble figuring out which packages are appropriate to use. After
reading the AnnBuilder, annotate and biomaRt vignettes, I am still really
unsure if any of those packages would do what I want. So here is my
question:
The array is for Anopheles gambiae, and consists of about 13,500 cDNA
spots from PCR plates - probe sequences between 150 and 500 bp in length.
The manufacturer of my array provided a .GAL file with it - this was made
in GenePix and lists Ensembl gene transcripts under the column "name" and
Ensembl gene identifiers under the column "ID".
What I would really like is to add an extra column to this .GAL file (or
actually my .gpr outputs from GenePix) which would contain gene
function/ontology information, so that everything I do with my results
thereafter would come up with the GO information as well (e.g. toptable
from limma).
I know that the latest An. gambiae annotation available in Ensembl is
agam_P3, and would like to use this but have to bear in mind that the
microarray probe IDs were provided from an earlier build, so a number of
genes on the array will not be present in the agam_P3 list. If the
package I use flags these as NAs or whatever, that would be fine for the
moment.
My confusion is really over which package to use:
I understand that biomaRt can handle single queries or queries for a small
list of (e.g.) DE genes, but not the entire probe set. Is that right?
Also, I note that other list users with queries relating to biomaRt have
been directed to use the devel version. I think this doesn't work with R
2.2.1?
It also seems that the annotate package is only suitable for species that
Bioconductor has specifically created libraries for, and that there are
currently only libraries for human, mouse and rat ... so not suitable for
me either?
Lastly, the AnnBuilder package sounds most like what I'm after, but I'm a
bit confused about whether it is limited in the public data repositories
it can use, as the probe IDs I have are from Ensembl, not Entrez-gene.
Also I gather I would have to query the data package that AnnBuilder
creates every time I want the annotation info for a given list of genes,
rather than it being linked to my .gpr or .GAL files. Have I understood
that correctly, and if so is there any way to link annotation info to the
.GAL file itself? Also, is Perl something one has to download in order to
use this package (please excuse the very naive question as I'm not a
bioinformatician)?
So just to recap: all I actually want to do is merge the agam_P3
annotation list with my .GAL file, and make sure that the new columns
appear as part of the output from limma, etc.
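To make the goal concrete, here is a minimal sketch in base R of the kind of
merge I mean. It is only an illustration under assumptions: the two toy data
frames, the gene identifiers, the GO terms and the annotation column names
(gene_id, GO) are all made up, and the real tables would come from the .GAL
file and from whatever agam_P3 annotation list can be obtained.

```r
# Toy stand-in for the spot table from the .GAL file.
# "Name" and "ID" mirror the columns described above; the values are placeholders.
gal <- data.frame(
  Name = c("TRANSCRIPT_A", "TRANSCRIPT_B", "TRANSCRIPT_C"),
  ID   = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Toy stand-in for an annotation list (hypothetical column names).
# GENE_B is deliberately absent, like a gene dropped from a newer build.
annot <- data.frame(
  gene_id = c("GENE_A", "GENE_C"),
  GO      = c("GO:0005515", "GO:0003677"),
  stringsAsFactors = FALSE
)

# match() looks up each spot's gene ID in the annotation list.
# Spot order is preserved, and IDs with no match get NA.
gal$GO <- annot$GO[match(gal$ID, annot$gene_id)]

print(gal)
```

Using match() rather than merge() keeps the rows in the original spot order,
which I assume matters if the table has to stay aligned with the .gpr data.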
Looking forward to any advice / suggestions,
Regards,
Amy
R: 2.2.1, Bioconductor: 1.7, OS: Windows XP.
-------------------------------------------
Amy Mikhail
Research student
University of Aberdeen
Zoology Building
Tillydrone Avenue
Aberdeen AB24 2TZ
Scotland
Email: a.mikhail at abdn.ac.uk
Phone: 00-44-1224-272880 (lab)