[BioC] illumina annotation packages vs. bgx manifests
lamon at fhcrc.org
Tue Jul 22 20:27:21 CEST 2008
Okay, so first, use the more recent illuminaRatv1ProbeID.db package
instead of illuminaRatBCv1. Secondly, I only included probes which
mapped to RefSeq identifiers rather than those which could only be
mapped to GenBank. So, if you found something in the .bgx file which
maps to a gene symbol but in the "Accession" column the identifier is
GenBank, you won't find the gene symbol in illuminaRatv1ProbeID.db. You
can check this quickly by looking at illuminaRatv1ProbeIDACCNUM.
If folks would like GenBank identifiers to be included, we could change
the package, but I would like a way to distinguish probes that can only
be mapped to GenBank nucleotide accessions, other than ACCNUM, which
nobody ever seems to look at.
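The ACCNUM check described above could be sketched roughly as follows in
base R. This is only an illustration on toy data: the `accnum` vector
stands in for whatever the illuminaRatv1ProbeIDACCNUM map returns, the
probe IDs and accessions are invented, and the helper `is_refseq` is a
hypothetical name. The only assumption about the data is that RefSeq
accessions carry a two-letter prefix followed by an underscore (NM_, XM_,
NR_, ...), while GenBank accessions contain no underscore.

```r
# Toy stand-in for the probe -> accession map; IDs and accessions are invented.
accnum <- c("101" = "NM_012345", "102" = "BC098765",
            "103" = "XM_223344", "104" = "AF112233")

# Hypothetical helper: RefSeq accessions have a two-letter prefix and an
# underscore (NM_, XM_, NR_, ...); GenBank accessions have no underscore.
is_refseq <- function(acc) grepl("^[A-Z]{2}_", acc)

# Probes whose accession is GenBank-only, i.e. the ones that would not
# carry a gene symbol in a RefSeq-only annotation package.
genbank_only <- names(accnum)[!is_refseq(accnum)]
print(genbank_only)
```

With a real package the `accnum` vector would be replaced by the contents
of the ACCNUM map; the classification step stays the same.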
PS: For those of you following the increasing number of emails
regarding the illumina annotation packages, Rat v1 and Mouse v2 are
based on Illumina-provided accession identifiers whereas Human v1, v2,
v3 and Mouse v1 and v1.1 are all based on BLAST results found at
http://www.compbio.group.cam.ac.uk/Resources/Annotation. The former
were not available at the time the packages were made.
Michal Kolář wrote:
> Hi Lynn,
> thank you for the fast answer.
> I was using the bgx manifest in R until recently. However, I realised
> that many packages ask for an eSet in which the annotation is built
> in, in the form of an annotation package (GSEABase for example). So I
> decided to switch to an annotation package, and I am trying to find
> out which package is best.
> I completely agree with you that your illuminaRatv1ProbeID.db is the
> best choice (I actually used the package downloaded from your web
> page: illuminaRatBCv1). But when trying to colour the KEGG pathways I
> realised that the mapping from the KEGG pathways to probe IDs is not
> complete. In several pathways I found that a well-characterised enzyme
> in KEGG is not represented by its gene's Array_Address_Id in the list
> returned by illuminaRatBCv1PATH2PROBE. When I tried
> org.Rn.egPATH2EG instead, I found the gene and was able to map it to
> its Array_Address_Id using the manifest file.
> That was the moment when I started to look around for other packages
> to see whether the problem is restricted to illuminaRatBCv1 or is a
> generic one. And that is the reason I am looking for the mapping
> between probe identifiers in those packages and the Array_Address_ID.
> Maybe I just used an outdated package ...
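The comparison described in the message above (probe-level pathway map
versus the gene-level route through the manifest) could be sketched like
this in base R. Everything here is toy data: the lists only imitate the
shape of illuminaRatBCv1PATH2PROBE and org.Rn.egPATH2EG, all IDs are
invented, and the manifest column names `Entrez_Gene_ID` and
`Array_Address_Id` are assumptions about how the .bgx file was read in.

```r
# Toy stand-ins; all IDs are invented for illustration.
# Pathway -> probes, shaped like a PATH2PROBE-style list.
path2probe <- list("00010" = c("101", "103"))
# Pathway -> Entrez Gene IDs, shaped like a PATH2EG-style list.
path2eg <- list("00010" = c("24189", "25438", "29619"))
# Manifest columns (assumed names) linking genes to Array_Address_Id.
manifest <- data.frame(Entrez_Gene_ID   = c("24189", "25438", "29619"),
                       Array_Address_Id = c("101", "102", "103"),
                       stringsAsFactors = FALSE)

# Probes reachable through the gene-level route for one pathway.
in_pathway <- manifest$Entrez_Gene_ID %in% path2eg[["00010"]]
via_genes  <- manifest$Array_Address_Id[in_pathway]

# Probes that the gene-level route finds but the probe-level map lacks.
missing_probes <- setdiff(via_genes, path2probe[["00010"]])
print(missing_probes)
```

Running the same `setdiff` per pathway would show whether the gaps are
confined to one package or appear across several.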
> On 22 Jul 2008, at 18:41, Lynn Amon wrote:
>> Hi Michal,
>> I'm not completely certain what you are asking. The Array_Address_ID
>> is the probe identifier used in the illuminaRatv1ProbeID.db
>> annotation package so you should link using that identifier. If you
>> are going to use the .bgx file for annotation, you don't really need
>> an annotation package at all.
>> Lynn
>> Michal Kolář wrote:
>>> Dear List,
>>> I wonder what is the correct probe identifier for illumina
>>> annotation packages.
>>> I have the illumina raw data (tiff), where the beads are identified
>>> by their corresponding Array_Address_ID, and I have the
>>> illumina manifest file (.bgx). I use the illuminaRatv1 annotation
>>> package. My question is: how can I map Array_Address_IDs to the
>>> identifiers of the annotation package?
>>> I read in several postings on the List that these identifiers
>>> should be the TargetIDs in the manifest. But there is no TargetID
>>> in the rat manifest (.bgx). There are, however, two other
>>> identifiers that look similar to the identifiers of the annotation
>>> package. One of them is Probe_ID, but there is no overlap between
>>> the two ID sets. The other is called Transcript, and that one looks
>>> better, but still only one third of the identifiers match. So what
>>> is the correct column in the manifest to link against? (If any.)
>>> I know I can use the illuminaRatv1ProbeID.db package to link
>>> directly against Array_Address_ID or lumiRatV1 to link against probe
>>> sequences, but I want to compare the packages and to see possible
>>> Michal Kolář
>>> Academy of Sciences of the Czech Republic
>>> Institute of Molecular Genetics
>>> Vídeňská 1083
>>> CZ-14220 Praha
>>> Czech Republic
>>> phone: +420 296 443 412
>>> email: kolarmi at img.cas.cz
>>> www: http://www.thp.uni-koeln.de/~kolarmi/research