[BioC] 450K annotation: discrepancy between GEO GPL and Bioconductor annotation

Fri Oct 19 19:14:44 CEST 2012

Hi Tom,

Tim is right about using bimaps.  Bimaps were invented to mimic the 
behavior of R environments that were originally aimed at supporting 
expression arrays.

If you really insist on using the bimaps, you could use the 
toggleProbes() method he described to "unhide" your mappings.  This 
method was added to help with situations like this one, where people 
really want to use probes that map to multiple IDs.
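
A minimal sketch of the toggleProbes() route, using the four probe IDs 
from Tom's question.  It assumes that IlluminaHumanMethylation450kCHR is 
a probe-based bimap that toggleProbes() accepts (I have not checked this 
against the package), and the require() guard is only there so the 
snippet does nothing if the annotation package is not installed:

```r
# Probe IDs taken from the original question quoted below.
probes <- c("cg00035864", "cg00050873", "cg00061679", "cg00063477")

# Guard: skip the lookup if the annotation package is not installed.
if (require("IlluminaHumanMethylation450k.db", quietly = TRUE)) {
  # Default view of the bimap: probes mapped to multiple IDs are hidden.
  masked <- mget(probes, IlluminaHumanMethylation450kCHR, ifnotfound = NA)

  # toggleProbes(x, "all") returns a bimap exposing every mapping
  # (assumption: the CHR map is a bimap type that toggleProbes() supports).
  unmasked <- mget(probes,
                   toggleProbes(IlluminaHumanMethylation450kCHR, "all"),
                   ifnotfound = NA)

  print(masked)
  print(unmasked)
}
```

The other accepted values are "single" and "multiple", which restrict the 
view to probes with one mapping or with several mappings, respectively.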

Or (and I think this is probably an even better option for you) you 
could just use the new select interface to extract these things.  Select 
doesn't have to play these games, because the legacy code that expected 
the more restrictive behavior was written before we implemented select.  
That freed us to handle mappings more uniformly in its implementation.  
You can learn more about the new select interface here:

http://www.bioconductor.org/packages/2.11/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf
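
A minimal sketch of the select route, under some assumptions: that the 
package object IlluminaHumanMethylation450k.db supports select, that it 
exposes a column named "CHR", and that the probe IDs use the "PROBEID" 
key type.  None of these is confirmed here, so list what is available 
first.  Note that in the 2.11 release the accessor and the argument are 
spelled cols; later releases of AnnotationDbi renamed both to columns, 
which is what the sketch uses:

```r
# Probe IDs taken from the original question quoted below.
probes <- c("cg00035864", "cg00050873", "cg00061679", "cg00063477")

# Guard: skip the lookup if the annotation package is not installed.
if (require("IlluminaHumanMethylation450k.db", quietly = TRUE)) {
  db <- IlluminaHumanMethylation450k.db

  # Check which columns and key types this package actually offers;
  # "CHR" and "PROBEID" below are assumed names, not verified ones.
  print(columns(db))
  print(keytypes(db))

  # select() returns a data.frame with one row per probe/value pair,
  # so a probe with several mappings simply appears on several rows.
  res <- select(db, keys = probes, columns = "CHR", keytype = "PROBEID")
  print(res)
}
```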

Hope this helps,

   Marc

On 05/16/2012 03:59 PM, Tim Triche, Jr. wrote:
> By default, values are masked as 'NONE' or 'NA' where a probe is annotated
> to multiple transcripts; toggleProbes() controls that masking.
> Unfortunately, many (thousands) of the 450k probes are mapped to multiple
> transcripts in the manifest, and the automatically generated bimap objects
> will treat them as if they were (degenerate) expression probes, masking them.
>
> I am attempting to address this by replacing the 450k.db, 27k.db, and
> 450kprobe packages with a faster, smaller, FeatureDb-based omnibus package
> that keeps track of the minimal information required to mask probes,
> annotate regions of interest, and process IDAT files, with all other
> operations (distance to TSS, chromosome, GC%, etc.) delegated to
> GenomicRanges and GenomicFeatures.  In my experience this makes much more
> sense than using a framework that was originally created for expression
> probes.  I didn't realize the difference when I first packaged the
> annotations into a SQLite database, which is why the 450k.db package uses
> the db0 machinery.
>
> Apologies for the confusion; hopefully this will be a memory as soon as I
> am up to speed on creating FeatureDb objects.
>
>
> --t
>
> On Wed, May 16, 2012 at 12:04 PM, Bartlett, Thomas<
> thomas.bartlett.10 at ucl.ac.uk>  wrote:
>
>> Hi,
>>
>> I've noticed a discrepancy between the chromosome information given for
>> some of the probes of the Illumina Infinium 450K array in the GEO GPL info,
>> and in the corresponding Bioconductor annotation package.
>>
>> The first four probes on the 450K GPL summary page on the GEO website
>> http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL13534
>> in the 'data table' are cg00035864, cg00050873, cg00061679 and cg00063477,
>> and the corresponding value in the CHR column is Y for all four of these.
>> However, in the corresponding Bioconductor annotation package
>> IlluminaHumanMethylation450k.db, using IlluminaHumanMethylation450kCHR the
>> chromosome for these same 4 probes is given as Y, NONE, NONE and Y,
>> respectively. N.B., the values in the MAPINFO column of 'data table' and
>> those found using IlluminaHumanMethylation450kCPGCOORDINATE are identical
>> for these 4 probes.
>>
>> Is there any reason why there is this discrepancy, and might it be more
>> widespread?
>>
>> Thanks in advance for your help
>>
>> Tom Bartlett