[Bioc-devel] Illumina Methylation annotations

Mon Dec 20 23:15:05 CET 2010

Hi Tim

Thanks for your reply! Here are some of my thoughts.

> Thank *you* for your thoughtful replies.  I'd forgotten how control probe
> data is stored in LumiBatch objects.  That seems like the most consistent
> way to handle it for MethyLumiM objects, now that you mention it; using
> addControlData2lumi(), but with both channels represented, such as in a list
> with $Red and $Grn data.frames.  Using getControlData with a "MethyLumiSet"
> type would do the trick; I can easily write this, in fact.  I'll send a
> patch.

Since the control data have both $Red and $Grn channels, I may use the
AssayData class instead of a simple data.frame to store them. Either way, I
need to see what the real control data look like. Our genomics core provides
only summary information for the control data.
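In R, Biobase's AssayData holds several matrices that must share the same dimensions, which is what makes it a natural fit for paired Red/Grn control intensities. As a language-neutral illustration only, here is a minimal Python sketch of that constraint. The class `TwoChannelControls` and the probe/sample labels are hypothetical and are not part of methylumi, lumi, or Biobase.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TwoChannelControls:
    """Hypothetical container: one probes x samples matrix per channel."""
    probes: List[str]
    samples: List[str]
    channels: Dict[str, List[List[float]]] = field(default_factory=dict)

    def add_channel(self, name: str, matrix: List[List[float]]) -> None:
        # Like AssayData, require every channel to be conformable:
        # the same number of rows (probes) and columns (samples).
        if len(matrix) != len(self.probes):
            raise ValueError(f"{name}: expected {len(self.probes)} rows")
        for row in matrix:
            if len(row) != len(self.samples):
                raise ValueError(f"{name}: expected {len(self.samples)} columns")
        self.channels[name] = matrix

    def value(self, channel: str, probe: str, sample: str) -> float:
        # Look up a single intensity by channel, probe and sample label.
        i = self.probes.index(probe)
        j = self.samples.index(sample)
        return self.channels[channel][i][j]


# Usage with made-up intensities for two hypothetical control probes.
ctl = TwoChannelControls(probes=["ctl_A", "ctl_B"], samples=["s1", "s2"])
ctl.add_channel("Red", [[100.0, 110.0], [200.0, 210.0]])
ctl.add_channel("Grn", [[300.0, 310.0], [400.0, 410.0]])
print(ctl.value("Grn", "ctl_B", "s1"))
```

The point of the shape check is that a single object can then guarantee both channels describe the same probes and samples, which two loose data.frames in a list cannot.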

> 
>> As for multiple mappings, I am not sure how Illumina 450k reports them. For
>> easier maintenance in the long run, we can just keep the same way as
>> Illumina do. Illumina has improved their annotation maintenance. They make
>> regular updates of their annotations now.
>> 
> 
> The most recent manifest is available via iCom, but partly because Sean had
> to do all the heavy lifting last time around, I'm planning to push out at
> least a SQLite package of probe NuID/channel/chemistry annotations as
> illuminaHumanMethylation450kProbes.db.  In the Illumina annotations, the
> accession numbers are concatenated with semicolons, with as many as 6
> separate accession numbers provided per probe.  I don't think anyone had
> this scenario in mind when the AnnotationDbi package and its mappings were
> designed :-)

I've downloaded the manifest file for the 450K Infinium chip. It does contain
many probes that map to multiple genes. As I recall, the current AnnotationDbi
package can handle one-to-many probe-to-gene mappings, but they will make
follow-up analyses, such as GO analysis, more challenging.
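To make the multiple-mapping issue concrete, here is a minimal Python sketch that expands semicolon-concatenated accession fields (the convention described above for the Illumina annotations) into one probe/accession pair per mapping, plus the reverse, gene-centric view that something like GO analysis would consume. The probe IDs and accession strings are hypothetical placeholders, and this is not how AnnotationDbi itself stores mappings.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def expand_accessions(manifest_rows: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn probe -> "acc1;acc2;..." into one (probe, accession) pair per mapping."""
    pairs = []
    for probe, accs_field in manifest_rows.items():
        # Split on ';', skipping empty pieces and repeated accessions.
        seen = set()
        for acc in accs_field.split(";"):
            acc = acc.strip()
            if acc and acc not in seen:
                seen.add(acc)
                pairs.append((probe, acc))
    return pairs


def reverse_map(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Gene-centric view: accession -> list of probes that map to it."""
    by_acc = defaultdict(list)
    for probe, acc in pairs:
        by_acc[acc].append(probe)
    return dict(by_acc)


# Hypothetical manifest rows: one multi-mapped probe, one single, one unmapped.
rows = {
    "cg_example1": "ACC_A;ACC_B",
    "cg_example2": "ACC_B",
    "cg_example3": "",
}
pairs = expand_accessions(rows)
print(pairs)
print(reverse_map(pairs))
```

Once a probe such as `cg_example1` contributes to two genes, a gene-level test has to decide whether to count it once per gene or to drop or down-weight it, which is where the extra difficulty for GO analysis comes from.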

> Will do.  I've been working on a package so that Infinium methylation chips
> can be handled the way expression or SNP chips are (in 'beadarray'/'lumi' or
> in 'crlmm').   I plan to put up a vignette showing a mixture experiment on
> the 450k arrays and the 27k arrays, comparing the various preprocessing
> options and their effects on each platform, but if there are no objections
> from the investigators, I'll see if I can't just post the control probe data
> this week.

Which package are you developing? I can't find anything similar on the
Bioconductor development website.

>> If you can send me some example control data, I can play with it and update
>> the MethyLumiM class at the end of this year. If possible, please also send
>> me one or two samples of 450K data with annotation information.
>> 
> 
> I'll see if I can't get it turned around this week.  A typical 450k array is
> (as you might imagine) rather larger than the corresponding 27k array (~15MB
> vs ~1.5MB of IDAT files) but the controls are just a few K (and no mapping
> or normalization is required for them).  So I don't imagine it would be much
> trouble to post some examples on GitHub.
> 
> 
If the entire data file is too big, just sending me the control data is fine.

Thanks!

Pan