Hi Pan,

Thank *you* for your thoughtful replies.  I'd forgotten how control probe
data is stored in LumiBatch objects.  That seems like the most consistent
way to handle it for MethyLumiM objects, now that you mention it: use
addControlData2lumi(), but with both channels represented, for example as a
list with $Red and $Grn data.frames.  Using getControlData() with a
"MethyLumiSet" type would do the trick; I can easily write this, in fact.
I'll send a patch.

On Sat, Dec 18, 2010 at 7:23 AM, Pan Du <dupan@northwestern.edu> wrote:
>
> As you know, the benefit of nuID is that we can directly know the probe
> sequence without checking any table. But Illumina Probe ID, as the
> manufacturer ID, is the most widely used in public. So I think one
> alternative way is just to add an additional Bimap table of IlluminaID and
> nuID in the current Infinium methylation library. As an option, I will add
> a mapping function to convert data between Illumina ID and nuID. But by
> default, data will be IlluminaID identified.
>

This is a good idea.  You're right, it will save a good deal of storage
space, and using the readBPM() function to parse the updated Illumina
manifests should also ease maintenance.


> As for multiple mappings, I am not sure how Illumina 450k reports them. For
> easier maintenance in the long run, we can just keep the same way as
> Illumina do. Illumina has improved their annotation maintenance. They make
> regular updates of their annotations now.
>

The most recent manifest is available via iCom, but partly because Sean had
to do all the heavy lifting last time around, I'm planning to push out at
least a SQLite package of probe NuID/channel/chemistry annotations as
illuminaHumanMethylation450kProbes.db.  In the Illumina annotations, the
accession numbers are concatenated with semicolons, with as many as 6
separate accession numbers provided per probe.  I don't think anyone had
this scenario in mind when the AnnotationDbi package and its mappings were
designed :-)
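
To make the one-to-many problem concrete, here is a minimal sketch of how the
semicolon-delimited accession field could be flattened into one row per
(probe, accession) pair and then inverted.  It is plain Python rather than R,
purely for illustration; the function names, probe IDs and accession strings
are all made-up placeholders, not values from the actual manifest, and this is
not how AnnotationDbi does it.

```python
from collections import defaultdict


def split_accessions(manifest_rows, sep=";"):
    """Expand (probe_id, "acc1;acc2;...") rows into (probe_id, accession) pairs."""
    pairs = []
    for probe_id, accession_field in manifest_rows:
        for accession in accession_field.split(sep):
            accession = accession.strip()
            if accession:  # skip empty fields, i.e. probes with no accession
                pairs.append((probe_id, accession))
    return pairs


def invert(pairs):
    """Build the reverse map: accession -> list of probe IDs, in input order."""
    by_accession = defaultdict(list)
    for probe_id, accession in pairs:
        by_accession[accession].append(probe_id)
    return dict(by_accession)


# Hypothetical manifest rows; IDs and accessions are placeholders only.
rows = [
    ("probe_A", "NM_000001;NM_000002;NR_000003"),
    ("probe_B", "NM_000002"),
    ("probe_C", ""),
]

pairs = split_accessions(rows)
reverse = invert(pairs)
```

With the pairs in this long format, each direction of the mapping becomes an
ordinary one-to-many lookup, which is roughly the shape a SQLite table of
probe annotations would take.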

> Can you send me some example control probe data? One option is keeping the
> same way as LumiBatch-class to store control data information.
>

Will do.  I've been working on a package so that Infinium methylation chips
can be handled the way expression or SNP chips are (in 'beadarray'/'lumi' or
in 'crlmm').  I plan to put up a vignette showing a mixture experiment on
the 450k arrays and the 27k arrays, comparing the various preprocessing
options and their effects on each platform, but if there are no objections
from the investigators, I'll see if I can't just post the control probe data
this week.

> This sounds good. The probe sequences can be kept as nuID format to save
> storage space. But again, long term maintenance is an issue.
>

Agreed, and good idea to use NuIDs to encode the sequences.  Perhaps using a
'U' at the interrogated CpG sites would be the best way to encode probe
pairs for NuID conversion.


> If you can send me some example control data, I can play with it and update
> the MethyLumiM class at the end of this year. If possible, please also send
> me one or two samples of 450K data with annotation information.
>

I'll see if I can't get it turned around this week.  A typical 450k array is
(as you might imagine) rather larger than the corresponding 27k array (~15MB
vs ~1.5MB of IDAT files) but the controls are just a few K (and no mapping
or normalization is required for them).  So I don't imagine it would be much
trouble to post some examples on GitHub.


> Happy holidays!
>

And also to you and yours,

--t
