Speaking of annotations...

The 450k methylation chips have arrived, and many of the probes map to
multiple accessions.  I have modified the 27k annotations from Sean to
handle bead mapping IDs (thanks Sean!) and started building probe sequence
packages in the style of matchprobes/beadarray, but I was wondering how
people would like to handle multiple-accession mappings and/or NuID encoding
of the 450k probes.  As I'm sure many of you are aware, the denser platform
contains many more probes with multiple CpG sites, which affects both
preprocessing and the manner in which it makes sense to apply NuID
translations.  There are other 'interesting' aspects of the 450k arrays,
some of them consequences of the platform's design, but a good first step
would be to get the data playing nicely with methylumi and lumi, which
requires some agreement on the annotations.
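
One option for the multiple-accession question, sketched in base R and
assuming a long (probe, accession) table is an acceptable starting point:
store one row per pair, so a probe that hits several accessions simply
appears on several rows, and derive per-probe or per-accession lists on
demand.  The probe and accession IDs below are placeholders, not real 450k
annotations, and the object names are hypothetical.

```r
# Hypothetical long-format probe-to-accession table: one row per
# (probe, accession) pair. IDs are placeholders, not real annotations.
probeMap <- data.frame(
  probe     = c("probe1", "probe2", "probe2", "probe3"),
  accession = c("ACC_A", "ACC_B", "ACC_C", "ACC_D"),
  stringsAsFactors = FALSE
)

# Probe -> character vector of accessions (one-to-many lookup).
accByProbe <- split(probeMap$accession, probeMap$probe)

# Accession -> probes (the reverse lookup, also one-to-many).
probesByAcc <- split(probeMap$probe, probeMap$accession)

# Probes mapping to more than one accession, e.g. to flag them
# during preprocessing.
multiMapped <- names(accByProbe)[lengths(accByProbe) > 1]

accByProbe[["probe2"]]
multiMapped
```

The long form is just the least committal representation; the same table
could later be wrapped in a GRanges object or a .db-style map without losing
the one-to-many information.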

Also, given the large number of controls on the platform, a means of keeping
track of (e.g.) bisulfite conversion controls and non-polymorphic probes, as
well as the 600 or so negative controls, belongs in MethyLumiM.  How this
should best be accomplished is less clear to me -- it's trivial to pull the
control probe intensities from the .IDAT files that are always emitted from
every scan of a 27k or 450k chip, but the representation within the object
is more troublesome.  One of the things that made combining and subsetting
MethyLumiSet objects quirky on occasion was the eSet-within-an-eSet
representation for control probes, so I understand your (plural) reluctance
to continue that model.  However, the only other thing I can come up with is
to use a couple of additional data.frame() objects to hold the Cy5 and Cy3
control intensities.  That would be fine too.
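
A minimal base-R sketch of the two-data.frame idea, assuming rows are
control probes and columns are samples.  The helper names (makeControls,
subsetControls, combineControls) are hypothetical and are not part of
methylumi or lumi; the intensities are made up.  The point is that
subsetting and combining reduce to plain column operations, with no nested
eSet involved.

```r
# Hypothetical helpers: control-probe intensities stored as two plain
# data.frames (rows = control probes, columns = samples) instead of an
# eSet nested inside the MethyLumiSet. Not part of methylumi/lumi.
makeControls <- function(cy5, cy3) {
  # Both channels must describe the same probes and samples, in order.
  stopifnot(identical(dimnames(cy5), dimnames(cy3)))
  list(Cy5 = cy5, Cy3 = cy3)
}

# Keep only the requested samples (columns) in both channels.
subsetControls <- function(ctrl, samples) {
  lapply(ctrl, function(df) df[, samples, drop = FALSE])
}

# Combine two batches of samples measured on the same control probes.
combineControls <- function(a, b) {
  stopifnot(identical(rownames(a$Cy5), rownames(b$Cy5)))
  list(Cy5 = cbind(a$Cy5, b$Cy5), Cy3 = cbind(a$Cy3, b$Cy3))
}

# Toy intensities (made up) for three control probes.
ids <- c("ctl1", "ctl2", "ctl3")
batch1 <- makeControls(
  cy5 = data.frame(s1 = c(100, 200, 300), s2 = c(110, 210, 310), row.names = ids),
  cy3 = data.frame(s1 = c(50, 60, 70), s2 = c(55, 65, 75), row.names = ids)
)
batch2 <- makeControls(
  cy5 = data.frame(s3 = c(120, 220, 320), row.names = ids),
  cy3 = data.frame(s3 = c(52, 62, 72), row.names = ids)
)

both <- combineControls(batch1, batch2)
sub <- subsetControls(both, c("s1", "s3"))
dim(sub$Cy5)
```

In a patched MethyLumiM, the subset and combine methods would call helpers
like these alongside the existing assayData handling, so the control
columns stay in step with the samples.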

This raises the question of how to store information about the control (and
analytic!) probes.  Two package types make sense:

1) IlluminaHumanMethylationXXk.db -- probe annotations like Sean has built
for the 27k and GG arrays (small and compact)
2) illuminaHumanMethylationXXkProbe.db -- addresses and probe sequences for
the analytic and control probes (larger .db)

Thoughts?  I mostly work from .IDAT files these days, and the Cambridge guys
have expressed interest in making bead-level tools play nicely with
methylumi/lumi.  Regardless of where the data comes from, it would be nice
to standardize on representations, and leave room for using low-level data
as appropriate.  (I've seen referees ask paper authors whether they looked
at bisulfite control probes, for example -- not an unreasonable question,
but the current MethyLumiM can't answer it.)

I've been impressed with the cleanliness of the lumi methylation codebase,
and it would be great if a consensus arose about data representations and
annotation, so that this cleanliness is not disrupted.  For example, I have
built 27k and 450k probe packages, and the 450k annotation package can be
built without much trouble once there is consensus on how to map multiple
accessions to a single probe (or perhaps on using a GRanges object or
GenomicFeatures?!?).  Patching the current MethyLumiM object to handle
subsetting of a couple of additional data.frames is no problem either.  But
I'd like to get your thoughts on the matter before I go off and hack up your
code.  :-)


--t


On Fri, Dec 17, 2010 at 7:56 AM, Gang Feng <g-feng@northwestern.edu> wrote:

> Hi, Antti
>
> example.lumi is a LumiBatch object. For annotation, you can use
> lumiHumanAll.db to get the mapping.
>
> Best
>
> Gilbert
>
>
>
> > Hi all,
> >
> > I am developing a package intended to use data from several
> > microarray platforms, and their annotations, in a portable manner.
> >
> > Given an ExpressionSet "eset", I have been using constructs like
> >  > library(annotate)
> >  > m <- getAnnMap('SYMBOL', annotation(eset))
> >  > s <- get(featureNames(eset)[1], m)
> > which seems portable and works fine on the Affymetrix data I have
> > used so far.
> >
> > Turning to Illumina data and the lumi package, the same approach no
> > longer works:
> > ------------------------------------------------------------
> >  > library(lumi)
> >  > data(example.lumi)
> >  > m <- getAnnMap('SYMBOL', annotation(example.lumi))
> > Error: getAnnMap: package lumiHumanV1 not available
> >  > biocLite('lumiHumanV1.db')
> > Using R version 2.12.0, biocinstall version 2.7.4.
> > Installing Bioconductor version 2.7 packages:
> > [1] "lumiHumanV1.db"
> > Please wait...
> >
> > Warning message:
> > In getDependencies(pkgs, dependencies, available, lib) :
> >    package ‘lumiHumanV1.db’ is not available
> > ------------------------------------------------------------
> >
> > Is this just a bug in the example.lumi object, or is it simply wrong
> > to assume that the same mechanism should work with lumi at all?
> >
> >
> > Antti
> >
> > --
> > Antti Honkela
> > Antti.Honkela@tkk.fi   -   http://users.ics.tkk.fi/ahonkela/
> >
> > _______________________________________________
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
>
> --
> -----------------------------------------------------------------
> Gang (Gilbert) Feng, PhD
> Biomedical Informatics Center
> Robert H. Lurie Comprehensive Cancer Center
> Northwestern University
> 750 N. Lake Shore Drive, 11th Floor(11-175e)
> Chicago, IL  60611
> Phone:312-503-2358
> Email g-feng@northwestern.edu
>


-- 
If people do not believe that mathematics is simple, it is only because they
do not realize how complicated life is.
John von Neumann<http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Von_Neumann.html>
