[Bioc-devel] Common workflow to build a microarray annotation package, like hgu133a.db

Wed Jan 6 03:07:35 CET 2016

I should have phrased this differently:

"Don't create new .db0 packages _just to map symbols or sequences_."

The .db0 infrastructure is marvelous for oligonucleotide arrays designed to
measure transcription, but in some respects it "suffers" from the BioC
release cycle.  For example, suppose I have a bunch of hgu133plus2 and
HuGeneST 1.1 arrays where I find that the probe sequences, when aligned to
a more recent reference transcriptome than the arrays were designed
against, actually pick up noncoding RNAs better than the
(discarded-due-to-mismapping) mRNA targets they were originally designed
against.  In Du et al (2013,
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3702647/) we see one of several
ways in which this information can be used.

HOWEVER!  With the new mappings of probes to genes/symbols/transcripts, we
have a bit of a conundrum, especially in situations where RNAseq data is
also available.  mapToIds() and mapToRanges() certainly help, although a
helper function that does the same thing based on lifted transcriptomic
coordinates might do as well as the latter, and the former sometimes won't
find the correct IDs (again with the release cycle issues).  So if I map a
number of symbols to, say, Ensembl build 83 plus some other stuff (for
example, a number of recently documented non-coding RNAs), it's going to be
rough going to get things mapped back to where I want them.  And then of
course it would be nice to normalize everything in a sensible fashion.
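
To make the conundrum concrete, here is a minimal sketch in base R. Everything
in it is an assumption for illustration: the probe IDs, the symbols, and both
tables are made up. "shipped" stands in for the probe-to-symbol mapping from a
released annotation package, and "realigned" for calls obtained by aligning
probe sequences to a newer transcriptome. It only shows how one might flag the
probes whose annotation differs between the two; it is not how any
Bioconductor package does this.

```r
# Toy data only: every probe ID and symbol below is hypothetical.
# 'shipped'   ~ probe -> symbol table from a released annotation package
# 'realigned' ~ probe -> symbol calls from re-aligning probe sequences
shipped <- data.frame(
  probe  = c("p1", "p2", "p3", "p4"),
  symbol = c("GENEA", "GENEB", "GENEC", NA),
  stringsAsFactors = FALSE
)
realigned <- data.frame(
  probe  = c("p1", "p2", "p3", "p4"),
  symbol = c("GENEA", "LNCRNA1", "GENEC", "LNCRNA2"),
  stringsAsFactors = FALSE
)

# Join on probe ID, keeping both symbol columns side by side.
both <- merge(shipped, realigned, by = "probe",
              suffixes = c(".shipped", ".realigned"))

# Classify each probe: previously unannotated, same call, or changed call.
both$status <- ifelse(
  is.na(both$symbol.shipped), "new",
  ifelse(both$symbol.shipped == both$symbol.realigned, "same", "changed")
)

print(both)
```

The probes flagged "changed" or "new" are the ones that would be awkward to
carry back into an annotation package that is only rebuilt once per release.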

My suggestion, due to the final two stings in the tail, would be to look
into a platform design (pd) package for oligo, so that a person can use SCAN.UPC
to compare RNAseq and microarray quantifications of the same transcripts
across a larger number of samples.  That's just my opinion, but as may be
obvious from the above excruciating level of detail, along with several
years as maintainer of .db0 packages for platforms where the .db0
infrastructure might not have been the best fit, I do think my opinion may
help others.

Of course, I could always be wrong.  I've been wrong many times before.
Hopefully by documenting the various ways in which I've tried doing things
(right and wrong), there can be some benefit to others trying the same.

Best,

--t

On Tue, Jan 5, 2016 at 5:11 PM, James W. MacDonald <jmacdon at uw.edu> wrote:

>
> On Jan 5, 2016 7:01 PM, "Tim Triche, Jr." <tim.triche at gmail.com> wrote:
> >
> > 1) this is a support.bioconductor.org question
> > 2) don't use .db0 packages, you will rue the day you did
>
> Can you expand on this statement? Right now all of the ChipDb are built
> using a db0 package, so it's not clear to me why this might be a problem.
>
> > best,
> >
> > --t
> >
> > On Tue, Jan 5, 2016 at 3:53 PM, Zhilong Jia <zhilongjia at gmail.com> wrote:
> >
> > > Hello,
> > >
> > > Happy new year.
> > >
> > > What is the common workflow to build a microarray annotation package
> > > like hgu133a.db?
> > >
> > > For some arrays, probe sequences are available, so perhaps mapping is
> > > used? How should other situations be handled? If the code used by the
> > > team is available, that would be great. Thank you.
> > >
> > > The specific goal is to build new platform annotation packages that are
> > > not currently available from Bioconductor (all I need is probe to gene
> > > symbol mappings).
> > >
> > > It seems Bioconductor updates the annotation packages with each new
> > > release because gene symbols change.
> > >
> > > BTW, why is it named hgu133a.db rather than GPL96.db (as in GEO) in
> > > Bioconductor? Users have to find the mapping between the two themselves,
> > > though some mappings exist, such as
> > > https://gist.github.com/seandavi/bc6b1b82dc65c47510c7#file-platformmap-txt
> > > .
> > >
> > >
> > > Regards,
> > > Zhilong
> > >
> > > --
> > > Zhilong JIA
> > > zhilongjia at gmail.com
> > > https://github.com/zhilongjia
> > >
> > > _______________________________________________
> > > Bioc-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >
> >
>