[Bioc-devel] Common workflow to build a microarray annotation package, like hgu133a.db

Zhilong Jia zhilongjia at gmail.com
Wed Jan 6 09:48:56 CET 2016


Hi Tim,

I posted the question here because the maintainer of most annotation
packages is "Bioconductor Package Maintainer", even though I had already
posted it on the support website (https://support.bioconductor.org/p/76545/).
Presumably the Bioconductor core team has a common workflow for handling
this kind of problem. Thank you.

Zhilong

On 6 January 2016 at 02:07, Tim Triche, Jr. <tim.triche at gmail.com> wrote:

> I should have phrased this differently:
>
> "Don't create new .db0 packages _just to map symbols or sequences_."
>
> The .db0 infrastructure is marvelous for oligonucleotide arrays designed
> to measure transcription, but in some respects it "suffers" from the BioC
> release cycle.  For example, suppose I have a bunch of hgu133plus2 and
> HuGeneST 1.1 arrays where I find that the probe sequences, when aligned to
> a more recent reference transcriptome than the arrays were designed
> against, actually pick up noncoding RNAs better than the
> (discarded-due-to-mismapping) mRNA targets they were originally designed
> against.  In Du et al (2013,
> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3702647/) we see one of
> several ways in which this information can be used.
>
> HOWEVER!  With the new mappings of probes to genes/symbols/transcripts, we
> have a bit of a conundrum, especially in situations where RNAseq data is
> also available.  mapToIds() and mapToRanges() certainly help, although a
> helper function that does the same thing based on lifted transcriptomic
> coordinates might do as well as the latter, and the former sometimes won't
> find the correct IDs (again with the release cycle issues).  So if I map a
> number of symbols to, say, Ensembl build 83 plus some other stuff (for
> example, a number of recently documented non-coding RNAs), it's going to be
> rough going to get things mapped back to where I want them.  And then of
> course it would be nice to normalize everything in a sensible fashion.
>
> My suggestion, due to the final two stings in the tail, would be to look
> into a probedesign (pd) file for oligo, so that a person can use SCAN.UPC
> to compare RNAseq and microarray quantifications of the same transcripts
> across a larger number of samples.  That's just my opinion, but as the
> excruciating level of detail above may suggest, after several years as
> maintainer of .db0 packages for platforms where the .db0 infrastructure
> might not have been the best fit, I do think my opinion may help others.
>
> Of course, I could always be wrong.  I've been wrong many times before.
> Hopefully by documenting the various ways in which I've tried doing things
> (right and wrong), there can be some benefit to others trying the same.
>
> Best,
>
>
>
> --t
>
> On Tue, Jan 5, 2016 at 5:11 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>>
>> On Jan 5, 2016 7:01 PM, "Tim Triche, Jr." <tim.triche at gmail.com> wrote:
>> >
>> > 1) this is a support.bioconductor.org question
>> > 2) don't use .db0 packages, you will rue the day you did
>>
>> Can you expand on this statement? Right now all of the ChipDb packages are
>> built using a db0 package, so it's not clear to me why this might be a problem.
>>
>> > best,
>> >
>> > --t
>> >
>> > On Tue, Jan 5, 2016 at 3:53 PM, Zhilong Jia <zhilongjia at gmail.com> wrote:
>> >
>> > > Hello,
>> > >
>> > > Happy new year.
>> > >
>> > > What is the common workflow for building a microarray annotation
>> > > package, like hgu133a.db?
>> > >
>> > > For some arrays the probe sequences are available, so perhaps sequence
>> > > mapping is used? How are other situations handled? If the code used by
>> > > the team is available, that would be great. Thank you.
>> > >
>> > > The specific goal is to build annotation packages for new platforms that
>> > > are not currently available from Bioconductor (all I need is a mapping
>> > > from probes to gene symbols).
>> > >
>> > > It seems Bioconductor updates the annotation packages with each new
>> > > release because gene symbols change.
>> > >
>> > > BTW, why is it named hgu133a.db in Bioconductor rather than GPL96.db
>> > > (the GEO platform ID)? Users have to work out the correspondence between
>> > > the two naming schemes themselves, although some mappings exist, such as
>> > > https://gist.github.com/seandavi/bc6b1b82dc65c47510c7#file-platformmap-txt
>> > >
>> > >
>> > > Regards,
>> > > Zhilong
>> > >
>> > > --
>> > > Zhilong JIA
>> > > zhilongjia at gmail.com
>> > > https://github.com/zhilongjia
>> > >
>> > > _______________________________________________
>> > > Bioc-devel at r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> > >
>> >
>>
>
>


-- 
Zhilong JIA
zhilongjia at gmail.com
https://github.com/zhilongjia
