[Bioc-devel] Extending annotation packages

Wed Jul 18 22:10:30 CEST 2007

On Jul 18, 2007, at 12:27 PM, Sean Davis wrote:

> Seth Falcon wrote:
>> Hi Sean,
>>
>> Sean Davis <sdavis2 at mail.nih.gov> writes:
>>
>>> I have built an annotation package, but I would like to add a couple
>>> more annotation sources (which I will build by hand).  Is there an
>>> accepted way of doing this if the ultimate goal is distribution?  In
>>> particular, I would like to add a mapping to higher-resolution
>>> chromosome location information and another mapping to a boolean
>>> flag.
>>>
>>
>> I don't think we have a recommended procedure.  A few ideas:
>>
>> 1. You can contribute the annotation data package to BioC and
>>    distribute it there if you like.  In this case, you will be
>>    expected to update the release version prior to each BioC release
>>    and to build the package against the same annotation source data
>>    download that we use for the other packages -- this way things like
>>    GO will be in sync across packages.  Marc Carlson is the contact
>>    person for this (he is a new member of our group in Seattle;
>>    Nianhua is no longer in the group, but still involved in BioC on a
>>    volunteer basis).
>>
>>
>
> That would be the plan, yes.
>
>> 2. Is the higher-resolution chromosome location information something
>>    that could be applied to many existing annotation data packages or
>>    just yours?  We hope to have some discussion at the Developer Day
>>    at BioC2007 about future directions for the annotation data
>>    packages with a focus on what newly available data should be
>>    included in future releases of the packages.
>>
>>
>
> In addition to locations of genes on the chromosomes, I would like to
> include information about the probe locations themselves, since for the
> platform that I am using, these data are critical.

(I am assuming that by "probe location" Sean means where on the genome
the probe hits.)

This is an interesting idea, and it is certainly applicable to most
microarrays with multiple probes per "gene" (or transcript or unit or
whatever), including Affy arrays. It has the flavour of the remapping
done by MCBI for the Affy chips, and it is essentially the same as the
basic probe-to-genome mapping done for a tiling array.

The current BioC annotation strategy (for Affy chips, which I am most
familiar with) is to have:

probe to "gene" mapping: CDF environment
"gene" to annotation (GO etc.): annotation packages
probe-level info: probe package; but currently a probe package is
  essentially completely independent of any annotation, including a genome.
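To make the three layers above concrete, here is a toy sketch in Python. The dictionaries only stand in for the CDF environment, the annotation package and the probe package. All identifiers and values are hypothetical, and real BioC packages are R objects, not Python dicts. The point it shows is that nothing in these layers records where an individual probe hits the genome.

```python
# Toy model of the three independent layers. All identifiers and values
# are hypothetical; real BioC packages are R objects, not Python dicts.

# Layer 1 ("CDF environment"): probe set, i.e. the "gene", -> its probes.
cdf_env = {
    "1001_at": ["1001_at_p1", "1001_at_p2"],
    "1002_at": ["1002_at_p1"],
}

# Layer 2 ("annotation package"): probe set -> gene-level annotation.
annotation_pkg = {
    "1001_at": {"symbol": "GENEA", "go": ["GO:0008150"]},
    "1002_at": {"symbol": "GENEB", "go": []},
}

# Layer 3 ("probe package"): probe -> its position on the chip only.
probe_pkg = {
    "1001_at_p1": {"x": 12, "y": 40},
    "1001_at_p2": {"x": 13, "y": 40},
    "1002_at_p1": {"x": 87, "y": 5},
}


def describe_probe(probe_id):
    """Follow probe -> probe set -> annotation; None if the probe is unknown."""
    probe_set = next(
        (ps for ps, probes in cdf_env.items() if probe_id in probes), None
    )
    if probe_set is None:
        return None
    info = probe_pkg[probe_id]
    # Nothing in the three layers says where this probe hits the genome,
    # so the result can only carry chip coordinates and gene-level data.
    return {
        "probe": probe_id,
        "probe_set": probe_set,
        "symbol": annotation_pkg[probe_set]["symbol"],
        "chip_xy": (info["x"], info["y"]),
    }


print(describe_probe("1001_at_p2"))
# {'probe': '1001_at_p2', 'probe_set': '1001_at', 'symbol': 'GENEA', 'chip_xy': (13, 40)}
```

Adding probe-to-genome information would mean a fourth lookup keyed by probe, which is why the probe package is a natural home for it.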

I think that the information Sean wants to include would be useful
for all chips, and I think that e.g. the MCBI packages are evidence for
that. But I am not sure that the best way to include this information
is to extend the annotation packages; perhaps it belongs rather in the
probe packages. This would imply that the probe packages are bundled
with a genome version, but since genomes usually change rather slowly
this might not be a big problem. Including it in the probe packages
would also mean a redesign, since some probes might hit multiple
locations, so the information could not just be stored in the usual
data.frame. Moving to the new SQLite-based packages would probably make
this much simpler. (So here I am essentially advocating routine BLASTing
of the probes against the genome.)

All in all I think this is something the community should think about
doing. But since Sean has a use case, and perhaps a very special chip,
I would suggest that he just "go ahead and do it" and see what the
results are; we might learn from it.

Kasper

>
>> 3. Don't forget to add documentation for the objects you add to the
>>    package :-)
>>
>
> But, of course.  ; )
>
> As usual, thanks, Seth.
>
> Sean
>
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel