[Bioc-devel] Extending annotation packages
Kasper Daniel Hansen
khansen at stat.Berkeley.EDU
Wed Jul 18 22:10:30 CEST 2007
On Jul 18, 2007, at 12:27 PM, Sean Davis wrote:
> Seth Falcon wrote:
>> Hi Sean,
>>
>> Sean Davis <sdavis2 at mail.nih.gov> writes:
>>
>>> I have built an annotation package, but I would like to add a couple
>>> more annotation sources (which I will build by hand). Is there an
>>> accepted way of doing this if the ultimate goal is distribution? In
>>> particular, I would like to add a mapping to higher-resolution
>>> chromosome location information and another mapping to a boolean
>>> flag.
>>>
>>
>> I don't think we have a recommended procedure. A few ideas:
>>
>> 1. You can contribute the annotation data package to BioC and
>> distribute it there if you like. In this case, you will be
>> expected to update the release version prior to each BioC release
>> and to build the package against the same annotation source data
>> download that we use for the other packages -- this way things like
>> GO will be in sync across packages. Marc Carlson is the contact
>> person for this (he is a new member of our group in Seattle;
>> Nianhua is no longer in the group, but still involved in BioC on a
>> volunteer basis).
>>
>>
>
> That would be the plan, yes.
>
>> 2. Is the higher-resolution chromosome location information something
>> that could be applied to many existing annotation data packages or
>> just yours? We hope to have some discussion at the Developer Day
>> at BioC2007 about future directions for the annotation data
>> packages with a focus on what newly available data should be
>> included in future releases of the packages.
>>
>>
>
> In addition to locations of genes on the chromosomes, I would like to
> include information about the probe locations themselves, since for the
> platform that I am using, these data are critical.

(I am assuming that by probe location, Sean means where on the genome
the probe hits.)

This is an interesting idea which is certainly applicable to most
microarrays with multiple probes per "gene" (or transcript or unit or
whatever), including Affy arrays. It has the flavour of the remapping
done by MCBI for the Affy chips. It also has the flavour of being
essentially equal to the basic mapping done for a tiling array (probe
to genome).

The current BioC annotation strategy (for Affy chips, which I am most
familiar with) is to have:

  probe to "gene" mapping:             CDF environment
  "gene" to annotation (GO etc.):      annotation packages
  probe-level info:                    probe package

Currently, though, a probe package is essentially completely
independent of any annotation, including a genome.
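
To make the three layers concrete, here is a minimal sketch in Python
that models each one as a plain lookup table. Every identifier and value
is invented for illustration; real CDF environments, annotation packages
and probe packages are R objects with their own structure. The point is
only that no layer links an individual probe to a genome coordinate.

```python
# Hypothetical stand-ins for the three layers; all names and values are made up.

# Layer 1 (CDF environment): probe set ("gene") -> member probes
cdf = {
    "set_1": ["probe_a", "probe_b"],
    "set_2": ["probe_c"],
}

# Layer 2 (annotation package): probe set -> gene-level annotation
annotation = {
    "set_1": {"symbol": "GENE1", "go": ["GO_TERM_A"]},
    "set_2": {"symbol": "GENE2", "go": ["GO_TERM_B", "GO_TERM_C"]},
}

# Layer 3 (probe package): probe -> probe-level info, with no genome link
probe_info = {
    "probe_a": {"sequence": "ACGTACGTACGTACGTACGTACGTA"},
    "probe_b": {"sequence": "TTGACCTTGACCTTGACCTTGACCT"},
    "probe_c": {"sequence": "GGGCCCAAATTTGGGCCCAAATTTG"},
}


def annotate_probe(probe_id):
    """Collect what each layer knows about a single probe."""
    # Find the probe set that contains this probe (None if there is none).
    probe_set = next(
        (name for name, probes in cdf.items() if probe_id in probes), None
    )
    return {
        "probe_set": probe_set,
        "annotation": annotation.get(probe_set),
        "sequence": probe_info[probe_id]["sequence"],
    }


result = annotate_probe("probe_c")
print(result)
```

The returned record carries a probe set, gene-level annotation and a
sequence, but nothing that says where on the genome the probe itself
lands.
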

I think the information Sean wants to include would be useful for all
chips, and the MCBI packages, for example, are evidence of that. But I
am not sure that extending the annotation packages is the best way to
include it; extending the probe packages may be better. That would tie
each probe package to a genome version, but since genomes usually
change rather slowly this might not be a big problem. It would also
mean a redesign: some probes might hit multiple locations, so the
information could not just be stored in the usual data.frame with one
row per probe. Upgrading to the new SQLite-based packages probably
makes this much simpler. (So here I am essentially advocating a routine
BLASTing of the probes against the genome.)
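
The one-to-many problem can be sketched with a small SQLite schema,
shown here through Python's standard `sqlite3` module. This is only an
illustration under stated assumptions: the table names, column names,
build label and probe data are all hypothetical and are not taken from
any existing BioC package. Each genomic hit gets its own row, so a probe
that matches several loci needs no special handling.

```python
import sqlite3

# Hypothetical schema: every table, column and value here is invented for
# illustration and does not come from any existing BioC package.
SCHEMA = """
CREATE TABLE probe (
    probe_id TEXT PRIMARY KEY,
    sequence TEXT NOT NULL
);
CREATE TABLE probe_hit (
    probe_id     TEXT NOT NULL REFERENCES probe(probe_id),
    genome_build TEXT NOT NULL,
    chromosome   TEXT NOT NULL,
    start_pos    INTEGER NOT NULL,
    strand       TEXT NOT NULL CHECK (strand IN ('+', '-'))
);
CREATE INDEX probe_hit_by_probe ON probe_hit (probe_id);
"""


def build_example_db():
    """Create an in-memory database with a few made-up probes and hits."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO probe VALUES (?, ?)", [
        ("p1", "ACGTACGTACGTACGTACGTACGTA"),
        ("p2", "TTGACCTTGACCTTGACCTTGACCT"),
        ("p3", "GGGCCCAAATTTGGGCCCAAATTTG"),
    ])
    # p1 hits once, p2 hits twice, p3 has no hit at all.
    conn.executemany("INSERT INTO probe_hit VALUES (?, ?, ?, ?, ?)", [
        ("p1", "example_build", "chr1", 1000, "+"),
        ("p2", "example_build", "chr2", 5000, "-"),
        ("p2", "example_build", "chr7", 250, "+"),
    ])
    conn.commit()
    return conn


def hits_for(conn, probe_id):
    """Return every (chromosome, start, strand) hit recorded for a probe."""
    rows = conn.execute(
        "SELECT chromosome, start_pos, strand FROM probe_hit "
        "WHERE probe_id = ? ORDER BY chromosome, start_pos",
        (probe_id,),
    )
    return rows.fetchall()


def multi_hit_probes(conn):
    """Return the ids of probes that hit more than one location."""
    rows = conn.execute(
        "SELECT probe_id FROM probe_hit GROUP BY probe_id "
        "HAVING COUNT(*) > 1 ORDER BY probe_id"
    )
    return [row[0] for row in rows]


conn = build_example_db()
print(hits_for(conn, "p2"))
print(multi_hit_probes(conn))
```

Keeping the genome build as a column on each hit is one way to record
which genome version a set of coordinates refers to; a separate
metadata table would work as well.
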

All in all, I think this is something the community should think about
doing. But since Sean has a use case and perhaps a very special chip,
I would suggest that he just "go ahead and do it" and see what the
results are; we might learn from it.

Kasper

>
>> 3. Don't forget to add documentation for the objects you add to the
>> package :-)
>>
>
> But, of course. ; )
>
> As usual, thanks, Seth.
>
> Sean
>
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel