[Bioc-devel] FeatureDb could be generalized

Marc Carlson mcarlson at fhcrc.org
Thu May 12 19:35:32 CEST 2011


Hi Michael,

That is an interesting idea.  I like the idea of having more data be 
available via FeatureDb, and I especially like the idea of having useful 
transformations of the data it provides.  But I am a little confused 
about one part of what you are suggesting.  Is there a reason why we 
would want to add a bunch of code that basically re-implements what the 
database does, instead of just writing some simple methods to allow the 
import of these other kinds of files?

I can think of advantages to keeping the data container type consistent 
(provided that it does not prove burdensome), since SQL allows joins to 
be made across databases, so a collection of data that has all been 
stored in this way can easily be linked together as needed.  But what is 
the advantage of making a bunch of classes and methods that will allow 
us to pretend that our BAM and VCF files are actually databases?
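
To illustrate the kind of cross-database join meant here, a minimal
sketch follows.  It uses Python's standard sqlite3 module purely for
brevity; the file names, table names, and columns are hypothetical and
are not taken from GenomicFeatures.  SQLite's ATTACH DATABASE statement
is what makes a second database file visible to the same connection.

```python
import os
import sqlite3
import tempfile

# Hypothetical file, table, and column names, chosen only for illustration.
tmpdir = tempfile.mkdtemp()
features_path = os.path.join(tmpdir, "features.sqlite")
annot_path = os.path.join(tmpdir, "annotations.sqlite")

# First database: feature coordinates.
con = sqlite3.connect(features_path)
con.execute(
    "CREATE TABLE feature ("
    "id INTEGER PRIMARY KEY, chrom TEXT, "
    "range_start INTEGER, range_end INTEGER)"
)
con.executemany(
    "INSERT INTO feature VALUES (?, ?, ?, ?)",
    [(1, "chr1", 100, 200), (2, "chr2", 50, 80)],
)
con.commit()
con.close()

# Second, separate database file: annotations keyed by feature id.
con = sqlite3.connect(annot_path)
con.execute("CREATE TABLE annotation (feature_id INTEGER, symbol TEXT)")
con.executemany(
    "INSERT INTO annotation VALUES (?, ?)",
    [(1, "GENE_A"), (2, "GENE_B")],
)
con.commit()
con.close()

# Open the first database, attach the second, and join across both.
con = sqlite3.connect(features_path)
con.execute("ATTACH DATABASE ? AS annot", (annot_path,))
rows = con.execute(
    "SELECT f.chrom, f.range_start, f.range_end, a.symbol "
    "FROM feature AS f "
    "JOIN annot.annotation AS a ON a.feature_id = f.id "
    "ORDER BY f.id"
).fetchall()
con.close()

print(rows)
```

Because both files share one container type, linking them needs only
the ATTACH statement and an ordinary JOIN; no format-specific glue code
is involved.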

Also what would the purpose of a SequenceDb object be?  The name is 
generic enough that I am unable to guess what you have in mind.


   Marc



On 05/12/2011 06:08 AM, Michael Lawrence wrote:
> Hi guys,
>
> I was just looking at the FeatureDb class in GenomicFeatures. I'm wondering
> if we couldn't abstract that from its SQLite implementation. There are many
> other sources of features, e.g., files like BAM, VCF and even BED. If these
> are indexed properly, we could make fast queries against them. So what we
> really need is a class, named something like FeatureDb, that returns, for a
> given 'which' (as a bare minimum), a GRanges.
>
> I could also imagine having proxy FeatureDb objects that transform the data
> on the way. Like a FeatureDb that will return the coverage, using another
> FeatureDb as a source. Caching could be implemented as part of the base
> class. I'm also wondering whether these should be reference classes. Then if
> some "parent" FeatureDb is modified, the down-stream objects can be informed
> of the change.
>
> And a SequenceDb would be nice, too.
>
> I'll write up a prototype in the MutableRanges package (in the bioc repo),
> but I'll call it RangeDb to avoid conflicts for now.
>
> Michael

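The interface proposed in the quoted message (a source that returns the
ranges overlapping a given 'which', proxy sources that transform another
source, caching in the base class, and change notification to
down-stream objects) might look roughly like the sketch below.  This is
an assumption-laden illustration in Python, not the MutableRanges
prototype: every class and method name here (RangeDb, InMemoryRangeDb,
CoverageDb, query, invalidate) is hypothetical, plain tuples stand in
for GRanges, and coverage is returned as a simple list of depths.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

# (seqname, start, end), 1-based and inclusive; a stand-in for GRanges.
Range = Tuple[str, int, int]


def overlaps(a: Range, b: Range) -> bool:
    """True if two ranges share a sequence name and at least one position."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


class RangeDb(ABC):
    """Hypothetical base class: cached queries plus change notification."""

    def __init__(self) -> None:
        self._cache: Dict[Range, Any] = {}
        self._listeners: List["RangeDb"] = []

    @abstractmethod
    def _fetch(self, which: Range) -> Any:
        """Backend-specific lookup (SQLite, BAM, VCF, BED, ...)."""

    def query(self, which: Range) -> Any:
        # Caching lives in the base class, as suggested in the message.
        if which not in self._cache:
            self._cache[which] = self._fetch(which)
        return self._cache[which]

    def add_listener(self, child: "RangeDb") -> None:
        self._listeners.append(child)

    def invalidate(self) -> None:
        # Drop cached results and inform down-stream objects of the change.
        self._cache.clear()
        for child in self._listeners:
            child.invalidate()


class InMemoryRangeDb(RangeDb):
    """Toy source; a real one would query an indexed file or a database."""

    def __init__(self, ranges: List[Range]) -> None:
        super().__init__()
        self._ranges = list(ranges)

    def _fetch(self, which: Range) -> List[Range]:
        return [r for r in self._ranges if overlaps(r, which)]

    def add(self, r: Range) -> None:
        self._ranges.append(r)
        self.invalidate()


class CoverageDb(RangeDb):
    """Proxy that computes per-position coverage from another RangeDb."""

    def __init__(self, source: RangeDb) -> None:
        super().__init__()
        self._source = source
        source.add_listener(self)

    def _fetch(self, which: Range) -> List[int]:
        _, start, end = which
        depth = [0] * (end - start + 1)
        for _, s, e in self._source.query(which):
            for pos in range(max(s, start), min(e, end) + 1):
                depth[pos - start] += 1
        return depth


reads = InMemoryRangeDb([("chr1", 1, 5), ("chr1", 3, 8), ("chr2", 1, 4)])
cov = CoverageDb(reads)
which = ("chr1", 1, 8)
before = cov.query(which)      # computed once, then served from the cache
reads.add(("chr1", 8, 10))     # modifying the parent invalidates the proxy
after = cov.query(which)       # recomputed against the updated source
print(before, after)
```

Python objects are mutable by reference, which plays the role that
reference classes would play in R: the proxy holds a handle to its
parent rather than a copy, so an invalidation can propagate down-stream.
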

