[Bioc-devel] FeatureDb could be generalized
Marc Carlson
mcarlson at fhcrc.org
Thu May 12 19:35:32 CEST 2011
Hi Michael,
That is an interesting idea. I like the idea of having more data be
available via FeatureDb, and I especially like the idea of having useful
transformations of the data it provides. But I am a little confused
about one part of what you are suggesting. Is there a reason why we
would want to add a bunch of stuff to basically re-implement what the
database does instead of just writing some simple methods to allow the
import of these other kinds of files?
I can think of advantages to keeping the data container type consistent
(providing that it is not proving burdensome), since SQL allows joins to
be made across databases thus allowing a collection of data that has all
been stored in this way to be easily linked together as needed. But
what is the advantage of making a bunch of classes and methods that will
allow us to pretend that our BAM and VCF files are actually databases?
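
To make the point about joins across databases concrete, here is a minimal sketch. It uses Python's standard sqlite3 module rather than R only so that it is self-contained. The file names, table names, and columns (genes.sqlite, scores.sqlite, gene, score) are invented for the example and are not part of GenomicFeatures or FeatureDb; the only mechanism being shown is SQLite's ATTACH DATABASE.

```python
import os
import sqlite3
import tempfile

# Hypothetical file names and schemas, for illustration only.
tmpdir = tempfile.mkdtemp()
genes_path = os.path.join(tmpdir, "genes.sqlite")
scores_path = os.path.join(tmpdir, "scores.sqlite")

# First database: a small table of features.
con = sqlite3.connect(genes_path)
con.execute(
    "CREATE TABLE gene (gene_id TEXT, chrom TEXT, "
    "chrom_start INTEGER, chrom_end INTEGER)"
)
con.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?)",
    [("g1", "chr1", 100, 200), ("g2", "chr2", 300, 400)],
)
con.commit()
con.close()

# Second database: per-gene values stored in a separate file.
con = sqlite3.connect(scores_path)
con.execute("CREATE TABLE score (gene_id TEXT, value REAL)")
con.executemany(
    "INSERT INTO score VALUES (?, ?)",
    [("g1", 0.5), ("g2", 1.5)],
)
con.commit()
con.close()

# Open the first file, attach the second, and join across the two.
con = sqlite3.connect(genes_path)
con.execute("ATTACH DATABASE ? AS scores", (scores_path,))
rows = con.execute(
    "SELECT g.gene_id, g.chrom, s.value "
    "FROM gene AS g "
    "JOIN scores.score AS s ON g.gene_id = s.gene_id "
    "ORDER BY g.gene_id"
).fetchall()
con.close()

print(rows)
```

The join works only because both files share the same container type, which is the advantage of keeping that type consistent.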
Also, what would the purpose of a SequenceDb object be? The name is
generic enough that I am unable to guess what you have in mind.
Marc
On 05/12/2011 06:08 AM, Michael Lawrence wrote:
> Hi guys,
>
> I was just looking at the FeatureDb class in GenomicFeatures. I'm wondering
> if we couldn't abstract that from its SQLite implementation. There are many
> other sources of features, e.g., files like BAM, VCF and even BED. If these
> are indexed properly, we could make fast queries against them. So what we
> really need is a class, named something like FeatureDb, that returns, for a
> given 'which' (as a bare minimum), a GRanges.
>
> I could also imagine having proxy FeatureDb objects that transform the data
> on the way. Like a FeatureDb that will return the coverage, using another
> FeatureDb as a source. Caching could be implemented as part of the base
> class. I'm also wondering whether these should be reference classes. Then if
> some "parent" FeatureDb is modified, the down-stream objects can be informed
> of the change.
>
> And a SequenceDb would be nice, too.
>
> I'll write up a prototype in the MutableRanges package (in the bioc repo),
> but I'll call it RangeDb to avoid conflicts for now.
>
> Michael