On Thu, May 12, 2011 at 10:35 AM, Marc Carlson <mcarlson@fhcrc.org> wrote:

> Hi Michael,
>
> That is an interesting idea.  I like the idea of having more data be
> available via FeatureDb, and I especially like the idea of having useful
> transformations of the data it provides.  But I am a little confused about
> one part of what you are suggesting.  Is there a reason why we would want to
> add a bunch of stuff to basically re-implement what the database does
> instead of just writing some simple methods to allow the import of these
> other kinds of files?
>
> I can think of advantages to keeping the data container type consistent
> (providing that it is not proving burdensome), since SQL allows joins to be
> made across databases thus allowing a collection of data that has all been
> stored in this way to be easily linked together as needed.  But what is the
> advantage of making a bunch of classes and methods that will allow us to
> pretend that our bam and vcf files are actually databases?
>
>
I don't want to pretend that they are first-class databases. It would just
be nice to have a common interface around these various data sources. The
simplest interface would allow range-based (GRanges) queries and return a
GRanges as the result. Each data source will have its own specific
parameters, but those could be fields of the particular Db class and held
constant across multiple range queries.
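
To make the shape of that interface concrete, here is a minimal sketch,
written in Python purely for illustration rather than as Bioconductor code.
Every name in it (Interval, RangeDb, InMemoryRangeDb, min_width) is
hypothetical: Interval stands in for a GRanges element, and min_width stands
in for whatever source-specific parameter a real Db class would hold as a
field.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for one GRanges element (1-based, closed)."""
    seqname: str
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        return (self.seqname == other.seqname
                and self.start <= other.end
                and other.start <= self.end)


class RangeDb(ABC):
    """Common interface: ranges in, ranges out."""

    @abstractmethod
    def query(self, which: List[Interval]) -> List[Interval]:
        """Return the features overlapping any range in `which`."""


class InMemoryRangeDb(RangeDb):
    """Toy data source; a real one might wrap an indexed file on disk."""

    def __init__(self, features: List[Interval], min_width: int = 1):
        self.features = features
        # Source-specific parameter, fixed once and reused by every query.
        self.min_width = min_width

    def query(self, which: List[Interval]) -> List[Interval]:
        return [f for f in self.features
                if f.end - f.start + 1 >= self.min_width
                and any(f.overlaps(w) for w in which)]


db = InMemoryRangeDb(
    [Interval("chr1", 100, 200),
     Interval("chr1", 500, 505),
     Interval("chr2", 100, 200)],
    min_width=10,
)
# Only the range changes between calls; min_width stays with the object.
hits = db.query([Interval("chr1", 150, 600)])
```

The point of the design is that a caller such as a track in a viewer only
needs to know about query(); which file format or database sits behind it,
and how it was configured, stays inside the object.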

The driving use-case for this is visualization. The user is looking at a
particular region of a track. That track has an underlying data source,
often on disk. We just need to query for the features in that region.


> Also what would the purpose of a SequenceDb object be?  The name is generic
> enough that I am unable to guess what you have in mind.
>
>
A data source for sequences. It could be implemented with a BSgenome object,
an indexed FASTA file, etc.
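
As a rough sketch of what I mean, again in Python for illustration only and
with entirely hypothetical names (SequenceDb, DictSequenceDb, fetch), the
idea is one small interface with interchangeable backends. The dictionary
backend below is a toy; a BSgenome object or an indexed FASTA file would sit
behind the same method.

```python
from abc import ABC, abstractmethod
from typing import Dict


class SequenceDb(ABC):
    """Hypothetical interface: a range in, the sequence for it out."""

    @abstractmethod
    def fetch(self, seqname: str, start: int, end: int) -> str:
        """Return the sequence for a 1-based, closed range."""


class DictSequenceDb(SequenceDb):
    """Toy backend that keeps whole sequences in memory."""

    def __init__(self, sequences: Dict[str, str]):
        self.sequences = sequences

    def fetch(self, seqname: str, start: int, end: int) -> str:
        seq = self.sequences[seqname]
        if start < 1 or end > len(seq) or start > end:
            raise ValueError("range out of bounds")
        # Convert the 1-based closed range to a 0-based half-open slice.
        return seq[start - 1:end]


seq_db = DictSequenceDb({"chr1": "ACGTACGTAC"})
window = seq_db.fetch("chr1", 3, 6)
```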


>  Marc
>
>
>
>
> On 05/12/2011 06:08 AM, Michael Lawrence wrote:
>
>> Hi guys,
>>
>> I was just looking at the FeatureDb class in GenomicFeatures. I'm wondering
>> if we couldn't abstract that from its SQLite implementation. There are many
>> other sources of features, e.g., files like BAM, VCF and even BED. If these
>> are indexed properly, we could make fast queries against them. So what we
>> really need is a class, named something like FeatureDb, that returns, for a
>> given 'which' (as a bare minimum), a GRanges.
>>
>> I could also imagine having proxy FeatureDb objects that transform the data
>> on the way. Like a FeatureDb that will return the coverage, using another
>> FeatureDb as a source. Caching could be implemented as part of the base
>> class. I'm also wondering whether these should be reference classes. Then if
>> some "parent" FeatureDb is modified, the down-stream objects can be informed
>> of the change.
>>
>> And a SequenceDb would be nice, too.
>>
>> I'll write up a prototype in the MutableRanges package (in the bioc repo),
>> but I'll call it RangeDb to avoid conflicts for now.
>>
>> Michael
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>


