[Bioc-sig-seq] `+` for GenomeData and coverage from several lanes

Simon Anders anders at ebi.ac.uk
Tue Jun 30 19:50:02 CEST 2009


Hi

Patrick Aboyoun wrote:
> Simon,
> Could you provide some profiling information to show where the 
> bottlenecks are? 

I don't know if there is really a clear bottleneck. 9 minutes to 
calculate the coverage of 29 million reads is about 20 seconds per 
million reads; this is probably what the coverage function has always 
needed. So, in the code given in my mail, summing up the GenomeData 
objects is just awkward, not a performance penalty.

> I am also wondering if I should be building up the
> functionality for RleList, which could have `+` and other Math
> operations. We have a lot of classes in the Sequence space and it is
> not clear yet which classes are going to be part of the winning
> solution.

I'd say that this is the main issue. I discover new classes every day. 
You just mentioned 'RleList', Michael mentions 'GenomeDataList', and 
Martin has yet another way to go.

I'm sorry to say that, at least for me, this has become hopelessly 
confusing, and I imagine that many other users feel the same. You write 
that "it is not clear yet which classes are going to be part of the 
winning solution", and I completely agree that it makes more sense to 
have a few good classes than to add functionality to any class on 
demand. So, maybe don't bother with a `+` operation for now.

Best regards
   Simon


