[Bioc-sig-seq] `+` for GenomeData and coverage from several lanes

Patrick Aboyoun paboyoun at fhcrc.org
Tue Jun 30 18:05:50 CEST 2009


Simon,
Could you provide some profiling information to show where the 
bottlenecks are? I am also wondering if I should be building up the 
functionality for RleList, which could have `+` and other Math 
operations. We have a lot of classes in the Sequence space and it is not 
clear yet which classes are going to be part of the winning solution.



Patrick



Simon Anders wrote:
> Hi Michael et al.
>
> Assume I have a list of AlignedRead objects, with data from several 
> Solexa lanes. I would like to get a coverage vector over all the lanes.
>
> As 'coverage' takes only one AlignedRead object, I have two 
> possibilities:
>
> (a) Concatenate the AlignedRead objects to a single big one. As far as 
> I can see, 'rbind2' is not implemented for AlignedRead, and 'append' 
> seemed very slow to me. It is probably not posible to do this without 
> making a copy of all the data in memory.
>
> (b) Calculate the coverage for each AlignedRead object separately and 
> add up the GenomeData objects. The `+` operator is not supported for 
> GenomeData objects but it is for Rle objects. So I need to loop 
> through the chromosomes.
>
> I've now written this short function for the purpose:
>
> sumUpCoverage <- function( lanes, seqLens )
> {
>    res <- NULL
>    for( i in 1:length(lanes) ) {
>       cvg <- coverage( lanes, width=seqLens )
>       if( is.null( res ) )
>          res <- cvg
>       else {
>          stopifnot( all( names(res) == names(cvg) ) )
>          for( seq in names(res) )
>             res[[seq]] <- res[[seq]] + cvg[[seq]]
>       }
>    }
>    res
> }
>
> This does not seem very elegant (and it takes 9 minutes, which, 
> however, might be ok for 29 mio reads). Any idea how to do it better? 
> (The use of 'for' instead of 'sapply' is on purpose: I hope to save 
> memory that way.)
>
> And would it make sense to overload the `+` operator for GenomeData 
> objects? If so, could I suggest adding this to BSgenome?
>
> Cheers
>   Simon
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



More information about the Bioc-sig-sequencing mailing list