[Bioc-sig-seq] Finding Mean Value of Overlapping Ranges

Dario Strbenac D.Strbenac at garvan.org.au
Fri Jun 25 10:05:42 CEST 2010


That's a neat and elegant idea, but the following step doesn't actually work:

as(qrle, "IRanges")

Error in asMethod(object) : 
  cannot coerce a non-logical 'Rle' or a logical 'Rle' with NAs to an IRanges object
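
The coercion seems to fail because qrle holds integer query indices, whereas as(, "IRanges") only accepts a logical Rle. What the views need is one range per run of qrle. Below is a minimal base-R sketch of that run-length idea on toy data; all values are made up, and it assumes the hits are ordered by query so that each region forms a single run.

```r
# Toy stand-ins for queryHits(ol), subjectHits(ol) and scoreVect
# (hypothetical values, not taken from the real data).
query     <- c(1L, 1L, 2L, 4L, 4L, 4L)  # assumed to be ordered by query
subject   <- c(1L, 2L, 2L, 3L, 4L, 5L)
scoreVect <- c(10, 20, 30, 40, 50)

# Score of the site involved in each hit.
hitScores <- scoreVect[subject]

# One run per region that has at least one overlapping site.
runs <- rle(query)

# Sum the scores within each region, keeping first-appearance order
# so the sums line up with the runs, then divide by the run lengths.
sums  <- rowsum(hitScores, query, reorder = FALSE)[, 1]
means <- sums / runs$lengths
names(means) <- runs$values

# Regions without any overlapping site (region 3 here) are absent.
print(means)
```

With IRanges loaded, the same run lengths could presumably be turned into ranges with successiveIRanges(runLength(qrle)) and passed to Views() in place of the coercion, though that is untested at the full data size.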

Thanks,
       Dario.


---- Original message ----
>Date: Thu, 24 Jun 2010 23:53:08 -0700
>From: Michael Lawrence <lawrence.michael at gene.com>  
>Subject: Re: [Bioc-sig-seq] Finding Mean Value of Overlapping Ranges  
>To: D.Strbenac at garvan.org.au
>Cc: bioc-sig-sequencing at r-project.org
>
>   On Thu, Jun 24, 2010 at 10:31 PM, Dario Strbenac
>   <D.Strbenac at garvan.org.au> wrote:
>
>     Hello,
>
>     I have a question about the most efficient way to
>     handle my use case.
>
>     What I have done is get a matchMatrix from an
>     overlap query, then split it:
>
>     regionSiteMap <- findOverlaps(regions,
>     sites)@matchMatrix
>     indexList <- split(regionSiteMap[, "subject"],
>     regionSiteMap[, "query"])
>
>   Instead of splitting, get the scores and query hits
>   into an Rle:
>
>   ol <- findOverlaps(regions, sites)
>   srle <- Rle(scoreVect[subjectHits(ol)])
>   qrle <- Rle(queryHits(ol))
>
>   The Rle compression may not be appropriate for your
>   scores, but now you can use the query Rle to define
>   Views on the score Rle:
>
>   v <- Views(srle, as(qrle, "IRanges"))
>
>   Now all the view methods are at your disposal, like
>   viewMeans():
>
>   means <- viewMeans(v)
>
>   Michael
>    
>
>     Now I'd like to, for each region, use the indices
>     to the sites to get the sites' scores from a
>     vector and take the mean, like :
>
>     means <- sapply(indexList, function(indices)
>     mean(scoreVect[indices]))
>
>     The problem with this is that I have ~ 8 million
>     'regions', and ~ 28 million 'sites'. So the
>     indexList is a list of ~ 8 million elements with a
>     few indices in each one, and scoreVect is a
>     numeric vector of scores of length ~ 28 million.
>
>     Can anyone suggest the fastest way to do this
>     task?
>
>     --------------------------------------
>     Dario Strbenac
>     Research Assistant
>     Cancer Epigenetics
>     Garvan Institute of Medical Research
>     Darlinghurst NSW 2010
>     Australia
>
>     _______________________________________________
>     Bioc-sig-sequencing mailing list
>     Bioc-sig-sequencing at r-project.org
>     https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia


