[R] vectorization with subset?

David Winsemius dwinsemius at comcast.net
Mon Jul 2 20:24:11 CEST 2012


On Jul 2, 2012, at 12:15 PM, dlv04c wrote:

> Hello,
>
> I have a data frame (68,000 rows) of scores (V4) for a series of [genomic]
> coordinate ranges (V2 to V3).
>
>
>
> I also have a data frame (1.2 million rows) of single [genomic] coordinates.
>
>
>
> For each genomic coordinate (in coord), I would like to determine the
> average of all scores whose genomic ranges (in scores) encompass the
> coordinate (in coord). To accomplish this, I tried:
>
>
>
> The function works, but is extremely slow.
>
> It would take about 4 days for this to finish for a single data set, and I
> have 64 data sets.
>
> Why does the rate at which coordinate averages are calculated increase when
> coord is smaller, but not when scores is smaller?
>
> How can I accomplish the same thing more efficiently?

You probably need to start by reading the vignettes for the IRanges
package. It's difficult to be sure, since you did not show the code you
are currently using.
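
In the meantime, here is a minimal base-R sketch of one vectorized way to
get the per-coordinate averages without looping over 1.2 million rows. It
is not the IRanges approach and not necessarily what you were attempting.
It relies on the fact that the ranges covering a position are those that
have started (start <= pos) minus those that have already ended
(end < pos), so two sorted cumulative sums and findInterval() are enough.
The column names V2, V3 and V4 come from your description; the coordinate
column `pos`, the function name `mean_overlapping_score` and the toy data
are assumptions made up for illustration. It also assumes start <= end in
every row and needs R >= 3.3.0 for the `left.open` argument.

```r
# Toy stand-ins for the real data frames (values are made up).
scores <- data.frame(V2 = c(1, 5, 10),   # range start
                     V3 = c(6, 8, 12),   # range end
                     V4 = c(2, 4, 6))    # score
coord  <- data.frame(pos = c(3, 5, 9, 11))  # `pos` is an assumed column name

# Hypothetical helper: mean score of all ranges [start, end] containing pos.
mean_overlapping_score <- function(pos, start, end, score) {
  by_start <- order(start)
  by_end   <- order(end)

  # Running score totals; the leading 0 covers "no range reached yet".
  sum_started <- c(0, cumsum(score[by_start]))
  sum_ended   <- c(0, cumsum(score[by_end]))

  # Number of ranges with start <= pos, and number with end < pos.
  n_started <- findInterval(pos, start[by_start])
  n_ended   <- findInterval(pos, end[by_end], left.open = TRUE)

  # Ranges covering pos = started but not yet ended.
  n_cover <- n_started - n_ended
  total   <- sum_started[n_started + 1] - sum_ended[n_ended + 1]

  # Positions covered by no range get NA instead of 0/0.
  ifelse(n_cover > 0, total / n_cover, NA_real_)
}

coord$mean_score <- mean_overlapping_score(coord$pos, scores$V2,
                                           scores$V3, scores$V4)
```

The sorting is done once, so the cost grows roughly with
n log n + m log n rather than n * m. Because the totals are differences
of cumulative sums, expect small floating-point rounding on non-integer
scores. If you need the overlapping pairs themselves, or have several
chromosomes to keep apart, the IRanges vignettes are still the place to
look.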

-- 

David Winsemius, MD
West Hartford, CT


