[BioC] find overlap of bed files of different length
Duke
duke.lists at gmx.com
Tue Feb 1 19:31:37 CET 2011
On 2/1/11 12:07 PM, Kasper Daniel Hansen wrote:
> Well, clearly I have not done it, but I would expect that a decent
> implementation of my method would take less than 2 minutes (although
> it depends on length of the stuff in the BED file you started with).
> At least the computational load should not be much more than running
> findOverlaps.
I definitely want to solve my problem using R, but given that I am still
new to R, that I have analysis to be done, and that I need something
that gets the job done quickly (that was why I decided to go for R, with the
hope that some Bioconductor packages would help), I got it done with C++
first. As soon as I have more time to spend, I will try to make it
work with R.
D.
> Kasper
>
> On Tue, Feb 1, 2011 at 10:06 AM, Duke <duke.lists at gmx.com> wrote:
>> On 1/31/11 1:20 PM, Kasper Daniel Hansen wrote:
>>> Use findOverlaps to find all cases. This is usually the hard and big
>>> computation. Then use for example pintersect() to compute the actual
>>> overlap in percent. There might be some tedious coding involved.
>> Thanks for your suggestion Kasper, though honestly I have not tried it yet.
>> But based on what Martin and you suggested, I thought the final code would
>> not run fast because of extracting each strand/subset and running each one.
>> My task is also a little more complicated: I need to find gene
>> expression (counting sequences in exonic regions of each gene). I also gave
>> BEDTools a try, but it does not fulfil my needs (extremely slow for a gene
>> list of 28k).
>>
>> I ended up writing C++ code to do the job. Thanks for all of your
>> suggestions and help, guys.
>>
>> D.
>>
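
For readers following the thread, the two-step approach Kasper describes
(first find all overlapping pairs, then compute how much each pair actually
overlaps, in percent) can be sketched outside of R. The Python below is only
an illustration under stated assumptions: it is not the IRanges
implementation, the function names find_overlaps and overlap_percent are
made up for this example, intervals are half-open (start, end) pairs as in
BED, every interval has non-zero width, and chromosome and strand are
ignored (in practice one would run it once per chromosome/strand group).

```python
from bisect import bisect_left


def find_overlaps(query, subject):
    """Return (i, j) index pairs where query[i] overlaps subject[j].

    Hypothetical helper: intervals are half-open (start, end) tuples.
    """
    # Sort subject indices by start so candidates can be cut off with bisect.
    order = sorted(range(len(subject)), key=lambda j: subject[j][0])
    starts = [subject[j][0] for j in order]
    pairs = []
    for i, (q_start, q_end) in enumerate(query):
        # Only subjects that start before q_end can overlap this query.
        limit = bisect_left(starts, q_end)
        for k in range(limit):
            j = order[k]
            # Of those, keep the ones that end after the query starts.
            if subject[j][1] > q_start:
                pairs.append((i, j))
    return pairs


def overlap_percent(a, b):
    """Width of the intersection of a and b, as a percentage of a's width."""
    width = min(a[1], b[1]) - max(a[0], b[0])
    return 100.0 * max(width, 0) / (a[1] - a[0])


# Made-up example intervals, not data from this thread.
query = [(100, 200), (500, 600)]
subject = [(150, 250), (190, 195), (700, 800)]

pairs = find_overlaps(query, subject)
percents = [overlap_percent(query[i], subject[j]) for i, j in pairs]
```

The inner loop still checks every subject that starts before the query ends,
so this is a readable sketch rather than a fast implementation; a sweep over
both sorted lists or an interval tree would be the next step for files with
many intervals.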