[Bioc-devel] Question: set purification test for overlapping regions between three GRanges objects

Michael Stadler michael.stadler at fmi.ch
Fri Apr 29 10:02:22 CEST 2016


Dear Jurat,

Maybe this would be better asked on support.bioconductor.org.

I don't think I fully understand what you intend to do (a fully working
example would help), but here are two ideas that could be useful:

1. Maybe a single findOverlaps() will be enough to find all you need:

D <- c(a,b,c)
ov <- findOverlaps(query=D, subject=D)

By inspecting queryHits(ov) and subjectHits(ov), you can find out
whether a particular region originally came from a, b or c (e.g.
any index in 1:length(a) refers to a region from a).

2. Maybe you can use intersect() to find the common indices; see ?intersect.
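To make both ideas more concrete, here is a minimal sketch in R. It has not been tested against your data and assumes the GenomicRanges package. The toy ranges for a, b and c are made up for illustration. The per-test index vectors (kept_test1, kept_test2, kept_test3) are hypothetical placeholders for whatever each of your tests keeps. The sketch runs one all-against-all findOverlaps() as in idea 1 and maps the hits back to their source object. It then uses intersect() as in idea 2, together with Reduce() for more than two sets:

```r
library(GenomicRanges)

# Toy stand-ins for the three input GRanges (made-up coordinates).
a <- GRanges("chr1", IRanges(start = c(1, 100, 500), width = 50))
b <- GRanges("chr1", IRanges(start = c(20, 300), width = 50))
c <- GRanges("chr1", IRanges(start = c(30, 120, 900), width = 50))

# Idea 1: concatenate once and run a single all-against-all findOverlaps().
D <- c(a, b, c)
ov <- findOverlaps(query = D, subject = D)

# Label each element of D with the object it came from:
# 1:length(a) -> "a", the next length(b) -> "b", the rest -> "c".
src <- rep(c("a", "b", "c"), times = c(length(a), length(b), length(c)))

hits <- data.frame(query = queryHits(ov), subject = subjectHits(ov),
                   query_src = src[queryHits(ov)],
                   subject_src = src[subjectHits(ov)],
                   stringsAsFactors = FALSE)
# Keep only hits between different objects (this also drops self-hits).
hits <- hits[hits$query_src != hits$subject_src, ]

# Indices of regions of `a` hit by `b` and by `c`. Because `a` comes first
# in D, these are also valid indices into `a` itself.
a_vs_b <- unique(hits$query[hits$query_src == "a" & hits$subject_src == "b"])
a_vs_c <- unique(hits$query[hits$query_src == "a" & hits$subject_src == "c"])

# Idea 2: intersect() the index vectors to get regions of `a` overlapping both.
a_in_both <- intersect(a_vs_b, a_vs_c)
a[a_in_both]

# "Present in all three tests": intersect() is pairwise, so fold it over a
# list with Reduce(). These index vectors are hypothetical placeholders for
# the regions each test keeps (e.g. the "sc" group of a.1, a.2, a.3).
kept_test1 <- c(2L, 5L, 7L, 9L)
kept_test2 <- c(5L, 7L, 8L, 9L)
kept_test3 <- c(1L, 5L, 9L)
kept_in_all <- Reduce(intersect, list(kept_test1, kept_test2, kept_test3))
kept_in_all
# [1] 5 9
```

Note that intersect() applied to two GRanges objects computes the genomic (base-pair) intersection of the reduced ranges rather than matching identical regions. That is why the sketch intersects index vectors instead.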

Hope that helps,
Michael

On 28.04.2016 18:20, Jurat Shayidin wrote:
> Dear Mailing list:
> 
> I got stuck implementing a wrapper function for my package. When
> three GRanges objects (e.g. a, b, c) are given to my function, I would
> like to find the overlapping regions between them in parallel, with a
> as the query and b and c as the respective subjects. Because there are
> three inputs, I want to call findOverlaps() repeatedly with changing
> parameters (query and subject are switched in each individual test).
> To clarify my point, this is the workflow I have in mind:
> 
> *1st test*: ov_1 <- list(ov.1 = findOverlaps(a, b), ov.2 = findOverlaps(a, c))
> 
> Intermediate output of 1st test: a.1 <- list(a.sc, a.wd);
> b.1 <- list(b.sc, b.wd); c.1 <- list(c.sc, c.wd)
> 
> *2nd test*: ov_2 <- list(ov.1 = findOverlaps(b, a), ov.2 = findOverlaps(b, c))
> 
> Intermediate output of 2nd test: b.2 <- list(b.sc_, b.wd_);
> a.2 <- list(a.sc_, a.wd_); c.2 <- list(c.sc_, c.wd_)
> 
> *3rd test*: ov_3 <- list(ov.1 = findOverlaps(c, a), ov.2 = findOverlaps(c, b))
> 
> Intermediate output of 3rd test: b.3 <- list(b.sc__, b.wd__);
> a.3 <- list(a.sc__, a.wd__); c.3 <- list(c.sc__, c.wd__)
> 
> Start 1st test -> read data -> find overlapping regions conditionally
> in parallel -> filter with a specific threshold value -> chisq.test()
> -> get the combined p-value and do further filtering -> save the result
> in the function's environment -> go to 2nd test and repeat the workflow
> -> go to 3rd test and repeat -> once all tests are done, generate the
> final output for each GRanges object -> package job is DONE!
> 
> In particular, in each individual test, a, b and c can each yield two
> different groups of regions as intermediate output, but this is not
> the final step: I still have to run the 2nd and 3rd tests.
> 
> However, I am having a hard time finding an efficient solution,
> because in each individual test a, b and c contain a different set of
> genomic regions, each split into two groups as intermediate output.
> My goal is to implement a function that performs set purification on
> the intermediate outputs of each GRanges object across the three
> tests.
> 
> Specifically, given a.1 <- list(a.sc, a.wd), a.2 <- list(a.sc_, a.wd_)
> and a.3 <- list(a.sc__, a.wd__), I want a function that retrieves the
> set of genomic regions that appear in all of the 1st, 2nd and 3rd
> tests.
> 
> *Objective*: I want to retrieve the regions that appear in all of the
> 1st, 2nd and 3rd tests. How can I solve this efficiently? Can anyone
> suggest a possible idea? Any approach, idea, sketch of a solution, or
> existing Bioconductor package would be highly appreciated. Thanks a
> lot.
> 
> Best regards,
>