[Bioc-devel] Question: set purification test for overlapped regions between three GRanges object

Jurat Shayidin juratbupt at gmail.com
Thu Apr 28 18:20:25 CEST 2016


Dear mailing list,

I am stuck implementing a wrapper function for my package. When three
GRanges objects (e.g. a, b, c) are passed to my function, I would like to
find the overlapping regions between them in parallel, first with a as the
query and b and c as the subjects. Because there are three inputs, I want to
call findOverlaps() repeatedly with the roles changed (the query and the
subjects are switched in each individual test). To clarify my point, this is
the workflow I have in mind:

*1st test*: ov_1 <- list(ov.1 = findOverlaps(a, b), ov.2 = findOverlaps(a, c))

intermediate output of 1st test: a.1 <- list(a.sc, a.wd) ; b.1 <- list(b.sc,
b.wd) ; c.1 <- list(c.sc, c.wd)

*2nd test*: ov_2 <- list(ov.1 = findOverlaps(b, a), ov.2 = findOverlaps(b, c))

intermediate output of 2nd test: b.2 <- list(b.sc_, b.wd_) ; a.2 <- list(a.sc_,
a.wd_) ; c.2 <- list(c.sc_, c.wd_)

*3rd test*: ov_3 <- list(ov.1 = findOverlaps(c, a), ov.2 = findOverlaps(c, b))

intermediate output of 3rd test: b.3 <- list(b.sc__, b.wd__) ; a.3 <-
list(a.sc__, a.wd__) ; c.3 <- list(c.sc__, c.wd__)
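
The three tests above differ only in which object plays the query, so they can be written as one loop. This is a minimal sketch, assuming the three GRanges objects are kept in a named list; the helper name overlap_all and the toy coordinates are hypothetical and only serve as illustration.

```r
library(GenomicRanges)

# Toy inputs; the coordinates are made up for illustration only.
peaks <- list(
  a = GRanges("chr1", IRanges(start = c(1, 50, 200), width = 20)),
  b = GRanges("chr1", IRanges(start = c(10, 210, 400), width = 20)),
  c = GRanges("chr1", IRanges(start = c(15, 60, 205), width = 20))
)

# Hypothetical helper: use each object in turn as the query and overlap it
# against every other object in the list.
overlap_all <- function(grs) {
  res <- lapply(names(grs), function(q) {
    subjects <- grs[setdiff(names(grs), q)]
    lapply(subjects, function(s) findOverlaps(grs[[q]], s))
  })
  setNames(res, names(grs))
}

ov <- overlap_all(peaks)
ov$a$b  # Hits object for query a against subject b
```

If real parallelism is needed, the outer lapply() could probably be swapped for BiocParallel::bplapply(), which takes the same first two arguments.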

start 1st test -> read data -> find overlapping regions conditionally in
parallel -> filter with a specific threshold value -> chisq.test() ->
get combined p-value and do further filtering -> save result in the
function's environment -> go to 2nd test -> repeat the workflow -> go to
3rd test -> repeat the workflow -> all tests done, prepare the final
output for each GRanges object -> package job is DONE!
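
The "combined p-value" step of the workflow can be sketched in base R. The message does not say which combination method is used, so Fisher's method is assumed here purely for illustration; the function name fisher_combine is hypothetical.

```r
# Fisher's method (an assumption, the post does not name the method):
# under the null, -2 * sum(log(p)) follows a chi-squared distribution
# with 2 * k degrees of freedom, where k is the number of p-values.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Example: combine the p-values that one region obtained in two overlaps.
combined <- fisher_combine(c(0.01, 0.02))
```

The combined value could then be compared against the filtering threshold mentioned in the workflow.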

In particular, in each individual test, a, b and c can each contain 2
different groups of regions as intermediate output. This is not the final
step, however: I must still run the 2nd and 3rd tests.

However, I am having a hard time finding an efficient solution, because in
each individual test a, b and c contain a different set of genomic regions,
each split into 2 groups as intermediate output. My goal is to implement a
set purification function for the intermediate output of each GRanges
object across the 3 tests.

What I want to implement is this: given a.1 <- list(a.sc, a.wd),
a.2 <- list(a.sc_, a.wd_) and a.3 <- list(a.sc__, a.wd__), a function that
retrieves the set of genomic regions that appear in all of the 1st, 2nd and
3rd tests.

*Objective*: I want to retrieve the regions that appear in all of the 1st,
2nd and 3rd tests. How can I solve this efficiently? Can anyone suggest an
approach? Any idea, sketch of a solution, or existing Bioconductor package
would be highly appreciated. Thanks a lot.
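
One possible reading of the objective is "keep the regions of a that occur, with identical coordinates, in the output of every test". A minimal sketch under that assumption follows; the toy ranges and the helper name keep_common are hypothetical, and each test's output is shown as a single GRanges rather than the two groups (sc, wd) described above.

```r
library(GenomicRanges)

# Hypothetical intermediate outputs for object a from the three tests.
a.1 <- GRanges("chr1", IRanges(start = c(1, 50, 200), width = 20))
a.2 <- GRanges("chr1", IRanges(start = c(1, 200, 300), width = 20))
a.3 <- GRanges("chr1", IRanges(start = c(200, 1, 500), width = 20))

# Keep only the regions of x that also occur in y. For GRanges, %in%
# compares seqnames, start, end and strand for exact equality.
keep_common <- function(x, y) x[x %in% y]

# Fold the three outputs together; the result keeps the order of a.1.
a.common <- Reduce(keep_common, list(a.1, a.2, a.3))
a.common
```

If "appeared" should instead mean "overlaps", subsetByOverlaps(x, y) could replace the body of keep_common. The same fold can be applied to each group separately, or to the groups of one test joined with c().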

Best regards,

-- 
Jurat Shahidin
Ph.D. candidate
Dipartimento di Elettronica, Informazione e Bioingegneria
Politecnico di Milano
Piazza Leonardo da Vinci 32 - 20133 Milano, Italy
Mobile : +39 3279366608
