[Bioc-devel] FDR estimation for Biological ChIP-Seq replicate in the context of GRanges.

Wed Apr 13 16:28:01 CEST 2016

Hi, BioC devel:

I have been working on my package, and it is nearly finished except for
FDR estimation. So far, I read and load three replicates (BED format) into
GRanges objects, and I have to handle the cases where the chosen sample is
a biological or a technical replicate.

The package processes three GRanges objects to find colocalization
evidence across the samples. This is the general workflow that I have
implemented:

-> read and load multiple samples (BED format) into GRanges objects
-> find overlapping regions conditionally, in parallel
-> filter with a first threshold value (i.e., count the overall
   overlapping regions, in parallel)
-> run chisq.test() on the data that passed the previous step
-> based on the combined p-value, filter again with a second threshold
   value (on the data that passed the previous step)
-> return the final output as GRanges (data that passed the previous step
   is also preserved, but not exported to disk)
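The workflow above does not say how the combined p-value is computed. As a minimal sketch, assuming Fisher's method is an acceptable way to combine the p-values of overlapping regions (this is an assumption, not necessarily what the package does), the statistic -2 * sum(log(p)) can be compared against a chi-square distribution with 2k degrees of freedom. fisherCombine is a hypothetical helper name:

```r
# Hypothetical helper: combine k independent p-values with Fisher's method.
# Under the null hypothesis, -2 * sum(log(p)) is chi-square distributed
# with 2 * k degrees of freedom.
fisherCombine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Example: p-values of one region in the chosen sample and two supporting samples.
combined <- fisherCombine(c(0.01, 0.02, 0.03))
```

The combined value would then be compared with the second threshold.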

In the first run of my package (a is the chosenSample; b and c are the
supportingSamples):

ov_ab_1 <- as(findOverlaps(a, b), "List")
ov_ac_1 <- as(findOverlaps(a, c), "List")

In the second run, I have to switch the parameters (b is the chosenSample;
a and c are the supportingSamples):

ov_ba_2 <- as(findOverlaps(b,a), "List")
ov_bc_2 <- as(findOverlaps(b,c), "List")

In the third run, c is the chosenSample and a and b are the
supportingSamples:

ov_ca_3 <- as(findOverlaps(c,a), "List")
ov_cb_3 <- as(findOverlaps(c,b), "List")

However, I still need to implement FDR estimation for a, b and c across
the first, second and third runs, where each processed sample has three
different outputs.

For example:
a_preserved_first_test, a_preserved_second_test, a_preserved_third_test,
and the same output format for b and c.

*Objective*: in the context of biological replicates, I want to retrieve
the common regions that are found in at least two runs (I am still looking
for a way to do this), pass these regions to p.adjust() to get adjusted
p-values, filter them further with a third threshold value, and finally
generate output for the regions that pass this step.
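The objective above could be sketched roughly as follows, assuming each run returns a GRanges with a metadata column holding the p-value. The column name p.value, the toy ranges, the BH method and the 0.05 threshold are all assumptions made for illustration. overlapsAny() reports, for each region of one run, whether any region of another run overlaps it:

```r
library(GenomicRanges)

# Toy outputs of the three runs; "p.value" is a hypothetical metadata column.
run1 <- GRanges("chr1", IRanges(c(1, 100, 300, 1000), width = 50),
                p.value = c(1e-6, 1e-3, 0.02, 0.03))
run2 <- GRanges("chr1", IRanges(c(10, 320), width = 50),
                p.value = c(1e-5, 0.04))
run3 <- GRanges("chr1", IRanges(c(20, 110), width = 50),
                p.value = c(1e-4, 5e-3))
others <- list(run2, run3)

# For each region of run1, count the other runs that have an overlapping region.
hits <- vapply(others, function(r) overlapsAny(run1, r), logical(length(run1)))
support <- rowSums(hits)

# Present in run1 plus at least one other run, i.e. in at least two runs.
common <- run1[support >= 1]

# Adjust the p-values and apply the third threshold (0.05 is an assumption).
common$p.adj <- p.adjust(common$p.value, method = "BH")
final <- common[common$p.adj <= 0.05]
```

Whether the adjustment should be applied before or after keeping only the common regions is a statistical choice that this sketch does not settle.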

*Question*:

In order to do FDR estimation, I need to run my package three times (if
three samples are the input), and I may put the result of each run into a
dedicated R environment (I am not sure this is the right thing to do). Is
there a better way to organize these three runs, for example by
automatically switching to the next run when the previous one is done?
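One possible way to avoid launching the three runs by hand is to keep the samples in a named list and loop over the names, so that each sample takes the chosenSample role in turn and the results come back in an ordinary named list. A minimal sketch, assuming toy ranges in place of the real BED files; overlapsForChosen is a hypothetical helper, not a function from any existing package:

```r
library(GenomicRanges)

# Toy replicates; in practice these would be read from the BED files.
samples <- list(
  a = GRanges("chr1", IRanges(c(1, 50, 200), width = 20)),
  b = GRanges("chr1", IRanges(c(10, 210, 400), width = 20)),
  c = GRanges("chr1", IRanges(c(15, 60, 405), width = 20))
)

# Hypothetical helper: overlaps of one chosen sample against every
# supporting sample, returned as a list named after the supporting samples.
overlapsForChosen <- function(chosenName, samples) {
  chosen <- samples[[chosenName]]
  supporting <- samples[setdiff(names(samples), chosenName)]
  lapply(supporting, function(s) as(findOverlaps(chosen, s), "List"))
}

# One entry per run; each run uses a different chosen sample.
sampleNames <- setNames(names(samples), names(samples))
runs <- lapply(sampleNames, overlapsForChosen, samples = samples)
```

Here runs$a$b plays the role of ov_ab_1, runs$b$c the role of ov_bc_2, and so on, so the later steps can index the results by name.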

I am not sure whether I should create a sub-environment to save the result
of each run; I hope there is a better solution. Maybe my question is
rather basic, so please forgive me if it is naive. Any approach,
suggestion, simple solution or recommended Bioconductor package that could
help with the above would be highly appreciated. Thanks a lot.

Best regards

-- 
Jurat Shahidin
Ph.D. candidate
Dipartimento di Elettronica, Informazione e Bioingegneria
Politecnico di Milano
Piazza Leonardo da Vinci 32 - 20133 Milano, Italy
Mobile : +39 3279366608
