[Bioc-devel] [Bio-dev]: how to iterate and index set of bed files ?

Jurat Shayidin juratbupt at gmail.com
Fri Feb 19 17:00:18 CET 2016


Dear all,
I am developing a package for my projects, and I have written a couple of
utility functions for parsing BED files in R. My goal is to parse and
analyze multiple BED files in parallel. In the ideal case, we have three
samples from ChIP-seq experiments, each of a different length, and the
goal is to process these samples in parallel.
My input is a set of BED files. I take the first BED file as the
querySample and the remaining BED files as the targetSamples, and I use the
findOverlaps function from the GenomicRanges package. When all features
from the querySample overlap with all features from the targetSamples, I
report the overlapping peaks and save them to new BED files. Then I choose
the second BED file as the querySample and the others as the targetSamples,
and repeat the process.
Here is my question, and I hope someone can give me an idea of how to solve
this problem: how do I iterate over and index a set of BED files?
FYI, I carefully read the posting guide on how to ask questions on the
Bioc-devel mailing list. If I made a mistake, I would appreciate it if
someone would point it out. Many thanks to all of you.
I think there is a set of combinations, such as the following:
bed.1 parallel map to (bed.2, bed.3, bed.4)
bed.2 parallel map to (bed.1, bed.3, bed.4)
bed.3 parallel map to (bed.1, bed.2, bed.4)
bed.4 parallel map to (bed.1, bed.2, bed.3)
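
The combinations listed above can be generated with negative indexing. This
is only a minimal sketch in base R: the file names are hypothetical
placeholders, and `designs` is a name introduced here for illustration.

```r
# Hypothetical file names, used only for illustration
bedFiles <- c("bed.1", "bed.2", "bed.3", "bed.4")

# For each index i, the query is bedFiles[i] and the targets are all others;
# bedFiles[-i] drops the i-th element and keeps the rest in order
designs <- lapply(seq_along(bedFiles), function(i) {
  list(query = bedFiles[i], targets = bedFiles[-i])
})
names(designs) <- bedFiles

# Inspect the targets paired with the second file
designs[["bed.2"]]$targets
```

Each element of `designs` holds one query and its targets, so the list can be
handed to lapply or to a parallel equivalent.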

For example, this is my current R code:

indexSample <- function(bedFiles, desDir = getwd(), verbose = FALSE) {
  # If a single directory is given, collect the bed files inside it
  if (is.character(bedFiles) && length(bedFiles) == 1L &&
      dir.exists(bedFiles)) {
    bedFiles <- list.files(path = bedFiles, pattern = "\\.bed$",
                           full.names = TRUE)
  }

  for (j in seq_along(bedFiles)) {
    qSample <- bedFiles[[j]]    # chosen querySample bed file
    if (!is(qSample, "GRanges")) {
      # loadSample (my own helper) reads a bed file as a GRanges object
      qSample.gr <- loadSample(qSample)
    } else {
      qSample.gr <- qSample
    }
    # there is code that accesses all features of qSample
    # [I have done this already]

    for (jj in seq_along(bedFiles)[-j]) {
      tSample <- bedFiles[[jj]]  # rest of bed files (multiple)
      # there is code that puts all features of tSample in a GNCList
      # object [I have done this]
    }
    # then call findOverlaps from the GenomicRanges package
  }
  # return result of first case
}
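
The query/target loop above could be sketched as follows. This is a rough
sketch under stated assumptions, not a definitive implementation: the toy
GRanges objects stand in for the imported BED files, `overlapWithAll` is a
hypothetical helper name, and "overlapped with all targets" is read here as
"the query peak overlaps at least one peak in every target sample", which
may differ from the intended criterion.

```r
library(GenomicRanges)

# Toy peak sets standing in for the imported BED files (hypothetical data)
samples <- list(
  bed.1 = GRanges("chr1", IRanges(start = c(1, 100, 300),
                                  end = c(50, 150, 350))),
  bed.2 = GRanges("chr1", IRanges(start = c(40, 120), end = c(60, 130))),
  bed.3 = GRanges("chr1", IRanges(start = c(10, 140, 500),
                                  end = c(20, 160, 550)))
)

# Keep the query peaks that overlap at least one peak in every target sample
overlapWithAll <- function(i, samples) {
  query <- samples[[i]]
  keep <- rep(TRUE, length(query))
  for (target in samples[-i]) {
    hits <- findOverlaps(query, target)
    keep <- keep & (seq_along(query) %in% queryHits(hits))
  }
  query[keep]
}

# Each sample takes a turn as the query; the others are the targets
results <- lapply(seq_along(samples), overlapWithAll, samples = samples)
names(results) <- names(samples)
```

Because each query is handled independently, the lapply call could probably
be swapped for BiocParallel::bplapply to run the queries in parallel, and
each element of `results` could be written out with rtracklayer::export.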






-- 
Jurat Shahidin
Ph.D. candidate
Dipartimento di Elettronica, Informazione e Bioingegneria
Politecnico di Milano
Piazza Leonardo da Vinci 32 - 20133 Milano, Italy
Mobile : +39 3279366608
