[Bioc-devel] Subject: GRanges performance issue, how to avoid looping?

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Wed Feb 22 13:59:49 CET 2012


?findOverlaps

Kasper

On Wed, Feb 22, 2012 at 7:15 AM, Jesper Gådin <jesper.gadin at gmail.com> wrote:
> Hi everyone!
> I want to find all reads that map to a specific position in a GRanges
> object.
> In my case this object is named "reads" and I have tried using the function
> below.
>
>
> #From a given position and chromosome, find the reads where this position
> map
> fun_find_reads_from_pos <- function(reads,chr,someposition, verbose=TRUE) {
>
>    #get length over the sample list
>    nr_samples <-length(reads)
>
>    for (j in (1:nr_samples)) {
>        onePerson <- reads[[j]]
>        chr_onePerson <- onePerson[seqnames(onePerson)==chr]
>        pre_seq<-
> chr_onePerson[(ranges(chr_onePerson))%in%(IRanges(start=someposition,
> width=1)),6]
>
>        if(!length(pre_seq)==0) {
>            for (k in (1:length(pre_seq))) {
>                position <- (someposition-start(pre_seq[k])) +1
> #57537220
>                print("found the given someposition on this position within
> the read")
>                print(position)
>            }
>        }
>    }
>
> } #End of function
>
>
> To use the function try:
> fun_find_reads_from_pos(reads,"chr12",57537220)
>
> Now that should work.
> And everything would be fine if it wasnt for the time issue.
> The time-thief I guess is that I have to loop over every read
> in every sample to get to the information I want. Is it
> possible to use the data structure in another better way to
> avoid unnecessary looping. Anyone have an idea?
>
> This function would be the core function of my future package.
> So its important that it is effective.
>
> Sincerely,
> Jesper
>
> Link to the reads object needed to run the function:
> http://uppsalanf.se/sites/default/files/reads_object.RData
> And the function itself (same as above):
> http://uppsalanf.se/sites/default/files/example-bioc-dev.R
>
>
>> sessionInfo()
> R version 2.13.2 (2011-09-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Biostrings_2.20.4   GenomicRanges_1.4.8 IRanges_1.10.6
>> ?GRanges
>> sessionInfo()
> R version 2.13.2 (2011-09-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Biostrings_2.20.4   GenomicRanges_1.4.8 IRanges_1.10.6
>
> loaded via a namespace (and not attached):
> [1] tools_2.13.2
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list