[Bioc-devel] Subject: GRanges performance issue, how to avoid looping?
Kasper Daniel Hansen
kasperdanielhansen at gmail.com
Wed Feb 22 13:59:49 CET 2012
?findOverlaps
Kasper
On Wed, Feb 22, 2012 at 7:15 AM, Jesper Gådin <jesper.gadin at gmail.com> wrote:
> Hi everyone!
> I want to find all reads that map to a specific position in a GRanges
> object.
> In my case this object is named "reads" and I have tried using the function
> below.
>
>
> #From a given position and chromosome, find the reads where this position
> map
> fun_find_reads_from_pos <- function(reads,chr,someposition, verbose=TRUE) {
>
> #get length over the sample list
> nr_samples <-length(reads)
>
> for (j in (1:nr_samples)) {
> onePerson <- reads[[j]]
> chr_onePerson <- onePerson[seqnames(onePerson)==chr]
> pre_seq<-
> chr_onePerson[(ranges(chr_onePerson))%in%(IRanges(start=someposition,
> width=1)),6]
>
> if(!length(pre_seq)==0) {
> for (k in (1:length(pre_seq))) {
> position <- (someposition-start(pre_seq[k])) +1
> #57537220
> print("found the given someposition on this position within
> the read")
> print(position)
> }
> }
> }
>
> } #End of function
>
>
> To use the function try:
> fun_find_reads_from_pos(reads,"chr12",57537220)
>
> Now that should work.
> And everything would be fine if it wasnt for the time issue.
> The time-thief I guess is that I have to loop over every read
> in every sample to get to the information I want. Is it
> possible to use the data structure in another better way to
> avoid unnecessary looping. Anyone have an idea?
>
> This function would be the core function of my future package.
> So its important that it is effective.
>
> Sincerely,
> Jesper
>
> Link to the reads object needed to run the function:
> http://uppsalanf.se/sites/default/files/reads_object.RData
> And the function itself (same as above):
> http://uppsalanf.se/sites/default/files/example-bioc-dev.R
>
>
>> sessionInfo()
> R version 2.13.2 (2011-09-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Biostrings_2.20.4 GenomicRanges_1.4.8 IRanges_1.10.6
>> ?GRanges
>> sessionInfo()
> R version 2.13.2 (2011-09-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Biostrings_2.20.4 GenomicRanges_1.4.8 IRanges_1.10.6
>
> loaded via a namespace (and not attached):
> [1] tools_2.13.2
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
More information about the Bioc-devel
mailing list