[BioC] A code kata: evening-out *Ranges ends.
mtmorgan at fhcrc.org
mtmorgan at fhcrc.org
Wed Aug 25 06:17:50 CEST 2010
Quoting Steve Lianoglou <mailinglist.honeypot at gmail.com>:
> In case anyone else is trying to sharpen their IRanges-fu, here's a
> possible solution to my own question -- works much faster.
>
> Instead of:
>
>> reads <- GRanges(seqnames='chr1', strand='+',
>> ranges=IRanges(start=c(sample(18:22, 10, replace=TRUE),
>> sample(100:108, 10, replace=TRUE)),
>> width=sample(18:20, 20, replace=TRUE)))
>>
>> You can see there are two "islands" of reads there, at around
>> positions 20-40, and 100-120.
>> What I want to do is to "fix" the reads in each island to have the
>> same start/end position.
>>
>> Here's code I have to do that (forget any "strand" issues here --
>> those are already taken care of):
>>
>> fence.posts <- reduce(reads)
>> o <- findOverlaps(fence.posts, reads)
Just
ranges(reads)[subjectHits(o)] <- ranges(fence.posts)[queryHits(o)]
? (I would have done findOverlaps(reads, fence.posts), but...)
>> subjects <- subjectHits(o)
>> query <- queryHits(o)
>>
>> for (idx in unique(query)) {
>> adjust <- subjects[query == idx]
>> start(reads[adjust]) <- start(fence.posts[idx])
>> end(reads[adjust]) <- end(fence.posts[idx])
>> }
>
> I'm doing:
>
> covr <- coverage(reads)[[1]]
> fences <- slice(covr, 1)
or
s = covr > 1
breaks = cumsum(runLength(s))[!runValue(s)]
post = cut(start(reads), c(breaks, Inf), labels=FALSE)
ranges(reads) = ranges(fence.posts)[post]
? but the method above keeps track of sequences...
Martin
> f.repeat <- viewMaxs(fences)
> repacked <- IRanges(start=rep(start(fences), f.repeat),
> end=rep(end(fences), f.repeat))
> if (length(repacked) != length(reads)) {
> stop("Length of repacked reads is not the same as the original")
> reads <- GRanges(seqnames=seqnames(reads), ranges=repacked,
> strand=strand(reads))
>
> I'm assuming that all the reads come from the same chromosome (so I
> unlist the result from coverage), and I'm only sending in reads of the
> same strand to this part of the function.
>
> Anyway, running this newer version on 1000 tags takes 0.181 seconds,
> where the older version took 20 seconds.
>
> Thought some might find it interesting, otherwise sorry to spam.
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list