[BioC] seqselect and window in GRanges

Martin Morgan mtmorgan at fhcrc.org
Tue Nov 30 15:10:53 CET 2010


On 11/30/2010 05:38 AM, arne.mueller at novartis.com wrote:
> Dear All,
> 
> may I ask a basic question about the GenomicRanges package? It seems that the 
> functions seqselect and window treat start/end as indexes into the GRanges 
> object rather than as the actual start/end positions. Is there a method with 
> which I can extract a sub-range from a GRanges object based on genomic 
> coordinates rather than indexes?

Hi Arne --

it sounds a bit like you want to 1) find the ranges in gr that overlap
your genomic location(s) and then 2) restrict the ranges to those
locations (narrow might be more appropriate if you are looking for,
say, 5' regions), along the lines of

> gr1 <- gr[gr %in% GRanges("A", IRanges(12, 18))]
> ranges(gr1) <- restrict(ranges(gr1), 12, 18)
> gr1
GRanges with 1 range and 0 elementMetadata values
    seqnames    ranges strand |
       <Rle> <IRanges>  <Rle> |
[1]        A  [12, 18]      * |

seqlengths
  A
 NA

Here gr %in% GRanges(<...>) is sugar for match(), which in turn is
sugar for findOverlaps().
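
The same two steps can be written with findOverlaps() directly. This is
only a minimal sketch, assuming a reasonably recent GenomicRanges release
in which queryHits() is available; the object names roi, hits and gr1 are
illustrative. Spelling out the overlap step may also be the safer choice,
because as far as I know later Bioconductor releases changed %in% on
ranges to mean exact matching rather than overlap.

```r
library(GenomicRanges)  # also attaches IRanges

# the example object from the original question
gr <- GRanges(seqnames = "A",
              ranges = IRanges(start = c(10, 100), end = c(20, 200)))

# region of interest, given in genomic coordinates (illustrative name)
roi <- GRanges("A", IRanges(12, 18))

# step 1: keep only the ranges in gr that overlap the region
hits <- findOverlaps(gr, roi)
gr1 <- gr[queryHits(hits)]

# step 2: clip the surviving ranges to the region boundaries
ranges(gr1) <- restrict(ranges(gr1), start = 12, end = 18)
gr1
```

With a single region of interest each range in gr can match at most
once, so no de-duplication of the hits is needed; with several regions
one would probably want unique(queryHits(hits)).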

Martin


> 
>> gr = GRanges(seqnames="A", ranges=IRanges(start=c(10, 100), end=c(20, 200)))
>> gr
> GRanges with 2 ranges and 0 elementMetadata values
>     seqnames     ranges strand |
>        <Rle>  <IRanges>  <Rle> |
> [1]        A [ 10,  20]      * |
> [2]        A [100, 200]      * |
> 
> seqlengths
>   A
>  NA
>>
>> window(gr, start=12, end=98)
> Error in solveWindowSEW(length(x), start, end, width) : 
>   Invalid sequence coordinates.
>   Please make sure the supplied 'start', 'end' and 'width' arguments
>   are defining a region that is within the limits of the sequence.
>> window(gr, start=1, end=2)
> GRanges with 2 ranges and 0 elementMetadata values
>     seqnames     ranges strand |
>        <Rle>  <IRanges>  <Rle> |
> [1]        A [ 10,  20]      * |
> [2]        A [100, 200]      * |
> 
> seqlengths
>   A
>  NA
>> window(gr, start=9, end=40)
> Error in solveWindowSEW(length(x), start, end, width) : 
>   Invalid sequence coordinates.
>   Please make sure the supplied 'start', 'end' and 'width' arguments
>   are defining a region that is within the limits of the sequence.
> ...
> 
> 
>> sessionInfo()
> R version 2.13.0 Under development (unstable) (2010-10-31 r53501)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C 
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8 
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8 
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C 
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C 
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C 
> 
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base 
> 
> other attached packages:
> [1] GenomicRanges_1.1.38 IRanges_1.9.3 
> 
> loaded via a namespace (and not attached):
> [1] tools_2.13.0
> 
>     thanks a lot for your help,
> 
>     Arne


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


