[BioC] iranges
Valerie Obenchain
vobencha at fhcrc.org
Mon Aug 11 18:21:19 CEST 2014
Hi,
On 08/09/14 04:37, carol white wrote:
> 1- suppose, I have seq with start and end that for some seq start > end
> and for some other start < end. can I create a GRanges with strand being
> defined as + if start > end and - otherwise and take start and end as
> they (on reverse strand, start > end and on forward, start < end) or
> should I swap start and end?
The convention for storing negative ranges in a GRanges object is the
same a positive; smallest ranges first. For example, the second and
third elements of 'tx':
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
tx <- transcriptsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene")
> tx[2:3]
GRangesList of length 2:
$FBgn0000008
GRanges with 3 ranges and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr2R [18024494, 18060339] + | 7681 FBtr0100521
[2] chr2R [18024496, 18060346] + | 7682 FBtr0071763
[3] chr2R [18024938, 18060346] + | 7683 FBtr0071764
$FBgn0000014
GRanges with 4 ranges and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
[1] chr3R [12632936, 12655767] - | 21863 FBtr0306337
[2] chr3R [12633349, 12653845] - | 21864 FBtr0083388
[3] chr3R [12633349, 12655300] - | 21865 FBtr0083387
[4] chr3R [12633349, 12655474] - | 21866 FBtr0300485
IRanges / GRanges don't hold negative width ranges so you can't create a
range with start > width.
> GRanges("chr1", IRanges(10, 5))
Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE =
"IRanges") :
solving row 1: negative widths are not allowed
The one exception is a zero-width range but that's not applicable here:
> width(GRanges("chr1", IRanges(4,3)))
[1] 0
>
> 2- if I use resize with your example, why the start of the output of
> resize is the same gene's start although I take 10bp from the end?
resize() is strand-aware. In the GRanges the ranges are ordered left to
right (smallest to largest). Because transcription is from 3' to 5' for
neg ranges, these end values
> end(gene)
[1] 12655767 12653845 12655300 12655474
are considered the start values. You can see the difference when we
change the strand.
strand(gene) <- "+"
> resize(gene, 10, fix="end")
GRanges with 4 ranges and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr3R [12655758, 12655767] + | 21863 FBtr0306337
[2] chr3R [12653836, 12653845] + | 21864 FBtr0083388
[3] chr3R [12655291, 12655300] + | 21865 FBtr0083387
[4] chr3R [12655465, 12655474] + | 21866 FBtr0300485
>
> resize(gene, 10, fix="end")
> GRanges with 4 ranges and 2 metadata columns:
> seqnames ranges strand | tx_id tx_name
> <Rle> <IRanges> <Rle> | <integer> <character>
> [1] chr3R [12632936, 12632945] - | 21863 FBtr0306337
> [2] chr3R [12633349, 12633358] - | 21864 FBtr0083388
> [3] chr3R [12633349, 12633358] - | 21865 FBtr0083387
> [4] chr3R [12633349, 12633358] - | 21866 FBtr0300485
> ---
> seqlengths:
> chr2L chr2R chr3L chr3R ... chrXHet chrYHet
> chrUextra
> 23011544 21146708 24543557 27905053 ... 204112 347038
> 29004656
>> gene
> GRanges with 4 ranges and 2 metadata columns:
> seqnames ranges strand | tx_id tx_name
> <Rle> <IRanges> <Rle> | <integer> <character>
> [1] chr3R [12632936, 12655767] - | 21863 FBtr0306337
> [2] chr3R [12633349, 12653845] - | 21864 FBtr0083388
> [3] chr3R [12633349, 12655300] - | 21865 FBtr0083387
> [4] chr3R [12633349, 12655474] - | 21866 FBtr0300485
> ---
> seqlengths:
> chr2L chr2R chr3L chr3R ... chrXHet chrYHet
> chrUextra
> 23011544 21146708 24543557 27905053 ... 204112 347038
> 29004656
> ####################################################
> 3- why do I get err ms in using narrow
>
> narrow(gene, width=10, end= end(ranges(gene)))
> Error in .Call2("solve_user_SEW", refwidths, start, end, width,
> translate.negative.coord, :
> solving row 1: 'allow.nonnarrowing' is FALSE and the supplied end
> (12655767) is > refwidth
'start' and 'end' are integers specifying the distance from the current
start and end. 'end = -3' defines the end as 3 less than the current
end; 'start = 5' defines the start as 5 greater than the current start.
It's tricky to get ranges of length 10 anchored at the end with narrow():
newStart <- (end(gene) - 9) - (start(gene) - 1)
narrowGene <- narrow(gene, start = newStart)
width(narrowGene)
> width(narrowGene)
[1] 10 10 10 10
Several functions on the inter-range-methods man page are similar and
you can often get the same result with different functions. In this
case, it may be more straightforward to use restrict():
> restrictGene <- restrict(gene, start=end(gene) - 9, end=end(gene))
> width(restrictGene)
[1] 10 10 10 10
> restrictGene
GRanges with 4 ranges and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr3R [12655758, 12655767] - | 21863 FBtr0306337
[2] chr3R [12653836, 12653845] - | 21864 FBtr0083388
[3] chr3R [12655291, 12655300] - | 21865 FBtr0083387
[4] chr3R [12655465, 12655474] - | 21866 FBtr0300485
Valerie
>
> Regards,
>
>
> On Friday, August 8, 2014 6:11 PM, Valerie Obenchain
> <vobencha at fhcrc.org> wrote:
>
>
> Hi,
>
> Please 'reply all' when responding so communication stays on the list.
>
> If you are working with stranded ranges you should use the GRanges
> container. IRanges is not strand-aware and does not have a strand
> argument. You can see the function signature on the man page by typing
>
> ?IRanges
>
> > Usage:
> >
> > ## IRanges constructor:
> > IRanges(start=NULL, end=NULL, width=NULL, names=NULL)
> >
>
> Load a Transcript Db object and extract transcripts by gene:
> library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
> tx <- transcriptsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene")
>
> Select a gene with transcripts on the negative strand:
> gene <- tx[[3]]
> >> gene
> > GRanges with 4 ranges and 2 metadata columns:
> > seqnames ranges strand | tx_id tx_name
> > <Rle> <IRanges> <Rle> | <integer> <character>
> > [1] chr3R [12632936, 12655767] - | 21863 FBtr0306337
> > [2] chr3R [12633349, 12653845] - | 21864 FBtr0083388
> > [3] chr3R [12633349, 12655300] - | 21865 FBtr0083387
> > [4] chr3R [12633349, 12655474] - | 21866 FBtr0300485
>
> GRanges can be manipulated with resize(), trim(), shift(), flank(),
> narrow() and several other methods. To see them type (with the quotes)
>
> ?`intra-range-methods`
>
> and select the page for GRanges. It sounds like resize() is what you're
> looking for.
>
> resize(gene, width = 10)
> >> resize(gene, width = 10)
> > GRanges with 4 ranges and 2 metadata columns:
> > seqnames ranges strand | tx_id tx_name
> > <Rle> <IRanges> <Rle> | <integer> <character>
> > [1] chr3R [12655758, 12655767] - | 21863 FBtr0306337
> > [2] chr3R [12653836, 12653845] - | 21864 FBtr0083388
> > [3] chr3R [12655291, 12655300] - | 21865 FBtr0083387
> > [4] chr3R [12655465, 12655474] - | 21866 FBtr0300485
>
> If you have sequence data instead of range data, the XStringSet family
> is more appropriate. For examples of manipulating sequences see Section
> E on the XStringSet man page. The functions you want are narrow() or
> subseq().
>
> library(Biostrings)
> ?XStringSet
>
>
> Valerie
>
>
> On 08/08/2014 08:38 AM, carol white wrote:
> > I have the problem when i want to take the width from the end of a
> > sequence on a reverse strand.
> > if I take the nucleotide seq of a gene that is on the reverse strand on
> > the ncbi web site and extract for ex 10 or 20 bp from the end, i don't
> > get the same as I do with iranges. As I have already given the strand as
> > the parameter to the iranges function, I assume that it has already
> > reverse-complemented by iranges. I don't have this problem with the
> > genes that are on the forward strand nor when I take the sub sequence
> > from the beginning of the sequence.
> >
> > Regards,
> > On Friday, August 8, 2014 5:28 PM, Valerie Obenchain
> > <vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>> wrote:
> >
> >
> > Did you provide 'start', 'end' and 'width' and get a confusing answer?
> > If yes, please show your example.
> >
> > Thanks.
> > Valerie
> >
> >
> >
> > On 08/08/2014 08:23 AM, Valerie Obenchain wrote:
> > > Hi Carol,
> > >
> > > The 'end' is the end of the range. When you specify ranges with 'end'
> > > and 'width' the range will always end at the 'end' value.
> > >
> > > > IRanges(end = 10, width = c(5, 10))
> > > IRanges of length 2
> > > start end width
> > > [1] 6 10 5
> > > [2] 1 10 10
> > >
> > >
> > > Similar reasoning for 'start' and 'width':
> > >
> > > > IRanges(start = 10, width = c(5, 10))
> > > IRanges of length 2
> > > start end width
> > > [1] 10 14 5
> > > [2] 10 19 10
> > >
> > >
> > > Valerie
> > >
> > >
> > >
> > > On 08/08/2014 01:29 AM, carol white wrote:
> > >> Hi,
> > >> How does width with start and end in IRanges work? I thought that
> if I
> > >> use the end with a width, then the sequence from the end with the
> > >> length of width is taken. However, in my case when I use width for ex
> > >> 20 and 10, the corresponding sequences with the length 20 and 10 are
> > >> not the same from the end but from the beginning. Did I misunderstood
> > >> some thing?
> > >>
> > >> Regards,
> > >>
> > >> Carol
> > >> [[alternative HTML version deleted]]
> > >>
> > >> _______________________________________________
> > >> Bioconductor mailing list
> > >> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> <mailto:Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>>
> > >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> > >> Search the archives:
> > >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >>
> > >
> > >
> >
> >
> > --
> > Valerie Obenchain
> > Program in Computational Biology
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, Seattle, WA 98109
> >
> > Email: vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>
> <mailto:vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>>
>
> > Phone: (206) 667-3158
> >
> >
>
>
>
More information about the Bioconductor
mailing list