[BioC] flip strand of GappedAlignmentPairs Object
Valerie Obenchain
vobencha at fhcrc.org
Wed Feb 27 20:47:06 CET 2013
Thanks Herve.
On 02/27/13 11:23, Hervé Pagès wrote:
> Hi Stefanie, Val,
>
> Makes sense to have a strand setter for GappedAlignmentPairs
> objects, like we have for GappedAlignments objects. I've added
> it to GenomicRanges devel:
>
> > galp
> GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
> seqnames strand : ranges -- ranges
> <Rle> <Rle> : <IRanges> -- <IRanges>
> EAS54_65:6:277:590:364 chr1 - : [ 681, 715] -- [ 503, 537]
> EAS139_11:5:61:38:1182 chr1 - : [1388, 1422] -- [1205, 1239]
> EAS188_7:4:21:443:404 chr1 + : [1345, 1379] -- [1529, 1563]
> EAS1_105:6:23:885:274 chr2 + : [1089, 1123] -- [1289, 1323]
> EAS1_108:1:65:787:74 chr1 - : [ 213, 247] -- [ 61, 95]
> ---
> seqlengths:
> chr1 chr2
> 1575 1584
>
> > strand(first(galp))
> factor-Rle of length 5 with 3 runs
> Lengths: 2 2 1
> Values : - + -
> Levels(3): + - *
>
> > strand(last(galp))
> factor-Rle of length 5 with 3 runs
> Lengths: 2 2 1
> Values : + - +
> Levels(3): + - *
>
> Flip the strand:
>
> > strand(galp) <- ifelse(strand(galp) == "+", "-", "+")
> > galp
> GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
> seqnames strand : ranges -- ranges
> <Rle> <Rle> : <IRanges> -- <IRanges>
> EAS54_65:6:277:590:364 chr1 + : [ 681, 715] -- [ 503, 537]
> EAS139_11:5:61:38:1182 chr1 + : [1388, 1422] -- [1205, 1239]
> EAS188_7:4:21:443:404 chr1 - : [1345, 1379] -- [1529, 1563]
> EAS1_105:6:23:885:274 chr2 - : [1089, 1123] -- [1289, 1323]
> EAS1_108:1:65:787:74 chr1 + : [ 213, 247] -- [ 61, 95]
> ---
> seqlengths:
> chr1 chr2
> 1575 1584
>
> > strand(first(galp))
> factor-Rle of length 5 with 3 runs
> Lengths: 2 2 1
> Values : + - +
> Levels(3): + - *
>
> > strand(last(galp))
> factor-Rle of length 5 with 3 runs
> Lengths: 2 2 1
> Values : - + -
> Levels(3): + - *
>
> Note that a slightly more efficient way to compute the flipped strand
> is:
>
> runValue(strand(galp)) <- ifelse(runValue(strand(galp)) == "+",
> "-", "+")
>
> This will only make a difference on big GappedAlignmentPairs objects of
> course.
>
> > sessionInfo()
> R Under development (unstable) (2013-02-19 r62008)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] Rsamtools_1.11.18 Biostrings_2.27.11 GenomicRanges_1.11.33
> [4] IRanges_1.17.35 BiocGenerics_0.5.6
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-5 stats4_3.0.0 tools_3.0.0 zlibbioc_1.5.0
>
>
> GenomicRanges 1.11.33, will become available via biocLite(0 in the next
> 24 hours or so.
>
> Cheers,
> H.
>
>
> On 02/27/2013 07:48 AM, Valerie Obenchain wrote:
>> You could use the low-level constructor
>>
>> GappedAlignmentPairs(first, last, isProperPair, names=NULL)
>>
>> 'isProperPair' is a logical vector the same length as 'first' and
>> 'last'. For example,
>>
>> GappedAlignmentPairs(left, right, !logical(length(left)))
>>
>>
>> Valerie
>>
>> On 02/27/13 07:12, Stefanie Tauber wrote:
>>> Hi Valerie,
>>>
>>> thanks for the quick answer.
>>> I have already checked how the strand is assigned for paired-end data.
>>> After setting the strand for the left and right parts,
>>> can I somehow rebuild the gappedalignmentPairs object?
>>>
>>> best,
>>> Stefanie
>>>
>>> Am 27.02.2013 um 16:06 schrieb Valerie Obenchain <vobencha at fhcrc.org
>>> <mailto:vobencha at fhcrc.org>>:
>>>
>>>> Hi Stephanie,
>>>>
>>>> A GappedAlignmentPairs object is made up of a 'left' and a 'right'
>>>> GappedAlignments.
>>>>
>>>> > ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
>>>> > galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE)
>>>>
>>>> You can extract the left/right parts with an accessor then look at
>>>> strand, cigars, gaps etc.
>>>>
>>>> > left <- left(galp)
>>>> > class(left)
>>>> [1] "GappedAlignments"
>>>> attr(,"package")
>>>> [1] "GenomicRanges"
>>>>
>>>> > strand(left)
>>>> factor-Rle of length 1572 with 665 runs
>>>> Lengths: 3 25 1 1 1 3 1 1 3 2 2 ... 1 6 1 2 3 2 1 5
>>>> 1 51
>>>> Values : - + - + - + - + - + - ... + - + - + - + -
>>>> + -
>>>> Levels(3): + - *
>>>>
>>>> > head(cigar(left))
>>>> [1] "35M" "36M" "35M" "36M" "35M" "35M"
>>>>
>>>> > head(ngap(left))
>>>> [1] 0 0 0 0 0 0
>>>>
>>>> It's on these left/right pairs that you can reset the strand.
>>>>
>>>> > strand(left)[1:3]
>>>> factor-Rle of length 3 with 1 run
>>>> Lengths: 3
>>>> Values : +
>>>> Levels(3): + - *
>>>> > strand(left)[1:3] <- "-"
>>>> > strand(left)[1:3]
>>>> factor-Rle of length 3 with 1 run
>>>> Lengths: 3
>>>> Values : -
>>>> Levels(3): + - *
>>>>
>>>> Please be sure to read the details on the man page regarding how the
>>>> strands were assigned.
>>>>
>>>> ?GappedAlignmentPairs
>>>>
>>>>
>>>> Valerie
>>>>
>>>> On 02/27/13 05:55, Stefanie Tauber wrote:
>>>>> Dear list,
>>>>>
>>>>> Let's say I have single-end strand-specific RNA-Seq data.
>>>>> Then I could read in the data with:
>>>>>
>>>>> library(GenomicRanges)
>>>>>
>>>>>> reads = readGappedAlignments("data.bam")
>>>>> As I typically don't know which strand has been sequenced, it might
>>>>> be necessary to flip the strand of the reads before doing any kind
>>>>> of summarizeOverlap:
>>>>>
>>>>>> strand(reads) = ifelse(strand(reads) == "+", "-", "+")
>>>>> Now I could do a 'summarizeOverlaps' between the reads and any kind
>>>>> of transcript database.
>>>>>
>>>>> Now, if I have paired-end strand-specific RNA-Seq data.
>>>>> I read the data:
>>>>>> reads = readGappedAlignmentPairs("data.bam")
>>>>> How can I flip the strand of the reads? As it's now paired-end data,
>>>>> it does not work as above.
>>>>> I could imagine several work-arounds, I just wonder if there is any
>>>>> straight approach which I miss at the moment..
>>>>>
>>>>>
>>>>> best regards,
>>>>> Stefanie
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>> DI Stefanie Tauber
>>>
>>> Center for Integrative Bioinformatics Vienna (CIBIV)
>>> (CIBIV is a joint institute of Vienna University and Medical
>>> University)
>>> Max F. Perutz Laboratories (MFPL)
>>> Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2
>>> Dr. Bohr Gasse 9
>>> A-1030 Wien, Austria
>>> Phone: ++43 +1 / 42772-4030
>>> Fax: ++43 +1 / 42772-4098
>>> email: stefanie.tauber at univie.ac.at
>>> <mailto:stefanie.tauber at univie.ac.at>
>>> www.cibiv.at <http://www.cibiv.at>
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list