[BioC] flip strand of GappedAlignmentPairs Object
Hervé Pagès
hpages at fhcrc.org
Wed Feb 27 20:23:21 CET 2013
Hi Stefanie, Val,
Makes sense to have a strand setter for GappedAlignmentPairs
objects, like we have for GappedAlignments objects. I've added
it to GenomicRanges devel:
> galp
GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
seqnames strand : ranges -- ranges
<Rle> <Rle> : <IRanges> -- <IRanges>
EAS54_65:6:277:590:364 chr1 - : [ 681, 715] -- [ 503, 537]
EAS139_11:5:61:38:1182 chr1 - : [1388, 1422] -- [1205, 1239]
EAS188_7:4:21:443:404 chr1 + : [1345, 1379] -- [1529, 1563]
EAS1_105:6:23:885:274 chr2 + : [1089, 1123] -- [1289, 1323]
EAS1_108:1:65:787:74 chr1 - : [ 213, 247] -- [ 61, 95]
---
seqlengths:
chr1 chr2
1575 1584
> strand(first(galp))
factor-Rle of length 5 with 3 runs
Lengths: 2 2 1
Values : - + -
Levels(3): + - *
> strand(last(galp))
factor-Rle of length 5 with 3 runs
Lengths: 2 2 1
Values : + - +
Levels(3): + - *
Flip the strand:
> strand(galp) <- ifelse(strand(galp) == "+", "-", "+")
> galp
GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
seqnames strand : ranges -- ranges
<Rle> <Rle> : <IRanges> -- <IRanges>
EAS54_65:6:277:590:364 chr1 + : [ 681, 715] -- [ 503, 537]
EAS139_11:5:61:38:1182 chr1 + : [1388, 1422] -- [1205, 1239]
EAS188_7:4:21:443:404 chr1 - : [1345, 1379] -- [1529, 1563]
EAS1_105:6:23:885:274 chr2 - : [1089, 1123] -- [1289, 1323]
EAS1_108:1:65:787:74 chr1 + : [ 213, 247] -- [ 61, 95]
---
seqlengths:
chr1 chr2
1575 1584
> strand(first(galp))
factor-Rle of length 5 with 3 runs
Lengths: 2 2 1
Values : + - +
Levels(3): + - *
> strand(last(galp))
factor-Rle of length 5 with 3 runs
Lengths: 2 2 1
Values : - + -
Levels(3): + - *
Note that a slightly more efficient way to compute the flipped strand
is:
runValue(strand(galp)) <- ifelse(runValue(strand(galp)) == "+",
"-", "+")
This will only make a difference on big GappedAlignmentPairs objects of
course.
> sessionInfo()
R Under development (unstable) (2013-02-19 r62008)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Rsamtools_1.11.18 Biostrings_2.27.11 GenomicRanges_1.11.33
[4] IRanges_1.17.35 BiocGenerics_0.5.6
loaded via a namespace (and not attached):
[1] bitops_1.0-5 stats4_3.0.0 tools_3.0.0 zlibbioc_1.5.0
GenomicRanges 1.11.33, will become available via biocLite(0 in the next
24 hours or so.
Cheers,
H.
On 02/27/2013 07:48 AM, Valerie Obenchain wrote:
> You could use the low-level constructor
>
> GappedAlignmentPairs(first, last, isProperPair, names=NULL)
>
> 'isProperPair' is a logical vector the same length as 'first' and
> 'last'. For example,
>
> GappedAlignmentPairs(left, right, !logical(length(left)))
>
>
> Valerie
>
> On 02/27/13 07:12, Stefanie Tauber wrote:
>> Hi Valerie,
>>
>> thanks for the quick answer.
>> I have already checked how the strand is assigned for paired-end data.
>> After setting the strand for the left and right parts,
>> can I somehow rebuild the gappedalignmentPairs object?
>>
>> best,
>> Stefanie
>>
>> Am 27.02.2013 um 16:06 schrieb Valerie Obenchain <vobencha at fhcrc.org
>> <mailto:vobencha at fhcrc.org>>:
>>
>>> Hi Stephanie,
>>>
>>> A GappedAlignmentPairs object is made up of a 'left' and a 'right'
>>> GappedAlignments.
>>>
>>> > ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
>>> > galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE)
>>>
>>> You can extract the left/right parts with an accessor then look at
>>> strand, cigars, gaps etc.
>>>
>>> > left <- left(galp)
>>> > class(left)
>>> [1] "GappedAlignments"
>>> attr(,"package")
>>> [1] "GenomicRanges"
>>>
>>> > strand(left)
>>> factor-Rle of length 1572 with 665 runs
>>> Lengths: 3 25 1 1 1 3 1 1 3 2 2 ... 1 6 1 2 3 2 1 5
>>> 1 51
>>> Values : - + - + - + - + - + - ... + - + - + - + -
>>> + -
>>> Levels(3): + - *
>>>
>>> > head(cigar(left))
>>> [1] "35M" "36M" "35M" "36M" "35M" "35M"
>>>
>>> > head(ngap(left))
>>> [1] 0 0 0 0 0 0
>>>
>>> It's on these left/right pairs that you can reset the strand.
>>>
>>> > strand(left)[1:3]
>>> factor-Rle of length 3 with 1 run
>>> Lengths: 3
>>> Values : +
>>> Levels(3): + - *
>>> > strand(left)[1:3] <- "-"
>>> > strand(left)[1:3]
>>> factor-Rle of length 3 with 1 run
>>> Lengths: 3
>>> Values : -
>>> Levels(3): + - *
>>>
>>> Please be sure to read the details on the man page regarding how the
>>> strands were assigned.
>>>
>>> ?GappedAlignmentPairs
>>>
>>>
>>> Valerie
>>>
>>> On 02/27/13 05:55, Stefanie Tauber wrote:
>>>> Dear list,
>>>>
>>>> Let's say I have single-end strand-specific RNA-Seq data.
>>>> Then I could read in the data with:
>>>>
>>>> library(GenomicRanges)
>>>>
>>>>> reads = readGappedAlignments("data.bam")
>>>> As I typically don't know which strand has been sequenced, it might
>>>> be necessary to flip the strand of the reads before doing any kind
>>>> of summarizeOverlap:
>>>>
>>>>> strand(reads) = ifelse(strand(reads) == "+", "-", "+")
>>>> Now I could do a 'summarizeOverlaps' between the reads and any kind
>>>> of transcript database.
>>>>
>>>> Now, if I have paired-end strand-specific RNA-Seq data.
>>>> I read the data:
>>>>> reads = readGappedAlignmentPairs("data.bam")
>>>> How can I flip the strand of the reads? As it's now paired-end data,
>>>> it does not work as above.
>>>> I could imagine several work-arounds, I just wonder if there is any
>>>> straight approach which I miss at the moment..
>>>>
>>>>
>>>> best regards,
>>>> Stefanie
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> DI Stefanie Tauber
>>
>> Center for Integrative Bioinformatics Vienna (CIBIV)
>> (CIBIV is a joint institute of Vienna University and Medical University)
>> Max F. Perutz Laboratories (MFPL)
>> Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2
>> Dr. Bohr Gasse 9
>> A-1030 Wien, Austria
>> Phone: ++43 +1 / 42772-4030
>> Fax: ++43 +1 / 42772-4098
>> email: stefanie.tauber at univie.ac.at <mailto:stefanie.tauber at univie.ac.at>
>> www.cibiv.at <http://www.cibiv.at>
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list