[BioC] makeGappedAlignmentPairs for broken pairs
Valerie Obenchain
vobencha at fhcrc.org
Fri Jan 17 16:26:58 CET 2014
Hi,
One clarification about what I mistook as a 'bug' in readGAlignmentPairs
with the 'hasUnmappedMate' flag. I don't think this a bug at all but
instead intended behavior. The GAlignmentPairs class can only contain
pairs in a 'left', 'right' fashion. It can't hold one read without a
mate. The GAlignmentsList class was developed to meet this need and can
hold paired reads as well as single fragments or groups (more on that
below).
On 01/17/14 01:50, Ido Tamir wrote:
> Dear Valerie,
> thank you very much for your reply. I can't test the development packages at the moment, I'm still on 2.12.
readGAlignmentsList() is in the GenomicRanges package in the current
release branch (Bioconductor 2.13, released October 2013). If you update
to the current release you can read in both properly-paired and unmapped
reads with readGAlignmentsList().
For help installing:
http://www.bioconductor.org/install/
>
>> param <- ScanBamParam(flag=scanBamFlag(hasUnmappedMate=TRUE))
> a)
> Will this load only reads with unmapped mates or will this allow both proper paired and unpaired which is what I really want.
> reading the files in twice (once for proper pairs, once for single) would be possible, but I prefer reading in once.
That particular param will only read in records with unmapped mates. The
?ScanBamParam man page lists all the flags and describes them in detail.
You can create a query with (almost) any combination of flags. To read
in all records, do not supply a param.
readGAlignmentsList(fl)
To read in only proper pairs use the 'isProperPair' flag.
In release the output of readGAlignmentsList() differs from devel. In
release the mate status is a elementMetadata column on the individual
GAlignments inside the GAlignmentsList object.
> b)
> will findMateAlignment be extended to allow these unmapped mate reads to be represented in a GAlignmentPairs class?
findMateAlignment() is going away. readGAlignmentPairs() will stay.
readGAlignmentPairs() and readGAlignmentsList() call the same code in
the background but return a different object.
I don't think the GAlignmentPairs container will be modified to hold one
read without its mate. The GAlignmentsList class was developed to be a
more flexible container that could hold single fragments, paired reads
and groups of multiple fragments. If you are after both paired and
single reads (e.g., unmapped reads) you should try readGAlignmentsList().
> c)
> Do you know when this (2.14) will be released?
The next release will be in spring around March or April. Closer to the
date the schedule will be posted here:
http://www.bioconductor.org/developers/release-schedule/
Valerie
>
> thank you very much,
> ido
>
>
> On Jan 16, 2014, at 9:09 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>
>> Hi Ido,
>>
>> library(GenomicAlignments)
>> fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
>>
>> You can set different flags in ScanBamParam to control what is read in.
>> See ?ScanBamParam for details. Here is a 'param' that reads only records with unmapped mates.
>>
>> param <- ScanBamParam(flag=scanBamFlag(hasUnmappedMate=TRUE))
>>
>> Two classes can hold paired-end reads, one is GAlignmentPairs and the other is GAlignmentsList. They run the same mate-pairing code but return different objects.
>>
>> ## GAlignmentPairs:
>>
>> Oops. Looks like there is a bug in readGAlignmentPairs() when this flag is set. We will fix that.
>>> gapairs <- readGAlignmentPairs(fl, param=param)
>> Error in .combineBamFlagFilters(bamFlag(param, asInteger = TRUE), flag0) :
>> BAM flag filters to combine are incompatible
>>
>> ## GAlignmentsList:
>>
>> galist <- readGAlignmentsList(fl, param=param)
>>>> galist
>>> GAlignmentsList of length 127:
>>> [[1]]
>>> GAlignments with 1 alignment and 0 metadata columns:
>>> seqnames strand cigar qwidth start end width ngap
>>> [1] seq2 + 35M 35 1520 1554 35 0
>>>
>>> [[2]]
>>> GAlignments with 1 alignment and 0 metadata columns:
>>> seqnames strand cigar qwidth start end width ngap
>>> [1] seq1 + 36M 36 6 41 36 0
>>
>> There is a metadata column on the list that has the mate status. In this case they are all 'unmated'.
>>> mcols(galist)
>> DataFrame with 127 rows and 1 column
>> mate_status
>> <factor>
>> 1 unmated
>> 2 unmated
>> 3 unmated
>> 4 unmated
>> 5 unmated
>> ... ...
>>
>> This code is available in the devel branch.
>>
>> Valerie
>>
>>
>>> sessionInfo()
>> R Under development (unstable) (2014-01-06 r64682)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>> ...
>>
>> other attached packages:
>> [1] GenomicAlignments_0.99.11 VariantAnnotation_1.9.28
>> [3] Rsamtools_1.15.18 Biostrings_2.31.7
>> [5] GenomicRanges_1.15.19 XVector_0.3.6
>> [7] IRanges_1.21.19 BiocGenerics_0.9.2
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.25.9 Biobase_2.23.3 biomaRt_2.19.1
>> [4] bitops_1.0-6 BSgenome_1.31.7 DBI_0.2-7
>> [7] GenomicFeatures_1.15.4 RCurl_1.95-4.1 RSQLite_0.11.4
>> [10] rtracklayer_1.23.6 stats4_3.1.0 tools_3.1.0
>> [13] XML_3.98-1.1 zlibbioc_1.9.0
>>
>>
>>
>>
>> On 01/16/2014 02:57 AM, Ido Tamir wrote:
>>> Hi,
>>> is there a roadmap for allowing paired end reads where only one is mapped to be reported?
>>> Basically I would like to have readGappedAlignmentPairs report them also.
>>> One could reread the whole file and take only those reads where the makeGappedAlignmentPairs test
>>> was unsuccessfull.
>>>
>>> thank you very much,
>>> ido
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>> --
>> Valerie Obenchain
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B155
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: vobencha at fhcrc.org
>> Phone: (206) 667-3158
>> Fax: (206) 667-1319
>
More information about the Bioconductor
mailing list