[BioC] makeGappedAlignmentPairs for broken pairs

Valerie Obenchain vobencha at fhcrc.org
Fri Jan 17 16:26:58 CET 2014


Hi,

One clarification about what I mistook as a 'bug' in readGAlignmentPairs 
with the 'hasUnmappedMate' flag. I don't think this a bug at all but 
instead intended behavior. The GAlignmentPairs class can only contain 
pairs in a 'left', 'right' fashion. It can't hold one read without a 
mate. The GAlignmentsList class was developed to meet this need and can 
hold paired reads as well as single fragments or groups (more on that 
below).


On 01/17/14 01:50, Ido Tamir wrote:
> Dear Valerie,
> thank you very much for your reply. I can't test the development packages at the moment, I'm still on 2.12.

readGAlignmentsList() is in the GenomicRanges package in the current 
release branch (Bioconductor 2.13, released October 2013). If you update 
to the current release you can read in both properly-paired and unmapped 
reads with readGAlignmentsList().

For help installing:

http://www.bioconductor.org/install/


>
>> param <- ScanBamParam(flag=scanBamFlag(hasUnmappedMate=TRUE))
> a)
> Will this load only reads with unmapped mates or will this allow both proper paired and unpaired which is what I really want.
> reading the files in twice (once for proper pairs, once for single) would be possible, but I prefer reading in once.

That particular param will only read in records with unmapped mates. The 
?ScanBamParam man page lists all the flags and describes them in detail. 
You can create a query with (almost) any combination of flags. To read 
in all records, do not supply a param.

readGAlignmentsList(fl)

To read in only proper pairs use the 'isProperPair' flag.

In release the output of readGAlignmentsList() differs from devel. In 
release the mate status is a elementMetadata column on the individual 
GAlignments inside the GAlignmentsList object.


> b)
> will findMateAlignment be extended to allow these unmapped mate reads to be represented in a GAlignmentPairs class?

findMateAlignment() is going away. readGAlignmentPairs() will stay. 
readGAlignmentPairs() and readGAlignmentsList() call the same code in 
the background but return a different object.

I don't think the GAlignmentPairs container will be modified to hold one 
read without its mate. The GAlignmentsList class was developed to be a 
more flexible container that could hold single fragments, paired reads 
and groups of multiple fragments. If you are after both paired and 
single reads (e.g., unmapped reads) you should try readGAlignmentsList().


> c)
> Do you know when this (2.14) will be released?

The next release will be in spring around March or April. Closer to the 
date the schedule will be posted here:

http://www.bioconductor.org/developers/release-schedule/

Valerie
>
> thank you very much,
> ido
>
>
> On Jan 16, 2014, at 9:09 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>
>> Hi Ido,
>>
>> library(GenomicAlignments)
>> fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
>>
>> You can set different flags in ScanBamParam to control what is read in.
>> See ?ScanBamParam for details. Here is a 'param' that reads only records with unmapped mates.
>>
>> param <- ScanBamParam(flag=scanBamFlag(hasUnmappedMate=TRUE))
>>
>> Two classes can hold paired-end reads, one is GAlignmentPairs and the other is GAlignmentsList. They run the same mate-pairing code but return different objects.
>>
>> ## GAlignmentPairs:
>>
>> Oops. Looks like there is a bug in readGAlignmentPairs() when this flag is set. We will fix that.
>>> gapairs <- readGAlignmentPairs(fl, param=param)
>> Error in .combineBamFlagFilters(bamFlag(param, asInteger = TRUE), flag0) :
>>   BAM flag filters to combine are incompatible
>>
>> ## GAlignmentsList:
>>
>> galist <- readGAlignmentsList(fl, param=param)
>>>> galist
>>> GAlignmentsList of length 127:
>>> [[1]]
>>> GAlignments with 1 alignment and 0 metadata columns:
>>>       seqnames strand cigar qwidth start  end width ngap
>>>   [1]     seq2      +   35M     35  1520 1554    35    0
>>>
>>> [[2]]
>>> GAlignments with 1 alignment and 0 metadata columns:
>>>       seqnames strand cigar qwidth start end width ngap
>>>   [1]     seq1      +   36M     36     6  41    36    0
>>
>> There is a metadata column on the list that has the mate status. In this case they are all 'unmated'.
>>> mcols(galist)
>> DataFrame with 127 rows and 1 column
>>     mate_status
>>        <factor>
>> 1       unmated
>> 2       unmated
>> 3       unmated
>> 4       unmated
>> 5       unmated
>> ...         ...
>>
>> This code is available in the devel branch.
>>
>> Valerie
>>
>>
>>> sessionInfo()
>> R Under development (unstable) (2014-01-06 r64682)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>> ...
>>
>> other attached packages:
>> [1] GenomicAlignments_0.99.11 VariantAnnotation_1.9.28
>> [3] Rsamtools_1.15.18         Biostrings_2.31.7
>> [5] GenomicRanges_1.15.19     XVector_0.3.6
>> [7] IRanges_1.21.19           BiocGenerics_0.9.2
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.25.9   Biobase_2.23.3         biomaRt_2.19.1
>> [4] bitops_1.0-6           BSgenome_1.31.7        DBI_0.2-7
>> [7] GenomicFeatures_1.15.4 RCurl_1.95-4.1         RSQLite_0.11.4
>> [10] rtracklayer_1.23.6     stats4_3.1.0           tools_3.1.0
>> [13] XML_3.98-1.1           zlibbioc_1.9.0
>>
>>
>>
>>
>> On 01/16/2014 02:57 AM, Ido Tamir wrote:
>>> Hi,
>>> is there a roadmap for allowing paired end reads where only one is mapped to be reported?
>>> Basically I would like to have readGappedAlignmentPairs report them also.
>>> One could reread the whole file and take only those reads where the  makeGappedAlignmentPairs test
>>> was unsuccessfull.
>>>
>>> thank you very much,
>>> ido
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>> --
>> Valerie Obenchain
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B155
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: vobencha at fhcrc.org
>> Phone:  (206) 667-3158
>> Fax:    (206) 667-1319
>



More information about the Bioconductor mailing list