[Bioc-devel] readGAlignmentPairs with discordant strand
Michael Lawrence
lawrence.michael at gene.com
Sat Oct 17 21:01:27 CEST 2015
Cool, thanks. So does it make sense to have that as the coercion from
GAlignmentsList to GAlignmentPairs? I can add it, if it seems reasonable.
I'm kind of torn on whether a coercion should discard records.
On Sat, Oct 17, 2015 at 8:37 AM, Morgan, Martin <
Martin.Morgan at roswellpark.org> wrote:
> Val and I discussed this a bit, with the resolution that calling the
> GAlignmentPairs() constructor on the first and the second element of the
> GAlignmentsList was the 'easiest' way to go.
>
> ga = unlist(gal[mcols(gal)$mate_status == "mated"])
> GAlignmentPairs(ga[c(TRUE, FALSE)], ga[c(FALSE, TRUE)])
>
> (completely speculative code). If I understand correctly, there's a check
> in for discordant pairs in readGAlignmentPairs, but not in GAlignmentPairs
> itself; could be mistaken though...
>
> Martin
> ________________________________________
> From: Michael Lawrence [lawrence.michael at gene.com]
> Sent: Saturday, October 17, 2015 9:48 AM
> To: Hervé Pagès
> Cc: Michael Lawrence; Morgan, Martin; bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] readGAlignmentPairs with discordant strand
>
> This might have been lost in this thread. If there is an easy way to
> coerce a GAlignmentsList where all(lengths(x) == 2) to a GAlignmentPairs,
> please let me know. Otherwise, I'll add a coerce method. Debating whether
> it should discard elements of length != 2, or if it should fail.
>
> On Fri, Oct 16, 2015 at 9:58 AM, Michael Lawrence <michafla at gene.com
> <mailto:michafla at gene.com>> wrote:
> Sure, "*" makes more sense for strand, given the precedent.
>
> On Fri, Oct 16, 2015 at 9:55 AM, Hervé Pagès <hpages at fredhutch.org<mailto:
> hpages at fredhutch.org>> wrote:
> > On 10/16/2015 09:28 AM, Michael Lawrence wrote:
> >>
> >> I kind of wish it would return NA for things like seqnames and strand,
> >> but yes that would be very useful.
> >
> >
> > Could do this for seqnames() but I'm hesitant to do this for strand().
> > If you look at ?strand in BiocGenerics, ‘*’ is used when the exact
> > strand of the location is unknown, or irrelevant, or when the "feature"
> > at that location belongs to both strands. A pair with discordant strand
> > belongs to both strands. Also there is a lot of code around that
> > assumes strand() never returns NAs.
> >
> >
> > H.
> >
> >>
> >> On Fri, Oct 16, 2015 at 9:15 AM, Hervé Pagès <hpages at fredhutch.org
> <mailto:hpages at fredhutch.org>> wrote:
> >>>
> >>> Hi Michael, Martin,
> >>>
> >>> On 10/16/2015 06:48 AM, Michael Lawrence wrote:
> >>>>
> >>>>
> >>>> It does seem like starting with the more general data structure is the
> >>>> better approach, but I couldn't find an easy way to move the paired
> >>>> subset
> >>>> of GAlignmentsList to GAlignmentPairs. You mention a coercion, but
> it's
> >>>> not
> >>>> obvious to me, unfortunately.
> >>>>
> >>>> Another approach would be a GAlignmentPairs where the unpaired reads
> >>>> have
> >>>> "missing" mates. I know GAlignments has no concept of missing, but it
> >>>> would
> >>>> get everything into a single data structure that is convenient for
> >>>> computing on pairs.
> >>>
> >>>
> >>>
> >>> I could modify readGAlignmentPairs() to have the discordant and/or
> >>> ambiguous pairs end up in th GAlignmentPairs. The ambiguous pairs
> >>> could be marked as such thru a metadata col of the object or thru
> >>> a proper slot. The seqnames() and strand() accessors will return
> >>> * on discordant pairs. Does that sound reasonable?
> >>>
> >>> H.
> >>>
> >>>
> >>>>
> >>>> On Fri, Oct 16, 2015 at 5:21 AM, Morgan, Martin <
> >>>> Martin.Morgan at roswellpark.org<mailto:Martin.Morgan at roswellpark.org>>
> wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Bioc-devel [mailto:bioc-devel-bounces at r-project.org<mailto:
> bioc-devel-bounces at r-project.org>] On Behalf
> >>>>>> Of
> >>>>>> Michael Lawrence
> >>>>>> Sent: Friday, October 16, 2015 7:41 AM
> >>>>>> To: bioc-devel at r-project.org<mailto:bioc-devel at r-project.org>
> >>>>>> Subject: [Bioc-devel] readGAlignmentPairs with discordant strand
> >>>>>>
> >>>>>> Now that GAlignmentPairs supports discordant strand between mates,
> how
> >>>>>> hard would it be to relax that restriction on readGAlignmentPairs()?
> >>>>>>
> >>>>>> Also, would be nice if getDumpedAlignments() returned those dumped
> by
> >>>>>> readGAlignmentPairs(). Right now, I'm reading a GAlignments (with
> the
> >>>>>
> >>>>>
> >>>>> extra
> >>>>>>
> >>>>>>
> >>>>>> mcols) and calling makeGAlignmentPairs(). Not so convenient.
> >>>>>
> >>>>>
> >>>>>
> >>>>> I'm not sure whether this is relevant to your use case but
> >>>>> readGAlignmentsList returns a list of paired mates, and if
> appropriate
> >>>>> (based on ScanBamParam) list elements with solo travelers. The paired
> >>>>> portion of the list can be coerced to GAlignmentPairs if the
> additional
> >>>>> structure of that class is required.
> >>>>>
> >>>>> Martin
> >>>>>
> >>>>>>
> >>>>>> Michael
> >>>>>>
> >>>>>> [[alternative HTML version deleted]]
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Bioc-devel at r-project.org<mailto:Bioc-devel at r-project.org> mailing
> list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> This email message may contain legally privileged and/or confidential
> >>>>> information. If you are not the intended recipient(s), or the
> employee
> >>>>> or
> >>>>> agent responsible for the delivery of this message to the intended
> >>>>> recipient(s), you are hereby notified that any disclosure, copying,
> >>>>> distribution, or use of this email message is prohibited. If you
> have
> >>>>> received this message in error, please notify the sender immediately
> by
> >>>>> e-mail and delete this email message from your computer. Thank you.
> >>>>>
> >>>>
> >>>> [[alternative HTML version deleted]]
> >>>>
> >>>> _______________________________________________
> >>>> Bioc-devel at r-project.org<mailto:Bioc-devel at r-project.org> mailing
> list
> >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>
> >>>
> >>> --
> >>> Hervé Pagès
> >>>
> >>> Program in Computational Biology
> >>> Division of Public Health Sciences
> >>> Fred Hutchinson Cancer Research Center
> >>> 1100 Fairview Ave. N, M1-B514
> >>> P.O. Box 19024
> >>> Seattle, WA 98109-1024
> >>>
> >>> E-mail: hpages at fredhutch.org<mailto:hpages at fredhutch.org>
> >>> Phone: (206) 667-5791<tel:%28206%29%20667-5791>
> >>> Fax: (206) 667-1319<tel:%28206%29%20667-1319>
> >
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages at fredhutch.org<mailto:hpages at fredhutch.org>
> > Phone: (206) 667-5791<tel:%28206%29%20667-5791>
> > Fax: (206) 667-1319<tel:%28206%29%20667-1319>
>
>
>
> This email message may contain legally privileged and/or confidential
> information. If you are not the intended recipient(s), or the employee or
> agent responsible for the delivery of this message to the intended
> recipient(s), you are hereby notified that any disclosure, copying,
> distribution, or use of this email message is prohibited. If you have
> received this message in error, please notify the sender immediately by
> e-mail and delete this email message from your computer. Thank you.
>
[[alternative HTML version deleted]]
More information about the Bioc-devel
mailing list