[Bioc-devel] BamTallyParam argument 'which'

Michael Lawrence lawrence.michael at gene.com
Mon Feb 23 20:22:29 CET 2015


Maybe Rsamtools would want to follow this precedent. I think there might be
a difference between fishing out alignments from a SAM/BAM, and deriving a
summary (tallyVariants) from a BAM. It seems like an argument could be made
for a tally set to not contain duplicates.
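
To make the duplication concrete, here is a minimal, language-neutral sketch in Python. It is not the GenomicRanges or VariantTools API: the names reduce_ranges and tally_positions are hypothetical, ranges are assumed to be closed 1-based intervals on a single chromosome, and adjacent ranges are assumed to be merged along with overlapping ones.

```python
from typing import List, Tuple

# Closed interval (start, end) with 1-based coordinates on one chromosome.
Range = Tuple[int, int]


def reduce_ranges(ranges: List[Range]) -> List[Range]:
    """Merge overlapping or adjacent ranges into a sorted, disjoint set."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def tally_positions(ranges: List[Range]) -> List[int]:
    """Hypothetical stand-in for a tally: one record per position per query range."""
    return [pos for start, end in ranges for pos in range(start, end + 1)]


which = [(100, 200), (150, 250)]

# Querying each range separately reports positions 150..200 twice.
naive = tally_positions(which)

# Reducing first gives exactly one record per position.
reduced = tally_positions(reduce_ranges(which))

print(len(naive), len(reduced))
```

With the two overlapping ranges above, the unreduced query yields 202 records and the reduced one 151, because the 51 shared positions are reported once instead of twice.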

On Mon, Feb 23, 2015 at 11:05 AM, Leonard Goldstein <
goldstein.leonard at gene.com> wrote:

> Hi Michael and Thomas,
>
> I ran into the same problem in the past (i.e. when I started working
> with functions like scanBam I expected them not to return the same
> alignment multiple times).
>
> One thing to consider might be that returning alignments multiple
> times is consistent with the behavior of the samtools view command.
> Quoting from the samtools manual:
>
> “Important note: when multiple regions are given, some alignments may
> be output multiple times if they overlap more than one of the
> specified regions.”
>
> Maybe there is an argument for keeping things consistent with
> samtools? As you said, if documented properly, the user can decide
> whether to reduce regions specified in which or not.
>
> Leonard
>
>
> On Mon, Feb 23, 2015 at 10:52 AM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
> > We should at least try to avoid surprising the user. Seems like most
> > people expect "which" to be a simple restriction, so I think for now I will
> > just reduce the which, and if someone has a use case for separate queries,
> > we can address it in the future.
> >
> > On Mon, Feb 23, 2015 at 10:41 AM, Thomas Sandmann
> > <sandmann.thomas at gene.com> wrote:
> >
> >> Personally, I don't have a use case with "meaningful loci" worth tracking,
> >> so keeping it simple would work for me.
> >>
> >> Incidentally, would it be good to deal with the 'which' parameter in a
> >> consistent way across different methods? I just saw this recent post on
> >> the mailing list in which a user got confused by duplicate counts returned
> >> after passing 'which' to ScanBamParam:
> >>
> >> https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html
> >>
> >>
> >> ---
> >>
> >> Thomas Sandmann, PhD
> >> Computational biologist
> >>
> >> Genentech, Inc.
> >> 1 DNA Way
> >> South San Francisco, CA 94080
> >> USA
> >>
> >> Phone: +1 650 225 6273
> >> Fax: +1 650 225 5389
> >> Email: sandmann.thomas at gene.com
> >>
> >> "If a man will begin with certainties, he shall end in doubts; but if he
> >> will be content to begin with doubts he shall end in certainties." --
> >> Sir Francis Bacon
> >>
> >>
> >> On Mon, Feb 23, 2015 at 10:37 AM, Michael Lawrence <
> >> lawrence.michael at gene.com> wrote:
> >>
> >>> We just have to decide which is the more useful interpretation of 'which'
> >>> -- as a simple restriction, or as a vector of meaningful loci, which will
> >>> be analyzed individually? I would actually favor the first one (the same as
> >>> yours), just because it's simpler. To keep track of the query ranges, we
> >>> would need to add a new column to the returned object, which will more
> >>> often than not just be clutter. I guess we could introduce a new parameter,
> >>> "reduceWhich", which defaults to TRUE and reduces the which. If FALSE, it
> >>> instead adds the column mapping back to the original which ranges.
> >>>
> >>>
> >>> On Sun, Feb 22, 2015 at 2:36 PM, Thomas Sandmann <
> >>> sandmann.thomas at gene.com> wrote:
> >>>
> >>>> Hi Michael,
> >>>>
> >>>> Ah, I see. I hadn't realized that returning the pileups separately for
> >>>> each region could be a desired feature, but that makes sense. I agree: as
> >>>> it is easy for the user to 'reduce' the ranges beforehand, your first option
> >>>> (e.g. returning the ID of the range) would be more flexible.
> >>>>
> >>>> Perhaps you would consider adding a sentence to the documentation of
> >>>> 'which' on BamTallyParam's help page explaining that users might want to
> >>>> 'reduce' their ranges beforehand if they are only interested in a single
> >>>> tally for each base?
> >>>>
> >>>> Thanks a lot !
> >>>> Thomas
> >>>>
> >>>>
> >>>
> >>
> >
