[Bioc-devel] BamTallyParam argument 'which'

Leonard Goldstein goldstein.leonard at gene.com
Mon Feb 23 20:38:53 CET 2015


It sounds very sensible not to double count when tallying variants. I
was more concerned about reducing 'which' as the default behavior for
scanBam and other functions.
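
To make the double-counting issue concrete, here is a minimal sketch in
plain Python. It is a toy model, not Rsamtools or samtools code: the
function names (overlaps, query, reduce_regions) and the coordinates are
made up for illustration, and intervals are assumed to be closed
(start, end) pairs on a single chromosome.

```python
def overlaps(a, b):
    """True if closed intervals a and b (start, end) share a position."""
    return a[0] <= b[1] and b[0] <= a[1]


def query(alignments, regions):
    """Return one hit per (region, alignment) overlap, region by region."""
    return [aln for region in regions
            for aln in alignments if overlaps(aln, region)]


def reduce_regions(regions):
    """Merge overlapping or adjacent closed intervals into a sorted list."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical alignments and two overlapping query regions.
alignments = [(100, 150), (140, 190), (300, 350)]
which = [(90, 145), (120, 200)]

raw_hits = query(alignments, which)                      # per-region lookup
reduced_hits = query(alignments, reduce_regions(which))  # merged regions

print(len(raw_hits), len(reduced_hits))  # prints "4 2"
```

With the two overlapping regions, the first two alignments are each
returned twice; after merging the regions they are returned once. Note
that in this toy model an alignment spanning two disjoint regions would
still be returned twice even after reducing.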

I wanted to bring up the samtools behavior because, for me at least,
inconsistencies between Rsamtools and samtools have been another
source of confusion in the past (e.g. different naming conventions for
fields like isize vs TLEN).

Leonard


On Mon, Feb 23, 2015 at 11:22 AM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Maybe Rsamtools would want to follow this precedent. I think there might be
> a difference between fishing out alignments from a SAM/BAM, and deriving a
> summary (tallyVariants) from a BAM. It seems like an argument could be made
> for a tally set to not contain duplicates.
>
> On Mon, Feb 23, 2015 at 11:05 AM, Leonard Goldstein
> <goldstein.leonard at gene.com> wrote:
>>
>> Hi Michael and Thomas,
>>
>> I ran into the same problem in the past (i.e. when I started working
>> with functions like scanBam I expected them not to return the same
>> alignment multiple times)
>>
>> One thing to consider might be that returning alignments multiple
>> times is consistent with the behavior of the samtools view command.
>> Quoting from the samtools manual:
>>
>> “Important note: when multiple regions are given, some alignments may
>> be output multiple times if they overlap more than one of the
>> specified regions.”
>>
>> Maybe there is an argument for keeping things consistent with
>> samtools? As you said, if documented properly, the user can decide
>> whether to reduce regions specified in which or not.
>>
>> Leonard
>>
>>
>> On Mon, Feb 23, 2015 at 10:52 AM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>> > We should at least try to avoid surprising the user. It seems like most
>> > people expect "which" to be a simple restriction, so I think for now I
>> > will just reduce the which, and if someone has a use case for separate
>> > queries, we can address it in the future.
>> >
>> > On Mon, Feb 23, 2015 at 10:41 AM, Thomas Sandmann
>> > <sandmann.thomas at gene.com>
>> > wrote:
>> >
>> >> Personally, I don't have a use case with "meaningful loci" worth
>> >> tracking, so keeping it simple would work for me.
>> >>
>> >> Incidentally, would it be good to deal with the 'which' parameter in a
>> >> consistent way across different methods? I just saw this recent post on
>> >> the mailing list in which a user got confused by duplicate counts
>> >> returned after passing 'which' to ScanBamParam:
>> >>
>> >> https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html
>> >>
>> >>
>> >> ---
>> >>
>> >> Thomas Sandmann, PhD
>> >> Computational biologist
>> >>
>> >> Genentech, Inc.
>> >> 1 DNA Way
>> >> South San Francisco, CA 94080
>> >> USA
>> >>
>> >> Phone: +1 650 225 6273
>> >> Fax: +1 650 225 5389
>> >> Email: sandmann.thomas at gene.com
>> >>
>> >> "If a man will begin with certainties, he shall end in doubts; but if
>> >> he
>> >> will be content to begin with doubts he shall end in certainties." --
>> >> Sir
>> >> Francis Bacon
>> >>
>> >>
>> >> On Mon, Feb 23, 2015 at 10:37 AM, Michael Lawrence <
>> >> lawrence.michael at gene.com> wrote:
>> >>
>> >>> We just have to decide which is the more useful interpretation of
>> >>> which: as a simple restriction, or as a vector of meaningful loci that
>> >>> will be analyzed individually? I would actually favor the first one (the
>> >>> same as yours), just because it's simpler. To keep track of the query
>> >>> ranges, we would need to add a new column to the returned object, which
>> >>> will more often than not just be clutter. I guess we could introduce a
>> >>> new parameter, "reduceWhich", which defaults to TRUE and reduces the
>> >>> which. If FALSE, it instead adds the column mapping back to the original
>> >>> which ranges.
>> >>>
>> >>>
>> >>> On Sun, Feb 22, 2015 at 2:36 PM, Thomas Sandmann <
>> >>> sandmann.thomas at gene.com> wrote:
>> >>>
>> >>>> Hi Michael,
>> >>>>
>> >>>> Ah, I see. I hadn't realized that returning the pileups separately for
>> >>>> each region could be a desired feature, but that makes sense. I agree:
>> >>>> as it is easy for the user to 'reduce' the ranges beforehand, your
>> >>>> first option (e.g. returning the ID of the range) would be more
>> >>>> flexible.
>> >>>>
>> >>>> Perhaps you would consider adding a sentence to the documentation of
>> >>>> 'which' on BamTallyParam's help page explaining that users might want
>> >>>> to 'reduce' their ranges beforehand if they are only interested in a
>> >>>> single tally for each base?
>> >>>>
>> >>>> Thanks a lot !
>> >>>> Thomas


