[Bioc-devel] deprecation of keepSeqlevels and renameSeqlevels

Hervé Pagès hpages at fhcrc.org
Wed May 22 20:15:51 CEST 2013


On 05/21/2013 06:37 PM, Michael Lawrence wrote:
> It's clear we're not going to agree. I will volunteer to maintain these,
> OK? I will maintain them anyway for my own use.

That's fine with me. I didn't mean we shouldn't have those wrappers (and
I think Martin or Val are going to come back here with a proposal).
I just wanted to make it clear that the seqlevels setter is not that
hideous thing

   seqlevels(x, new2old = match("chr1", seqlevels(x)), force = TRUE) <- 
"chr1"

Sorry if I kept beating again and again the same dead horse...

Cheers,
H.

>
> Michael
>
>
> On Tue, May 21, 2013 at 2:34 PM, Hervé Pagès <hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>> wrote:
>
>
>
>     On 05/21/2013 08:00 AM, Michael Lawrence wrote:
>
>
>
>
>         On Mon, May 20, 2013 at 10:36 PM, Hervé Pagès <hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>> wrote:
>
>              Michael,
>
>
>              On 05/20/2013 08:44 PM, Michael Lawrence wrote:
>
>
>
>
>                  On Mon, May 20, 2013 at 4:13 PM, Hervé Pagès
>         <hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>> wrote:
>
>                       Hi,
>
>
>                       On 05/20/2013 03:15 PM, Michael Lawrence wrote:
>
>                           On Mon, May 20, 2013 at 3:11 PM, Martin Morgan
>                           <mtmorgan at fhcrc.org
>         <mailto:mtmorgan at fhcrc.org> <mailto:mtmorgan at fhcrc.org
>         <mailto:mtmorgan at fhcrc.org>>
>                  <mailto:mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>
>         <mailto:mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>>>> wrote:
>
>                               Hi Michael --
>
>
>                               On 5/20/2013 5:56 AM, Michael Lawrence wrote:
>
>                                   While it's cool that seqlevels<- has
>         become so
>                  flexible,
>                                   I still claim
>                                   that
>                                   rename/keep/drop would be a lot easier
>         to read,
>                  because
>                                   they describe the
>                                   high-level operation, and there's no
>         reason to
>                  deprecate
>                                   them. They're
>                                   also
>                                   easier to remember. For example, for
>         dropping with
>                                   seqlevels<-, the user
>                                   needs
>                                   to remember that "force" is necessary
>         to drop. Much
>                                   easier to just say
>                                   "dropSeqlevels(), please". And
>         reimplementing
>                                   keepSeqlevels is still too
>                                   complicated. Is it such a maintenance
>         burden to
>                  have
>                                   these simple
>                                   wrappers sit
>                                   on top of the low-level manipulators?
>
>                                   Another suggestion: renameSeqlevels
>         should not
>                  require a
>                                   named vector (in
>                                   cases
>                                   where it is unnamed, require that it is
>                  parallel to the
>                                   existing
>                                   seqlevels, as
>                                   with seqlevels<-).
>
>
>                               I didn't really indicate what drove my
>         desire to see
>                               keepSeqlevels
>                               deprecated. keepSeqlevels, seqlevels<-, and
>                  isActiveSeq were
>                               developed more
>                               or less independently, and have different
>                  contracts. The
>                               different
>                               contracts are confusing to the user, as is
>         creating
>                  a usable
>                               help system
>                               when there are multiple end points for
>         what is a common
>                               operation. The help
>                               pages of each were inadequate in different
>         ways.
>                  And because
>                               the code was
>                               developed independently, support for different
>                  objects was
>                               inconsistent. So
>                               actually, yes, the maintenance (and use)
>         burden for the
>                               previous state of
>                               affairs was substantial.
>
>                               On the other hand, I agree that
>         keepSeqlevels is
>                  convenient
>                               as a simple
>                               wrapper around seqlevels<-, in the same
>         way that
>                  setNames
>                               and names<- are
>                               both useful.
>
>                               So we could iterate to
>
>                                   keepSeqlevels <-
>                                       function(x, value, ...)
>                                   {
>                                       seqlevels(x, force=TRUE) <- value
>                                       x
>                                   }
>
>                               but I'd be less enthusiastic about
>         maintaining the
>                  original
>                               contract of
>                               keepSeqlevels, where keepSeqlevels(gr1,
>         gr2), would
>                  have
>                               worked for two
>                               GRanges objects.
>
>
>                       Why would this be called keepSeqlevels() if,
>         depending on
>                  how it's used,
>                       it will either add, drop, rename, or permute the
>         seqlevels?
>                       Couldn't this be called setSeqlevels?
>
>
>                  I thought that new2old was necessary for permuting.
>
>
>              The seqlevels setter has no 'new2old' arg.
>
>
>                  You're right that
>                  adding should be disallowed for keepSeqlevels(). Adding
>                  seqlevels is not
>                  a common operation. The two common operations are:
>                  * Permuting, usually because the data were imported in
>         non-canonical
>                  order (seqnameStyle was designed to address this, no?).
>
>
>              Permuting is straightforward with seqlevels<-:
>
>                 > seqlevels(gr)
>                 [1] "chr1" "chr2" "chr3"
>
>                 > seqlevels(gr) <- rev(seqlevels(gr))
>
>                 > seqlevels(gr)
>                 [1] "chr3" "chr2" "chr1"
>
>
>                  * Subsetting, either by keeping or dropping.
>
>
>              Also straightforward:
>
>                 > seqlevels(gr, force=TRUE) <- "chr2"
>
>                 > seqlevels(gr)
>                 [1] "chr2"
>
>
>                  Two main reasons: taking a
>                  small slice of the data for prototyping, or removing
>         problematic
>                  chromosomes (sex, circular). This goes back to bsapply
>         and the
>                  exclude
>                  argument.
>
>
>              There are other important use cases:
>
>                 - Dropping seqlevels that are *not* in use. This happens
>         for example
>                   when subsetting a BAM file with Rsamtools::filterBam
>         to keep only
>                   the alignments located on a given chromosome. The
>         entire sequence
>                   dictionary (in the header) is propagated to the
>         resulting BAM file
>                   so when you read the file with readGAlignments() you
>         end up with a
>                   bunch of useless seqlevels that you generally start by
>         dropping.
>                   You don't want to drop any alignment so force=FALSE is
>         your friend.
>
>
>         This is sort of tangential to the discussion, but do you really
>         want to
>         do this? I would preserve the universe as given by the BAM.
>
>                 - An even more common operation is renaming: 90% of the
>         times that's
>                   what I use seqlevels<- for, and that's what I tell
>         people to use
>                   when they need to rename their chromosomes. Also
>         straightforward:
>
>                     > seqlevels(gr)
>                     [1] "chr1" "chr2" "chr3" "chrM"
>
>                     > seqlevels(gr) <- sub("^chr", "", seqlevels(gr))
>                     > seqlevels(gr)
>                     [1] "1" "2" "3" "M"
>
>                     > seqlevels(gr)[seqlevels(gr) == "M"] <- "MT"
>                     > seqlevels(gr)
>                     [1] "1"  "2"  "3"  "MT"
>
>                   Note that this is simpler to use than
>         renameSeqlevels() which
>                   always required you to build the entire right value as
>         a named
>                   vector.
>
>
>         Sure, renaming is a common use case. Not sure how I forgot that.
>
>         As I wrote earlier in the thread, renameSeqlevels should be
>         changed so
>         as not to require naming of the vector.
>
>              So the added-value of keepSeqlevels() seems to boil down to:
>
>                 (a) it always uses force=TRUE
>
>                 (b) it preserves the original object and returns the
>         modified object
>
>              If you want to restrict the use of keepSeqlevels() to
>         permuting and
>              dropping, you'll also have to disallow renaming (in
>         addition to adding).
>              Then its name will more or less reflect what it does (if
>         "keep" means
>              "permute" or "drop"). The final result will be something
>         that does less
>              than setSeqlevels() and that is not necessarily easier to
>         read: both
>              will set on the object whatever vector of seqlevels they
>         are supplied,
>              except that one will fail when doing so would actually
>         result in adding
>              or renaming seqlevels.
>
>
>         So it looks like seqlevels<- is pretty powerful. Now I see that if I
>         scroll down to the examples in the man page, I find the actual
>         documentation. It's cool to learn that we can replace seqlevels
>         on TxDb
>         objects.
>
>
>     Has been working on TranscriptDb objects for more than
>
>
>         My argument has always been that since these low-level
>         operations are so powerful, it's nice to have high-level
>         operations to
>         clarify the code. We have to think hard to know what the RHS
>         will do in
>         that replacement.
>
>
>     When you do something like seqlevels(gr) <- new_seqlevels, you're saying
>     "can you please replace the current seqlevels by those new ones".
>     That's all. The request/contract is simple and all the complexity is
>     taken care of internally by seqlevels<-. It might be that, depending on
>     what's currently on 'gr', this will actually result in dropping, adding,
>     permuting. Or *a combination of the 3*. Like here:
>
>
>        > seqlevels(gr)
>        [1] "chr1" "chr2" "chr3"
>
>        > seqlevels(gr) <- c("chr4", "chr3", "chr1")
>        > seqlevels(gr)
>        [1] "chr4" "chr3" "chr1"
>
>        # one seqlevel was drop, one was added, and the 2 seqlevels that were
>        # preserved were reordered.
>
>     When you want to subset a vector to keep all elements with an even
>     index you do x[c(FALSE, TRUE)]. Is that keeping or dropping elements?
>     Do we need to have a subsetting operator for keeping elements, another
>     one for dropping elements, and another one for permuting them (like
>     in x[sample(length(x))])?
>
>     H.
>
>         It's probably one of the most complex replacement
>         operations; far more complex than the typical one, including
>         levels<- on
>         factor. There's nothing wrong with that; it's just complexity
>         that we
>         should be able to hide.
>
>         Michael
>
>              H.
>
>
>                  Michael
>
>
>                       H.
>
>
>
>                           Well, I wasn't even aware of that feature, so
>         you've
>                  made your
>                           point about
>                           the documentation ;) Sounds like a good
>         compromise to me.
>
>                           Thanks for understanding,
>                           Michael
>
>
>
>                               Martin
>
>
>                                   Michael
>
>
>
>
>
>
>                                   On Sat, May 18, 2013 at 6:00 PM,
>         Martin Morgan
>                                   <mtmorgan at fhcrc.org
>         <mailto:mtmorgan at fhcrc.org> <mailto:mtmorgan at fhcrc.org
>         <mailto:mtmorgan at fhcrc.org>>
>                  <mailto:mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>
>         <mailto:mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>>>
>                                   <mailto:mtmorgan at fhcrc.org
>         <mailto:mtmorgan at fhcrc.org>
>                  <mailto:mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>>
>         <mailto:mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>
>                  <mailto:mtmorgan at fhcrc.org <mailto:mtmorgan at fhcrc.org>>>>>
>
>                                   wrote:
>
>                                         On 05/18/2013 05:42 PM, Martin
>         Morgan wrote:
>
>                                             Some of the most common
>         operations are
>                                   straight-forward, e.g.,
>
>                                                   > gr =
>         GRanges(paste0("chr",
>                  c(1:22,
>                                   "X", "Y")), IRanges(1,
>                                   100))
>                                                   > seqlevels(gr) =
>         sub("chr", "ch",
>                                   seqlevels(gr))
>
>
>                                         To get a more comprehensive
>         example I
>                  should have
>                                   followed my own
>                                   advice and
>                                         grabbed from the help page!
>
>                                               ## Rename:
>                                               seqlevels(txdb) <-
>         sub("chr", "",
>                                   seqlevels(txdb))
>                                               seqlevels(txdb)
>
>                                               seqlevels(txdb) <-
>         paste0("CH",
>                  seqlevels(txdb))
>                                               seqlevels(txdb)
>
>
>                  seqlevels(txdb)[seqlevels(__**______txdb) ==
>
>
>                                   "CHM"] <- "M"
>
>
>                                               seqlevels(txdb)
>
>
>                                         --
>                                         Computational Biology / Fred
>         Hutchinson
>                  Cancer
>                                   Research Center
>                                         1100 Fairview Ave. N.
>                                         PO Box 19024 Seattle, WA 98109
>
>                                         Location: Arnold Building M1 B861
>                                         Phone: (206) 667-2793
>         <tel:%28206%29%20667-2793>
>                  <tel:%28206%29%20667-2793> <tel:%28206%29%20667-2793>
>                                   <tel:%28206%29%20667-2793>
>
>
>
>
>                               --
>                               Dr. Martin Morgan, PhD
>
>                               Fred Hutchinson Cancer Research Center
>                               1100 Fairview Ave. N.
>                               PO Box 19024 Seattle, WA 98109
>
>
>                                    [[alternative HTML version deleted]]
>
>
>           _____________________________________________________
>         Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
>         <mailto:Bioc-devel at r-project.__org
>         <mailto:Bioc-devel at r-project.org>>
>                  <mailto:Bioc-devel at r-project.
>         <mailto:Bioc-devel at r-project.>____org
>                  <mailto:Bioc-devel at r-project.__org
>         <mailto:Bioc-devel at r-project.org>>>
>                           mailing list
>         https://stat.ethz.ch/mailman/______listinfo/bioc-devel
>         <https://stat.ethz.ch/mailman/____listinfo/bioc-devel>
>                  <https://stat.ethz.ch/mailman/____listinfo/bioc-devel
>         <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>>
>
>
>
>           <https://stat.ethz.ch/mailman/____listinfo/bioc-devel
>         <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>                  <https://stat.ethz.ch/mailman/__listinfo/bioc-devel
>         <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>>
>
>
>                       --
>                       Hervé Pagès
>
>                       Program in Computational Biology
>                       Division of Public Health Sciences
>
>                       Fred Hutchinson Cancer Research Center
>                       1100 Fairview Ave. N, M1-B514
>                       P.O. Box 19024
>                       Seattle, WA 98109-1024
>
>                       E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>
>
>                       Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>         <tel:%28206%29%20667-5791>
>                  <tel:%28206%29%20667-5791>
>                       Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>         <tel:%28206%29%20667-1319>
>                  <tel:%28206%29%20667-1319>
>
>
>
>              --
>              Hervé Pagès
>
>              Program in Computational Biology
>              Division of Public Health Sciences
>              Fred Hutchinson Cancer Research Center
>              1100 Fairview Ave. N, M1-B514
>              P.O. Box 19024
>              Seattle, WA 98109-1024
>
>              E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>              Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>         <tel:%28206%29%20667-5791>
>              Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>         <tel:%28206%29%20667-1319>
>
>
>
>     --
>     Hervé Pagès
>
>     Program in Computational Biology
>     Division of Public Health Sciences
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N, M1-B514
>     P.O. Box 19024
>     Seattle, WA 98109-1024
>
>     E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>     Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>     Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list