[Bioc-devel] isActiveSeq deprecated

Hahne, Florian florian.hahne at novartis.com
Tue Oct 1 09:18:13 CEST 2013


Same view here, 
with all the different data types that are around in the Bioconductor
world these days it seems to me that a consistent behaviour is preferable.
Florian

On 9/24/13 8:22 AM, "Hervé Pagès" <hpages at fhcrc.org> wrote:

>Hi Florian, Marc,
>
>On 09/18/2013 11:55 PM, Hahne, Florian wrote:
>> Hi Marc, Herve,
>> I also noticed this behaviour:
>>
>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>> seqlevels(txdb, force=TRUE) <- c("chr1", "ch2")
>> oldLevs <- seqlevels(txdb)
>> seqlevels(txdb, force=TRUE) <- "chr1"
>>
>> seqlevels(txdb, force=TRUE) <- oldLevs
>> Error in .seqinfo.TranscriptDbReplace(x, new2old = new2old, force =
>>force,
>>   :
>>    The replacement value must be either a 1 to 1 replacement or a
>>subset of
>> the original set when replacing the 'seqinfo' of a TranscriptDb object
>>
>>
>> But:
>>
>> restoreSeqlevels(txdb)
>> seqlevels(txdb, force=TRUE) <- oldLevs
>>
>>
>> This is probably intentional, but I found it to be rather confusing. If
>> one should be able to use the seqlevels replacement method to control
>> active and inactive chromosomes in the TranscriptDb object, then having
>> this mandatory step of restoring all chromosomes to the active state
>> followed by another restriction to be quite cumbersome. At least a
>> somewhat more useful error message would help in that case. Or couldn't
>> the replacement method always call restoreSeqlevels internally.
>
>Somewhat related to this, I also found the following issue:
>
>   > library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>   > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>   > seqlevels(txdb, force=TRUE) <- "chrM"
>   > seqlengths(txdb)
>    chrM
>   16571
>   > seqlevels(txdb) <- "chr1"
>   > seqlengths(txdb)
>    chr1
>   16571
>
>The 2nd call to the seqlevels() setter actually performed a
>*renaming* operation (a clue for this is that I didn't have to
>use force=TRUE). Renaming the seqlevels of a TranscriptDb object
>is something I think we want to support, e.g. when there is the
>need to use a different naming convention (like in
>seqlevels(txdb) <- "M"). This is exactly the syntax that one
>would use to rename the seqlevels of a GRanges object:
>
>   > gr <- GRanges("chrM", IRanges(1:2, 10))
>   > seqlevels(gr) <- "chr1"
>   > gr
>   GRanges with 2 ranges and 0 metadata columns:
>         seqnames    ranges strand
>            <Rle> <IRanges>  <Rle>
>     [1]     chr1   [1, 10]      *
>     [2]     chr1   [2, 10]      *
>     ---
>     seqlengths:
>      chr1
>        NA
>
>There is nothing that prevents the user from doing this kind of silly
>renaming except that, in the case of the TranscriptDb object, this is
>almost certainly not what the user intended to do. What s/he really
>wanted was selecting chr1 instead of chrM, which can be achieved with:
>
>   > restoreSeqlevels(txdb)
>   > seqlevels(txdb, force=TRUE) <- "chr1"
>   > seqlengths(txdb)
>        chr1
>   249250621
>
>Yes having to restore all chromosomes to the active state before one
>can make a new selection is cumbersome so we probably need to revisit
>this.
>
>>
>> It would also help to point out somewhere (I may have missed it) that
>> TranscriptDb seqlevels (or rather the internal GRanges) are actually
>>pass
>> by reference:
>>
>> txdb2 <- txdb
>> seqlevels(txdb)
>> seqlevels(txdb2)
>>
>> restoreSeqlevels(txdb)
>> seqlevels(txdb2)
>
>That's because TranscriptDb is a reference class. IMHO it shouldn't.
>
>>
>> Also Herve: what is the definition of a "used" seqlevel? Does that mean
>> that a factor level is defined, but no range in the GRanges object is
>> assigned this level?
>
>Yes. Like here:
>
>   > gr <- GRanges("chrM", IRanges(1:2, 10), seqlengths=c(chr1=5000,
>chrM=25))
>   > gr
>   GRanges with 2 ranges and 0 metadata columns:
>         seqnames    ranges strand
>            <Rle> <IRanges>  <Rle>
>     [1]     chrM   [1, 10]      *
>     [2]     chrM   [2, 10]      *
>     ---
>     seqlengths:
>      chr1 chrM
>      5000   25
>
>Only chrM is in use:
>
>   > seqlevelsInUse(gr)
>   [1] "chrM"
>
>The idiom to drop seqlevels that are not in use is:
>
>   > seqlevels(gr) <- seqlevelsInUse(gr)
>   > gr
>   GRanges with 2 ranges and 0 metadata columns:
>         seqnames    ranges strand
>            <Rle> <IRanges>  <Rle>
>     [1]     chrM   [1, 10]      *
>     [2]     chrM   [2, 10]      *
>     ---
>     seqlengths:
>      chrM
>        25
>
>Of course, in that case, force=TRUE is not needed because the operation
>is guaranteed to not "shrink" the GRanges object.
>
>> Why would those exists in a TranscriptDb object at
>> all?
>
>The notion of "seqlevels in use" is not formally defined for a
>TranscriptDb object:
>
>   > seqlevelsInUse(txdb)
>   Error in (function (classes, fdef, mtable)  :
>     unable to find an inherited method for function ŒseqlevelsInUse¹
>for signature Œ"TranscriptDb"¹
>
>but it could be: a seqlevel or chromosome could be considered to be
>in use if there is at least 1 feature (transcript, exon or CDS) in
>the db that is on it. I think for most TranscriptDb objects, all the
>seqlevels are actually in use. However it's conceivable that a
>TranscriptDb object could have some extra seqlevels that are not in
>use (the seqlevels + their lengths + their circularity flags are stored
>in a separate table, the 'chrominfo' table).
>
>Anyway, when I added the seqlevelsInUse() generic a few months ago,
>I didn't write a method for TranscriptDb objects because I couldn't
>think of a use case for it. And also because without any change to
>the current db schema, this would be a costly operation as it would
>need to scan the entire database, which cannot be done with a single
>query.
>
>Now with the seqlevels() setter allowing to reduce the seqlevels
>of a TranscriptDb object, we face the following choices: (a)
>implement the seqlevelsInUse,TranscriptDb method and make the
>'force' arg of the seqlevels() setter behave consistently with
>that, or (b) not implement the seqlevelsInUse,TranscriptDb method
>and make the 'force' arg meaningless. As I said earlier, my preference
>goes for (b).
>
>H.
>
>>
>> Florian
>>
>>
>>
>>
>>
>> On 9/18/13 9:08 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
>>
>>> Note that currently it's kind of inconsistent anyway because it doesn't
>>> look at the seqlevels that are in use. For example:
>>>
>>>    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>>
>>> Trying to drop chrUn_gl000249 (which I know is not in use):
>>>
>>>    > seqlevels(txdb) <- setdiff(seqlevels(txdb), "chrUn_gl000249")
>>>    Error in .seqinfo.TranscriptDbReplace(x, new2old = new2old, force =
>>> force,  :
>>>      You need to use force=TRUE if you want to drop seqlevels.
>>>
>>> 'force=TRUE' should only be required when trying to drop seqlevels
>>> that are in use.
>>>
>>> So the choice are more between: (a) consistent, (b) convenient,
>>> or (c) leave it as it is (which is neither convenient or consistent).
>>>
>>> I vote for (b).
>>>
>>> H.
>>>
>>>
>>> On 09/18/2013 11:31 AM, Marc Carlson wrote:
>>>> I actually considered this, but I opted to do it this way just for the
>>>> sake of being consistent (which was my whole mission for implementing
>>>> seqlevels in here in the 1st place).  Now I could make it more
>>>> convenient here and break consistency with how it is used elsewhere,
>>>>but
>>>> what do people prefer?
>>>>
>>>> Consistent or convenient?
>>>>
>>>>
>>>>     Marc
>>>>
>>>>
>>>>
>>>> On 09/18/2013 10:40 AM, Hervé Pagès wrote:
>>>>> Hi Marc,
>>>>>
>>>>> Wouldn't it make sense to just ignore the 'force' arg when
>>>>> dropping the seqlevels of a TranscriptDb?
>>>>>
>>>>> The 'force' argument is FALSE by default and this prevents
>>>>> seqlevels<- to shrink GRanges or other vector-like objects
>>>>> when the user tries to drop seqlevels that are in use.
>>>>> Internally seqlevels<- calls seqlevelsInUse() to get the
>>>>> seqlevels currently in use and see if they intersect with
>>>>> the seqlevels to drop.
>>>>>
>>>>> In the TranscriptDb situation, people always have to use
>>>>> 'force=TRUE' to drop seqlevels, regardless of whether the
>>>>> levels to drop are in use or not (the seqlevelsInUse()
>>>>> getter not being defined for TranscriptDb objects, I suspect
>>>>> seqlevels<- doesn't look at this).
>>>>>
>>>>> So maybe 'force' could just be ignored for TranscriptDb objects?
>>>>> That would make seqlevels<- a little bit more user-friendly on
>>>>> those objects.
>>>>>
>>>>> Thanks,
>>>>> H.
>>>>>
>>>>>
>>>>> On 09/13/2013 10:38 AM, Marc Carlson wrote:
>>>>>> Hi Florian,
>>>>>>
>>>>>> Yes we are trying to make things more uniform.  seqlevels() lets you
>>>>>> rename as well as deactivate chromosomes you want to ignore, so it
>>>>>>was
>>>>>> really redundant with isActiveSeq().  So we are moving away from
>>>>>> isActiveSeq() just so that users have less to learn about.  The
>>>>>>reason
>>>>>> why isActiveSeq was different from seqlevels was just because it was
>>>>>> born for a TranscriptDb (which is based on an annotation database)
>>>>>> instead of being born on a GRanges object.  So seqlevels was the
>>>>>>more
>>>>>> general tool.
>>>>>>
>>>>>>      Marc
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 09/13/2013 07:24 AM, Hahne, Florian wrote:
>>>>>>> Hi Marc,
>>>>>>> I saw these warnings in Gviz, but they stem from GenomicFeatures
>>>>>>>
>>>>>>> Warning messages:
>>>>>>> 1: 'isActiveSeq' is deprecated.
>>>>>>> Use 'seqlevels' instead.
>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>> 2: 'isActiveSeq' is deprecated.
>>>>>>> Use 'seqlevels' instead.
>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>> 3: 'isActiveSeq<-' is deprecated.
>>>>>>> Use 'seqlevels' instead.
>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>> 4: 'isActiveSeq<-' is deprecated.
>>>>>>> Use 'seqlevels' instead.
>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>> 5: 'isActiveSeq' is deprecated.
>>>>>>> Use 'seqlevels' instead.
>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>> 6: 'isActiveSeq<-' is deprecated.
>>>>>>> Use 'seqlevels' instead.
>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>>
>>>>>>> So has the whole idea of active chromosomes in the data base been
>>>>>>> dropped? I could not find anything in the change notes. Do I get it
>>>>>>> right that you can now do
>>>>>>> seqlevels(txdb, force=TRUE) <- "chr1"
>>>>>>> if you just want the first chromosome to be active?
>>>>>>>
>>>>>>> Florian
>>>>>>>
>>>>>>
>>>>>>
>>>>>>      [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fhcrc.org
>>> Phone:  (206) 667-5791
>>> Fax:    (206) 667-1319
>>
>
>-- 
>Hervé Pagès
>
>Program in Computational Biology
>Division of Public Health Sciences
>Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N, M1-B514
>P.O. Box 19024
>Seattle, WA 98109-1024
>
>E-mail: hpages at fhcrc.org
>Phone:  (206) 667-5791
>Fax:    (206) 667-1319



More information about the Bioc-devel mailing list