[Bioc-devel] isActiveSeq deprecated

Tue Oct 1 22:17:22 CEST 2013

Thanks for your feedback Florian. We're in the process of revisiting
the mechanism for selecting chromosomes on a TranscriptDb object.
Expect an update on this soon.

H.

On 10/01/2013 12:18 AM, Hahne, Florian wrote:
> Same view here,
> with all the different data types that are around in the Bioconductor
> world these days it seems to me that a consistent behaviour is preferable.
> Florian
>
> On 9/24/13 8:22 AM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
>
>> Hi Florian, Marc,
>>
>> On 09/18/2013 11:55 PM, Hahne, Florian wrote:
>>> Hi Marc, Herve,
>>> I also noticed this behaviour:
>>>
>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>> seqlevels(txdb, force=TRUE) <- c("chr1", "ch2")
>>> oldLevs <- seqlevels(txdb)
>>> seqlevels(txdb, force=TRUE) <- "chr1"
>>>
>>> seqlevels(txdb, force=TRUE) <- oldLevs
>>> Error in .seqinfo.TranscriptDbReplace(x, new2old = new2old, force =
>>> force,
>>>    :
>>>     The replacement value must be either a 1 to 1 replacement or a
>>> subset of
>>> the original set when replacing the 'seqinfo' of a TranscriptDb object
>>>
>>>
>>> But:
>>>
>>> restoreSeqlevels(txdb)
>>> seqlevels(txdb, force=TRUE) <- oldLevs
>>>
>>>
>>> This is probably intentional, but I found it to be rather confusing. If
>>> one should be able to use the seqlevels replacement method to control
>>> active and inactive chromosomes in the TranscriptDb object, then having
>>> this mandatory step of restoring all chromosomes to the active state
>>> followed by another restriction to be quite cumbersome. At least a
>>> somewhat more useful error message would help in that case. Or couldn't
>>> the replacement method always call restoreSeqlevels internally.
>>
>> Somewhat related to this, I also found the following issue:
>>
>>    > library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>    > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>    > seqlevels(txdb, force=TRUE) <- "chrM"
>>    > seqlengths(txdb)
>>     chrM
>>    16571
>>    > seqlevels(txdb) <- "chr1"
>>    > seqlengths(txdb)
>>     chr1
>>    16571
>>
>> The 2nd call to the seqlevels() setter actually performed a
>> *renaming* operation (a clue for this is that I didn't have to
>> use force=TRUE). Renaming the seqlevels of a TranscriptDb object
>> is something I think we want to support, e.g. when there is the
>> need to use a different naming convention (like in
>> seqlevels(txdb) <- "M"). This is exactly the syntax that one
>> would use to rename the seqlevels of a GRanges object:
>>
>>    > gr <- GRanges("chrM", IRanges(1:2, 10))
>>    > seqlevels(gr) <- "chr1"
>>    > gr
>>    GRanges with 2 ranges and 0 metadata columns:
>>          seqnames    ranges strand
>>             <Rle> <IRanges>  <Rle>
>>      [1]     chr1   [1, 10]      *
>>      [2]     chr1   [2, 10]      *
>>      ---
>>      seqlengths:
>>       chr1
>>         NA
>>
>> There is nothing that prevents the user from doing this kind of silly
>> renaming except that, in the case of the TranscriptDb object, this is
>> almost certainly not what the user intended to do. What s/he really
>> wanted was selecting chr1 instead of chrM, which can be achieved with:
>>
>>    > restoreSeqlevels(txdb)
>>    > seqlevels(txdb, force=TRUE) <- "chr1"
>>    > seqlengths(txdb)
>>         chr1
>>    249250621
>>
>> Yes having to restore all chromosomes to the active state before one
>> can make a new selection is cumbersome so we probably need to revisit
>> this.
>>
>>>
>>> It would also help to point out somewhere (I may have missed it) that
>>> TranscriptDb seqlevels (or rather the internal GRanges) are actually
>>> pass
>>> by reference:
>>>
>>> txdb2 <- txdb
>>> seqlevels(txdb)
>>> seqlevels(txdb2)
>>>
>>> restoreSeqlevels(txdb)
>>> seqlevels(txdb2)
>>
>> That's because TranscriptDb is a reference class. IMHO it shouldn't.
>>
>>>
>>> Also Herve: what is the definition of a "used" seqlevel? Does that mean
>>> that a factor level is defined, but no range in the GRanges object is
>>> assigned this level?
>>
>> Yes. Like here:
>>
>>    > gr <- GRanges("chrM", IRanges(1:2, 10), seqlengths=c(chr1=5000,
>> chrM=25))
>>    > gr
>>    GRanges with 2 ranges and 0 metadata columns:
>>          seqnames    ranges strand
>>             <Rle> <IRanges>  <Rle>
>>      [1]     chrM   [1, 10]      *
>>      [2]     chrM   [2, 10]      *
>>      ---
>>      seqlengths:
>>       chr1 chrM
>>       5000   25
>>
>> Only chrM is in use:
>>
>>    > seqlevelsInUse(gr)
>>    [1] "chrM"
>>
>> The idiom to drop seqlevels that are not in use is:
>>
>>    > seqlevels(gr) <- seqlevelsInUse(gr)
>>    > gr
>>    GRanges with 2 ranges and 0 metadata columns:
>>          seqnames    ranges strand
>>             <Rle> <IRanges>  <Rle>
>>      [1]     chrM   [1, 10]      *
>>      [2]     chrM   [2, 10]      *
>>      ---
>>      seqlengths:
>>       chrM
>>         25
>>
>> Of course, in that case, force=TRUE is not needed because the operation
>> is guaranteed to not "shrink" the GRanges object.
>>
>>> Why would those exists in a TranscriptDb object at
>>> all?
>>
>> The notion of "seqlevels in use" is not formally defined for a
>> TranscriptDb object:
>>
>>    > seqlevelsInUse(txdb)
>>    Error in (function (classes, fdef, mtable)  :
>>      unable to find an inherited method for function ŒseqlevelsInUse¹
>> for signature Œ"TranscriptDb"¹
>>
>> but it could be: a seqlevel or chromosome could be considered to be
>> in use if there is at least 1 feature (transcript, exon or CDS) in
>> the db that is on it. I think for most TranscriptDb objects, all the
>> seqlevels are actually in use. However it's conceivable that a
>> TranscriptDb object could have some extra seqlevels that are not in
>> use (the seqlevels + their lengths + their circularity flags are stored
>> in a separate table, the 'chrominfo' table).
>>
>> Anyway, when I added the seqlevelsInUse() generic a few months ago,
>> I didn't write a method for TranscriptDb objects because I couldn't
>> think of a use case for it. And also because without any change to
>> the current db schema, this would be a costly operation as it would
>> need to scan the entire database, which cannot be done with a single
>> query.
>>
>> Now with the seqlevels() setter allowing to reduce the seqlevels
>> of a TranscriptDb object, we face the following choices: (a)
>> implement the seqlevelsInUse,TranscriptDb method and make the
>> 'force' arg of the seqlevels() setter behave consistently with
>> that, or (b) not implement the seqlevelsInUse,TranscriptDb method
>> and make the 'force' arg meaningless. As I said earlier, my preference
>> goes for (b).
>>
>> H.
>>
>>>
>>> Florian
>>>
>>>
>>>
>>>
>>>
>>> On 9/18/13 9:08 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
>>>
>>>> Note that currently it's kind of inconsistent anyway because it doesn't
>>>> look at the seqlevels that are in use. For example:
>>>>
>>>>     library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>>     txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>>>
>>>> Trying to drop chrUn_gl000249 (which I know is not in use):
>>>>
>>>>     > seqlevels(txdb) <- setdiff(seqlevels(txdb), "chrUn_gl000249")
>>>>     Error in .seqinfo.TranscriptDbReplace(x, new2old = new2old, force =
>>>> force,  :
>>>>       You need to use force=TRUE if you want to drop seqlevels.
>>>>
>>>> 'force=TRUE' should only be required when trying to drop seqlevels
>>>> that are in use.
>>>>
>>>> So the choice are more between: (a) consistent, (b) convenient,
>>>> or (c) leave it as it is (which is neither convenient or consistent).
>>>>
>>>> I vote for (b).
>>>>
>>>> H.
>>>>
>>>>
>>>> On 09/18/2013 11:31 AM, Marc Carlson wrote:
>>>>> I actually considered this, but I opted to do it this way just for the
>>>>> sake of being consistent (which was my whole mission for implementing
>>>>> seqlevels in here in the 1st place).  Now I could make it more
>>>>> convenient here and break consistency with how it is used elsewhere,
>>>>> but
>>>>> what do people prefer?
>>>>>
>>>>> Consistent or convenient?
>>>>>
>>>>>
>>>>>      Marc
>>>>>
>>>>>
>>>>>
>>>>> On 09/18/2013 10:40 AM, Hervé Pagès wrote:
>>>>>> Hi Marc,
>>>>>>
>>>>>> Wouldn't it make sense to just ignore the 'force' arg when
>>>>>> dropping the seqlevels of a TranscriptDb?
>>>>>>
>>>>>> The 'force' argument is FALSE by default and this prevents
>>>>>> seqlevels<- to shrink GRanges or other vector-like objects
>>>>>> when the user tries to drop seqlevels that are in use.
>>>>>> Internally seqlevels<- calls seqlevelsInUse() to get the
>>>>>> seqlevels currently in use and see if they intersect with
>>>>>> the seqlevels to drop.
>>>>>>
>>>>>> In the TranscriptDb situation, people always have to use
>>>>>> 'force=TRUE' to drop seqlevels, regardless of whether the
>>>>>> levels to drop are in use or not (the seqlevelsInUse()
>>>>>> getter not being defined for TranscriptDb objects, I suspect
>>>>>> seqlevels<- doesn't look at this).
>>>>>>
>>>>>> So maybe 'force' could just be ignored for TranscriptDb objects?
>>>>>> That would make seqlevels<- a little bit more user-friendly on
>>>>>> those objects.
>>>>>>
>>>>>> Thanks,
>>>>>> H.
>>>>>>
>>>>>>
>>>>>> On 09/13/2013 10:38 AM, Marc Carlson wrote:
>>>>>>> Hi Florian,
>>>>>>>
>>>>>>> Yes we are trying to make things more uniform.  seqlevels() lets you
>>>>>>> rename as well as deactivate chromosomes you want to ignore, so it
>>>>>>> was
>>>>>>> really redundant with isActiveSeq().  So we are moving away from
>>>>>>> isActiveSeq() just so that users have less to learn about.  The
>>>>>>> reason
>>>>>>> why isActiveSeq was different from seqlevels was just because it was
>>>>>>> born for a TranscriptDb (which is based on an annotation database)
>>>>>>> instead of being born on a GRanges object.  So seqlevels was the
>>>>>>> more
>>>>>>> general tool.
>>>>>>>
>>>>>>>       Marc
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 09/13/2013 07:24 AM, Hahne, Florian wrote:
>>>>>>>> Hi Marc,
>>>>>>>> I saw these warnings in Gviz, but they stem from GenomicFeatures
>>>>>>>>
>>>>>>>> Warning messages:
>>>>>>>> 1: 'isActiveSeq' is deprecated.
>>>>>>>> Use 'seqlevels' instead.
>>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>>> 2: 'isActiveSeq' is deprecated.
>>>>>>>> Use 'seqlevels' instead.
>>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>>> 3: 'isActiveSeq<-' is deprecated.
>>>>>>>> Use 'seqlevels' instead.
>>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>>> 4: 'isActiveSeq<-' is deprecated.
>>>>>>>> Use 'seqlevels' instead.
>>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>>> 5: 'isActiveSeq' is deprecated.
>>>>>>>> Use 'seqlevels' instead.
>>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>>> 6: 'isActiveSeq<-' is deprecated.
>>>>>>>> Use 'seqlevels' instead.
>>>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated").
>>>>>>>>
>>>>>>>> So has the whole idea of active chromosomes in the data base been
>>>>>>>> dropped? I could not find anything in the change notes. Do I get it
>>>>>>>> right that you can now do
>>>>>>>> seqlevels(txdb, force=TRUE) <- "chr1"
>>>>>>>> if you just want the first chromosome to be active?
>>>>>>>>
>>>>>>>> Florian
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>       [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpages at fhcrc.org
>>>> Phone:  (206) 667-5791
>>>> Fax:    (206) 667-1319
>>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319