[Bioc-devel] request: high-level seqlevel utilities

Valerie Obenchain vobencha at fhcrc.org
Fri Dec 27 22:49:46 CET 2013

I think the idea was to avoid the proliferation of small, specialized 

gr <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)

## autosomes
std <- keepStandardChromosomes(gr, "Homo sapiens", "UCSC")
dropSeqlevels(std, c("chrX", "chrY"))

## allosomes
keepSeqlevels(std, c("chrX", "chrY"))

While the above code seems straight forward maybe it's not enough of an 
improvement over what you had to do before.

If we need stand alone functions for 'autosomes', 'allosomes', etc. 
maybe this isn't the right approach. Eariler in this thread there was 
mention of storing this information in the Seqinfo. Maybe we need to 
step back to take another look at that. If this information were in the 
Seqinfo these become more like getters than one-off functions with their 
own overhead/maintenance etc.

Half the team is out so a final decision will have to wait a week. 
Thanks for the feedback.


On 12/27/2013 12:53 PM, Tim Triche, Jr. wrote:
> Hi Sonali,
> Thank you for this much requested addition to the GenomicRanges API!
> I would like to second Michael Lawrence's assertion that keepAutosomes() is more straightforward and harder to misunderstand or misuse than a special case of keep/dropSeqlevels.   (If the author of a function has trouble using it, think what that means for the rest of us!)  An ideal API is (IMHO) one that is easy to use well, and hard to use poorly. I propose that this exchange has illustrated why a very experienced programmer (ML) would nonetheless suggest API "bloat".
> The recent history of the Ranges API has (again IMHO) been a series of exchanges balancing convenience against elegance, often with significant disagreements on where the balance should lie. But if a "special case" turns out to be the common case, historically, a variance has been granted. Examples include sortSeqlevels(), as(data.frame, "GRanges"), and x$foo as shorthand for elementMetadata(x)[,'foo'].
> People seldom do what they know to be right. They do what is convenient, then repent. (Ask me how I know ;-))  In my experience, at least, the best way to get people to "do the right thing" is to make it convenient.
> Thank you for being bold enough to stand up for the design choices you believe in. I hope my request to trade a little elegance for readability does not come across as criticism thereof.
> Best,
> --t
>> On Dec 27, 2013, at 12:11 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:
>> I guess you mean dropSeqlevels, not keepSeqlevels? That should work. But
>> your function needs some tweaking, I think. The function should just take
>> the seqnameStyle from the object. There's no need to require the user to
>> specify it, and then check for consistency. I would like to be able to just
>> call:
>> keepStandardChromosomes(trans)
>> Also, I think you made GenomicRanges depend on AnnotationDbi. That's a
>> major change. Before, it was just in Suggests. Did you discuss this with
>> anyone?
>> Michael
>>> On Fri, Dec 27, 2013 at 11:37 AM, Arora, Sonali <sarora at fhcrc.org> wrote:
>>> Hi Michael,
>>> We decided to come up with one function called keepStandardChromosomes()
>>> which
>>> is a wrapper for both the functions suggested by you (i.e.,
>>> keepPrimaryChromosomes() and keepAutosomes() )
>>> Here is an example:
>>> library(AnnotationDbi)
>>> library(GenomicRanges)
>>> txdb <- loadDb(system.file("extdata", "UCSC_knownGene_sample.sqlite",
>>>                            package="GenomicFeatures"))
>>> trans <- transcripts(txdb)
>>> seqlevels(trans)
>>> #gets all GRanges for Human chromosomes : 1-22,X,Y,M
>>> got <- keepStandardChromosomes(trans,style="UCSC", species="Homo sapiens")
>>> seqlevels(got)
>>> #for getting only autosomes, one could then extract using keepSeqlevels()
>>> #as shown earlier:
>>> keepSeqlevels(got,c("chrX","chrY"))
>>> keepSeqlevels(got,c("chrX","chrY", "chrM"))
>>> Thanks and Regards
>>> Sonali.
>>> On 12/25/2013 4:15 AM, Michael Lawrence wrote:
>>> Awesome -- how about keepAutosomes?
>>>> On Tue, Dec 24, 2013 at 1:11 PM, Arora, Sonali <sarora at fhcrc.org> wrote:
>>>> Hi everyone,
>>>> We are pleased to announce that we have added a new function called
>>>> keepStandardChromosomes() to  GenomicRanges(1.15.18)  - devel branch.
>>>> This function allows a user to subset a given object (containing a
>>>> seqinfo class) to
>>>> retain only the primary chromosomes and the autosomes.
>>>> Please feel free to get back if you face any issues.
>>>> Thanks,
>>>> Sonali Arora.
>>>> On 12/16/2013 11:01 AM, Michael Lawrence wrote:
>>>> Awesome, thanks, Sonali. And welcome to the team.
>>>> Michael
>>>>> On Mon, Dec 16, 2013 at 10:38 AM, Arora, Sonali <sarora at fhcrc.org> wrote:
>>>>> Hi Michael,
>>>>> That is an extremely interesting question. We have a couple of ideas and
>>>>> are beginning to work on it.
>>>>> We hope to come up with something soon.
>>>>> Thanks,
>>>>> Sonali.
>>>>>> On 12/16/2013 6:14 AM, Michael Lawrence wrote:
>>>>>>   should be stored with the Seqinfo. It could be imputed
>>>>>> (along with the isCircular I think) via th
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>     [[alternative HTML version deleted]]
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

Valerie Obenchain

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: vobencha at fhcrc.org
Phone:  (206) 667-3158
Fax:    (206) 667-1319

More information about the Bioc-devel mailing list