[Bioc-sig-seq] GRangesList with duplicate names

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Feb 25 16:05:20 CET 2011


Hi,

I think I'm with Ivan and leaning towards not allowing duplicate names
in a GRangesList, even though normal lists in R do allow duplicate
names.

As Ivan suggested, I also often use the names of any R list when I
want to use the list as something similar to a Python dictionary.
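
To make the dictionary concern concrete, here is a small base-R sketch
(ordinary lists only, not GRangesList; the variable names are just for
illustration) of what happens to name-based lookup once names repeat:

```r
# Base R only: an ordinary list accepts duplicate names without complaint.
x <- list(a = 1, a = 2, b = 3)

# Name-based lookup returns only the first match, so the second "a"
# cannot be reached by name.
first_a <- x[["a"]]

# anyDuplicated() gives the index of the first duplicated name, or 0 if
# all names are unique.
dup_index <- anyDuplicated(names(x))
```

So the list still works, but it no longer behaves like a key/value map.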

Still, if the consensus turns out to allow duplicate names in
*RangesList(s), perhaps it'd be nice for the validity method to
fire off a warning that duplicate names exist in the list so the user
knows something might be fishy.
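
A rough sketch of the kind of check I have in mind, in base R only. This
is not the actual GenomicRanges validity method, and the helper name
`warn_duplicate_names` is made up for illustration; it just assumes the
object supports names():

```r
# Hypothetical helper: warn (rather than fail) when names are duplicated.
warn_duplicate_names <- function(x) {
  nms <- names(x)
  # Collect each duplicated name once for the message.
  dups <- unique(nms[duplicated(nms)])
  if (length(dups) > 0L) {
    warning("duplicate names found: ", paste(dups, collapse = ", "))
  }
  invisible(x)
}

# Usage: capture the warning text instead of printing it.
msg <- tryCatch(
  warn_duplicate_names(list(Cancer = 1, Cancer = 2, Normal = 3)),
  warning = function(w) conditionMessage(w)
)
```

The object is still returned untouched, so code that relies on duplicate
names would keep working; the user just gets a heads-up.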

-steve

On Fri, Feb 25, 2011 at 9:48 AM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
> Hello Hervé,
>
> While we wait for comments from "power users", I just wanted to say
> that non-unique names open the door for potentially more problems than
> solutions.
>
> Imagine a Python dictionary or a Perl hash with multiple values per key.
>
> I wonder how many R/Bioconductor functions exploit the vector's
> capability to hold multiple elements with the same name.
>
> Regardless, thanks for asking users' opinions.
>
> Ivan
>
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1016 and 1-301-496-1592
> Fax: 1-301-496-9878
>
>
>
> On Fri, Feb 25, 2011 at 3:08 AM, Pages, Herve <hpages at fhcrc.org> wrote:
>> Hi Dario,
>>
>> A GRangesList object with duplicated names is apparently
>> considered broken:
>>
>>> grl <- GRangesList(GRanges(), GRanges())
>>> names(grl) <- c("a", "a")
>>> validObject(grl)
>> Error in `rownames<-`(`*tmp*`, value = c("a", "a")) :
>>  duplicate rownames not allowed
>>
>> If we are ok with this feature, we should fix the "names<-"
>> method (and any other code around that lets the user generate
>> broken objects).
>>
>> But if we are not ok with this feature, we should modify
>> the validity method for GRangesList objects. I tend to prefer
>> this solution for 3 reasons:
>>
>>  1. Consistency with ordinary vectors: the names of a vector
>>     in R are not required to be unique.
>>
>>  2. It's not uncommon to see the same name used for 2 different
>>     genes. One might still want to be able to stick those names
>>     on a GRangesList object where each top-level element corresponds
>>     to a gene (e.g. exons grouped by gene).
>>
>>  3. It's easier to modify the validity method than to go around
>>     trying to find and fix every piece of code in GenomicRanges
>>     (and maybe other places) that can potentially produce a
>>     GRangesList object with duplicated names.
>>
>> How do our power users feel about this?
>>
>> Thanks,
>> H.
>>
>>
>> ----- Original Message -----
>> From: "Dario Strbenac" <D.Strbenac at garvan.org.au>
>> To: bioc-sig-sequencing at r-project.org
>> Sent: Thursday, February 24, 2011 10:00:11 PM
>> Subject: [Bioc-sig-seq] GRangesList with duplicate names
>>
>> Hello,
>>
>> It is possible to create a GRangesList with duplicated names, but not to re-order it.
>>
>>> summary(grl)
>>     Length       Class        Mode
>>          3 GRangesList          S4
>>> names(grl) <- c("Cancer", "Cancer", "Normal")
>>> grl[3:1]
>> Error in `rownames<-`(`*tmp*`, value = c("Normal", "Cancer", "Cancer")) :
>>  duplicate rownames not allowed
>>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_AU.UTF-8
>>  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] GenomicRanges_1.2.3 IRanges_1.8.9
>>
>> --------------------------------------
>> Dario Strbenac
>> Research Assistant
>> Cancer Epigenetics
>> Garvan Institute of Medical Research
>> Darlinghurst NSW 2010
>> Australia
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


