[Bioc-sig-seq] GRangesList with duplicate names

Ivan Gregoretti ivangreg at gmail.com
Fri Feb 25 15:48:26 CET 2011


Hello Hervé,

While we wait for comments from "power users", I just wanted to say
that non-unique names could well create more problems than they
solve.

Imagine a Python dictionary or a Perl hash with multiple values per key.

I wonder how many R/Bioconductor functions exploit the vector's
capability to hold multiple elements with the same name.
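
For reference, a minimal sketch of how a plain R vector behaves when
names repeat. It uses base R only, and the variable names are purely
illustrative:

```r
# A named numeric vector in which the name "a" occurs twice.
x <- c(a = 1, a = 2, b = 3)

# Base R accepts the duplicated names without complaint.
dup_flag <- anyDuplicated(names(x)) > 0

# Lookup by name returns only the first matching element.
first_match <- x[["a"]]

# Retrieving every element named "a" needs an explicit comparison.
all_matches <- x[names(x) == "a"]

print(dup_flag)
print(first_match)
print(all_matches)
```

Lookup by name silently picks the first match, so code that treats
names as keys can miss the remaining elements.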

Regardless, thanks for asking for users' opinions.

Ivan


Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1016 and 1-301-496-1592
Fax: 1-301-496-9878



On Fri, Feb 25, 2011 at 3:08 AM, Pages, Herve <hpages at fhcrc.org> wrote:
> Hi Dario,
>
> A GRangesList object with duplicated names is apparently
> considered broken:
>
>> grl <- GRangesList(GRanges(), GRanges())
>> names(grl) <- c("a", "a")
>> validObject(grl)
> Error in `rownames<-`(`*tmp*`, value = c("a", "a")) :
>  duplicate rownames not allowed
>
> If we are ok with this restriction, we should fix the "names<-"
> method (and any other code around that lets the user generate
> broken objects).
>
> But if we are not ok with this restriction, we should modify
> the validity method for GRangesList objects. I tend to prefer
> this solution for 3 reasons:
>
>  1. Consistency with ordinary vectors: the names of a vector
>     in R are not required to be unique.
>
>  2. It's not uncommon to see the same name used for 2 different
>     genes. One might still want to be able to stick those names
>     on a GRangesList object where each top-level element corresponds
>     to a gene (e.g. exons grouped by gene).
>
>  3. It's easier to modify the validity method than to go around
>     trying to find and fix every piece of code in GenomicRanges
>     (and maybe other places) that can potentially produce a
>     GRangesList object with duplicated names.
>
> How do our power users feel about this?
>
> Thanks,
> H.
>
>
> ----- Original Message -----
> From: "Dario Strbenac" <D.Strbenac at garvan.org.au>
> To: bioc-sig-sequencing at r-project.org
> Sent: Thursday, February 24, 2011 10:00:11 PM
> Subject: [Bioc-sig-seq] GRangesList with duplicate names
>
> Hello,
>
> It is possible to create a GRangesList with duplicated names, but not to re-order it.
>
>> summary(grl)
>     Length       Class        Mode
>          3 GRangesList          S4
>> names(grl) <- c("Cancer", "Cancer", "Normal")
>> grl[3:1]
> Error in `rownames<-`(`*tmp*`, value = c("Normal", "Cancer", "Cancer")) :
>  duplicate rownames not allowed
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_AU.UTF-8
>  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] GenomicRanges_1.2.3 IRanges_1.8.9
>
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


