[Bioc-devel] SummarizedExperiment: duplication of metadata, when modifying colData

Hervé Pagès hpages at fredhutch.org
Fri Dec 15 03:29:19 CET 2017


Hi Felix,

Nice catch. This can actually be reproduced with just:

   > example(SummarizedExperiment)
   > metadata(se0) <- list(aa="aa")
   > se0[1 , ] <- se0[1 , ]
   > metadata(se0)
   $aa
   [1] "aa"

   $aa
   [1] "aa"

The culprit is this line:

   ans_metadata <- c(metadata(x), metadata(value))

in the "[<-" method for SummarizedExperiment objects.

So it looks like it was a deliberate decision to have
[<- combine the metadata of 'x' and 'value'. The problem is that
this breaks the more-than-reasonable expectation that something
like x[i , j] <- x[i , j] should be a no-op.
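
The effect can be sketched in base R, with plain lists standing in for the
metadata of 'x' and 'value'. The helper names combine_old() and combine_new()
are hypothetical and only for illustration; they are not functions from
SummarizedExperiment. Since x[i , j] carries the same metadata as x, each
self-assignment under the old rule doubles the list:

```r
# Plain lists stand in for metadata(x) and metadata(value).
# combine_old() and combine_new() are hypothetical helpers, not package API.
combine_old <- function(x_meta, value_meta) c(x_meta, value_meta)
combine_new <- function(x_meta, value_meta) x_meta

meta_old <- list(aa = "aa")
meta_new <- list(aa = "aa")
for (k in 1:3) {
  # A subset of x has the same metadata as x, so value_meta equals x_meta.
  meta_old <- combine_old(meta_old, meta_old)
  meta_new <- combine_new(meta_new, meta_new)
}
length(meta_old)  # 8: doubled on each of the three assignments
length(meta_new)  # 1: unchanged
```

The growth 1, 2, 4, 8 matches the "List of 2", "List of 4", "List of 8"
sequence in the original report.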

I replaced the above line with:

   ans_metadata <- metadata(x)

in SummarizedExperiment 1.9.5 (devel). With this change [<-
leaves metadata(x) intact and x[i , j] <- x[i , j] behaves like
a no-op:

 
https://github.com/Bioconductor/SummarizedExperiment/commit/e4fcb99c442e2f17b0ccddfb05df9f160e0bbe40
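
A quick way to check an installation for this behaviour might look like the
sketch below. It assumes SummarizedExperiment is installed; the helper
metadata_survives_self_assign() is a hypothetical name introduced here, not
part of the package. It should print TRUE with the fix and FALSE with an
affected version:

```r
# Sketch of a regression check; assumes SummarizedExperiment is installed.
# metadata_survives_self_assign() is a hypothetical helper, not package API.
metadata_survives_self_assign <- function(se) {
  before <- S4Vectors::metadata(se)
  se[1, ] <- se[1, ]                      # should behave like a no-op
  identical(S4Vectors::metadata(se), before)
}

if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  library(SummarizedExperiment)
  se <- SummarizedExperiment(assays = list(counts = matrix(1:6, nrow = 3)))
  metadata(se) <- list(aa = "aa")
  print(packageVersion("SummarizedExperiment"))
  print(metadata_survives_self_assign(se))
}
```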

Will port to release soon.

Cheers,
H.


On 12/12/2017 01:05 AM, Felix Ernst wrote:
> Hi all,
> 
> I got a bit of weird behaviour with SummarizedExperiment objects in Bioc 3.6
> and 3.7. I suppose it is a bug, but I might be wrong, since access to the
> SummarizedExperiment object is not really straightforward. Any suggestions?
> 
> library(GenomicRanges)
> library(SummarizedExperiment)
> 
> nrows <- 200; ncols <- 6
> counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
> colnames(counts) <- LETTERS[1:6]
> rownames(counts) <- 1:nrows
> counts2 <- counts-floor(counts)
> rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
>                       IRanges(floor(runif(200, 1e5, 1e6)), width=100),
>                       strand=sample(c("+", "-"), 200, TRUE),
>                       feature_id=sprintf("ID%03d", 1:200))
> colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
>                       row.names=LETTERS[1:6])
> 
> se <- SummarizedExperiment(assays=list(counts=counts),
>                             rowRanges=rowRanges,
>                             colData=colData)
> colData(se)$xyz <- rep("",ncol(se))
> metadata(se) <- list("meep" = "meep")
> 
> str(metadata(se))
> colData(se[, 1])$xyz <- "abc"
> str(metadata(se))
> 
> The first metadata() call returns a list of length 1 with the correct data.
> The second call returns a list of two with duplicated entries, and every
> further colData modification (and replacing data) duplicates the entries in
> the metadata further.
> 
>> str(metadata(se))
> List of 1
> $ meep: chr "meep"
>> colData(se[, 1])$xyz <- "abc"
>> str(metadata(se))
> List of 2
> $ meep: chr "meep"
> $ meep: chr "meep"
>> colData(se[, 2])$xyz <- "abc"
>> str(metadata(se))
> List of 4
> $ meep: chr "meep"
> $ meep: chr "meep"
> $ meep: chr "meep"
> $ meep: chr "meep"
>> colData(se[, 2])$xyz <- "abc"
>> str(metadata(se))
> List of 8
> $ meep: chr "meep"
> $ meep: chr "meep"
> $ meep: chr "meep"
> $ meep: chr "meep"
> $ meep: chr "meep"
> $ meep: chr "meep"
> $ meep: chr "meep"
> $ meep: chr "meep"
> 
> Thanks for any advice and suggestions.
> 
> Felix
> 
> ---
> 
> Felix Ernst, PhD
> Université Libre de Bruxelles
> RNA MOLECULAR BIOLOGY
> BIOPARK Charleroi Brussels-South CAMPUS
> Rue Profs Jeener & Brachet, 12
> B-6041 Charleroi - Gosselies
> BELGIUM
> +32(2)650 9774 (office phone)
> felix.ernst at ulb.ac.be
> 
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


