[Bioc-devel] SummarizedExperiment: structure loss, when mixing matrix and data.frame data

Hervé Pagès hpages at fredhutch.org
Tue Nov 28 21:22:26 CET 2017


Hi,

It looks like, at an even lower level, S4Vectors:::listElementType()
is at the origin of the problem:

   > S4Vectors:::listElementType(list(matrix(), data.frame()))
   [1] "vector"

Should return "ANY" here.
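The sketch below shows why "vector" is a harmful answer here. It is not the actual S4Vectors code. It assumes that the element type computed for the list is then enforced by coercing each element to it with as(). Both a matrix and a data.frame extend the basic class "vector", but coercing the matrix to "vector" drops its dim and dimnames. That matches the num [1:1200] result in the assays() example quoted below.

```r
## Sketch only, not S4Vectors internals. Assumption: the element type
## computed for the list is enforced by coercing each element to it.
library(methods)

m  <- matrix(0, 2, 2, dimnames = list(letters[1:2], LETTERS[1:2]))
df <- data.frame(A = 1:2, B = 1:2, row.names = letters[1:2])

## Both class definitions extend the basic class "vector", which is
## presumably how "vector" came out as the common element type.
extends("matrix", "vector")
extends("data.frame", "vector")

## Coercing the matrix to "vector", however, drops its dim and dimnames:
## the same kind of shape loss seen for assays(se)$counts.
v <- as(m, "vector")
dim(v)
dimnames(v)
length(v)
```

With "ANY" as the element type, as(m, "ANY") returns m unchanged, so the dimnames would survive.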

Will try to fix.

H.

On 11/26/2017 07:03 AM, Martin Morgan wrote:
> It would seem to be a bug in endoapply():
>
> library(S4Vectors)
> lst <- SimpleList(
>      m = matrix(0, 2, 2, dimnames=list(letters[1:2], LETTERS[1:2])),
>      df = data.frame(A=1:2, B=1:2, row.names=letters[1:2])
> )
> dimnames(lst[[1]])                      # list(c("a", "b"), c("A", "B"))
> dimnames(endoapply(lst, identity)[[1]]) # NULL
>
> specifically in S4Vectors:::coerceToSimpleList():
>
> lst <- list(
>      m = matrix(0, 2, 2, dimnames=list(letters[1:2], LETTERS[1:2])),
>      df = data.frame(A=1:2, B=1:2, row.names=letters[1:2])
> )
> S4Vectors:::coerceToSimpleList(lst)
>
> Martin
>
>
> On 11/26/2017 07:56 AM, Vincent Carey wrote:
>> Confirmed with the following sessionInfo(), which satisfies biocValid() == TRUE:
>>
>>> sessionInfo()
>> R Under development (unstable) (2017-11-22 r73776)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Linux Mint 18.1
>>
>> Matrix products: default
>> BLAS: /home/stvjc/R-35-dist/lib/R/lib/libRblas.so
>> LAPACK: /home/stvjc/R-35-dist/lib/R/lib/libRlapack.so
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
>> [8] methods   base
>>
>> other attached packages:
>> [1] SummarizedExperiment_1.9.2 DelayedArray_0.5.5
>> [3] matrixStats_0.52.2         Biobase_2.39.0
>> [5] GenomicRanges_1.31.1       GenomeInfoDb_1.15.1
>> [7] IRanges_2.13.4             S4Vectors_0.17.10
>> [9] BiocGenerics_0.25.0
>>
>> loaded via a namespace (and not attached):
>>   [1] lattice_0.20-35         bitops_1.0-6            grid_3.5.0
>>   [4] zlibbioc_1.25.0         XVector_0.19.1          Matrix_1.2-12
>>   [7] tools_3.5.0             RCurl_1.95-4.8          compiler_3.5.0
>> [10] GenomeInfoDbData_0.99.2
>>
>> On Sun, Nov 26, 2017 at 7:09 AM, Felix Ernst <felix.ernst at ulb.ac.be>
>> wrote:
>>
>>> Hi all,
>>>
>>> I got different results when constructing a SummarizedExperiment in
>>> Bioc 3.6 and Bioc 3.7. My question is whether this is intentional or a bug.
>>>
>>> library(GenomicRanges)
>>> library(SummarizedExperiment)
>>>
>>> nrows <- 200; ncols <- 6
>>> counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
>>> colnames(counts) <- LETTERS[1:6]
>>> rownames(counts) <- 1:nrows
>>> counts2 <- counts-floor(counts)
>>> rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
>>>                       IRanges(floor(runif(200, 1e5, 1e6)), width=100),
>>>                       strand=sample(c("+", "-"), 200, TRUE),
>>>                       feature_id=sprintf("ID%03d", 1:200))
>>> colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
>>>                       row.names=LETTERS[1:6])
>>>
>>> se <- SummarizedExperiment(assays=list(counts=counts),
>>>                             rowRanges=rowRanges,
>>>                             colData=colData)
>>>
>>> str(assays(se)$counts)
>>> assays(se)$counts2 <- as.data.frame(counts2)
>>> str(assays(se)$counts)
>>>
>>> On Windows 10 with R 3.4.2 and Bioc 3.6, this produces:
>>> num [1:200, 1:6] 8815 6314 1945 6185 5935 ...
>>>   - attr(*, "dimnames")=List of 2
>>>    ..$ : chr [1:200] "1" "2" "3" "4" ...
>>>    ..$ : chr [1:6] "A" "B" "C" "D" ...
>>>   num [1:200, 1:6] 8815 6314 1945 6185 5935 ...
>>>   - attr(*, "dimnames")=List of 2
>>>    ..$ : chr [1:200] "1" "2" "3" "4" ...
>>>    ..$ : chr [1:6] "A" "B" "C" "D" ...
>>>
>>> On Ubuntu 17.10 with R-devel r73779 and Bioc 3.7, this produces:
>>> num [1:200, 1:6] 8636 7040 9275 4821 2475 ...
>>>   - attr(*, "dimnames")=List of 2
>>>    ..$ : chr [1:200] "1" "2" "3" "4" ...
>>>    ..$ : chr [1:6] "A" "B" "C" "D" ...
>>>   num [1:1200] 8636 7040 9275 4821 2475 ...
>>>
>>> Somehow the structure (dim and dimnames) of the counts assay is lost.
>>>
>>> This happens if I mix matrix and data.frame data, and doesn’t if I use
>>> only matrices. The man page specifies matrix-like objects, which a
>>> data.frame is (isn’t it?), and the behavior differs between Bioc 3.6
>>> and Bioc 3.7.
>>>
>>> I can rule out that this is a Windows/Linux thing, because the Travis
>>> build error, which pointed to a difference in the first place,
>>> didn’t occur with bioc-release, just with bioc-devel.
>>>
>>> Thanks for any advice and suggestions.
>>>
>>> Felix
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=SrbnY4HvnR7uE6LrH4stQ9IFdOuM8t4iAAfY0cNl5os&s=fdsgKHDmmwwW2_VMcibMhHtNe79f9cDWa8igAAlidII&e=
>>>
>>
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
