[Bioc-devel] requirement for named assays in SummarizedExperiment

Valerie Obenchain vobencha at fredhutch.org
Fri Mar 6 17:20:48 CET 2015


Hi Aaron,

Thanks for catching this.

I favor enforcing names in 'assays'. Combining by position alone is too 
dangerous. I'm thinking of the VCF class, where the genotype information 
is stored in 'assays' and the fields are rarely in the same order.

Looks like we also need a more informative error message when names 
don't match.

 > assays(se1)
List of length 1
names(1): counts1

 > assays(se2)
List of length 1
names(1): counts2

 > cbind(se1, se2)
Error in sQuote(accessorName) :
   argument "accessorName" is missing, with no default
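
As a rough sketch of the kind of check that could give a clearer message: 
the helper below is hypothetical (check_assay_names() is not part of 
GenomicRanges) and uses base R only. It takes one vector of assay names per 
object, rejects missing or empty names, and reports which names disagree. 
Treating the same names in a different order as acceptable is an assumption 
here, not a decision.

```r
# Hypothetical helper, not an existing GenomicRanges function.
# name_list: a list with one character vector (or NULL) of assay names
# per object that is about to be combined.
check_assay_names <- function(name_list) {
    if (length(name_list) == 0L)
        return(invisible(NULL))

    # Flag objects whose assays are unnamed or carry empty/NA names.
    unnamed <- vapply(name_list, function(nms) {
        is.null(nms) || anyNA(nms) || any(!nzchar(nms))
    }, logical(1))
    if (any(unnamed))
        stop("assays must be named; object(s) ",
             paste(which(unnamed), collapse=", "),
             " have missing or empty assay names")

    # Compare every object against the first one, ignoring order
    # (assumption: assays are matched by name, not by position).
    ref <- name_list[[1]]
    for (i in seq_along(name_list)[-1]) {
        nms <- name_list[[i]]
        if (!setequal(ref, nms))
            stop("assay names differ between object 1 (",
                 paste(sQuote(ref), collapse=", "),
                 ") and object ", i, " (",
                 paste(sQuote(nms), collapse=", "), ")")
    }
    invisible(ref)
}
```

With the objects above, this would be called as 
check_assay_names(list(names(assays(se1)), names(assays(se2)))) before 
binding, and would stop with a message naming 'counts1' and 'counts2'.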


Valerie


On 03/05/2015 11:09 PM, Aaron Lun wrote:
> Dear all,
>
> I stumbled upon some unexpected behaviour with cbind'ing
> SummarizedExperiment objects with unnamed assays:
>
>> require(GenomicRanges)
>> nrows <- 5; ncols <- 4
>> counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
>> rowData <- GRanges("chr1", IRanges(1:nrows, 1:nrows))
>> colData <- DataFrame(Treatment=1:ncols, row.names=LETTERS[1:ncols])
>> sset <- SummarizedExperiment(counts, rowData=rowData, colData=colData)
>> sset
> class: SummarizedExperiment
> dim: 5 4
> exptData(0):
> assays(1): ''
> rownames: NULL
> rowData metadata column names(0):
> colnames(4): A B C D
> colData names(1): Treatment
>>
>> cbind(sset, sset)
> dim: 5 8
> exptData(0):
> assays(0):
> rownames: NULL
> rowData metadata column names(0):
> colnames(8): A B ... C1 D1
> colData names(1): Treatment
>
> Upon cbind'ing, the assays in the SE object are lost. I think this is
> due to the fact that the cbind code matches up assays by their names.
> Thus, if there are no names, the code assumes that there are no assays.
>
> I guess this could be prevented by enforcing naming of assays in the
> SummarizedExperiment constructor. Or, the binding code could be modified
> to work positionally when there are no assay names, e.g., by cbind'ing
> the first assays across all SE objects, then the second assays, etc.
>
> Any thoughts?
>
> Regards,
>
> Aaron
>
>> sessionInfo()
> R Under development (unstable) (2014-12-14 r67167)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
> [1] GenomicRanges_1.19.42 GenomeInfoDb_1.3.13   IRanges_2.1.41
> [4] S4Vectors_0.5.21      BiocGenerics_0.13.6
>
> loaded via a namespace (and not attached):
> [1] XVector_0.7.4
>
>


