[Bioc-devel] requirement for named assays in SummarizedExperiment

Thu Mar 12 16:12:04 CET 2015

What he said

This doesn't make any sense from an API perspective.  When would a user ever expect to see unnamed assay matrices?

--t

> On Mar 12, 2015, at 7:46 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
> 
> Allowing positional matching strikes me as being far too fragile.
> Depending on the actual implementation, it may not even be clear there is
> an order of the assays.
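
The fragility can be illustrated with a minimal base R sketch. Plain lists of matrices stand in for the assays of two objects, and the assay names `counts` and `fpkm` are illustrative assumptions, not taken from the thread. When the same assays are stored in a different order, positional matching silently pairs the wrong matrices, while name matching does not.

```r
# Two plain lists of matrices stand in for the assays of two objects.
# Both hold the same two assays, but stored in a different order.
a <- list(counts = matrix(1:4, nrow = 2),
          fpkm   = matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2))
b <- list(fpkm   = matrix(c(0.5, 0.6, 0.7, 0.8), nrow = 2),
          counts = matrix(5:8, nrow = 2))

# Positional matching: the i-th assay of 'a' is bound to the i-th assay of 'b'.
by_position <- Map(cbind, unname(a), unname(b))

# Name matching: each assay of 'a' is bound to the same-named assay of 'b'.
by_name <- lapply(setNames(names(a), names(a)),
                  function(nm) cbind(a[[nm]], b[[nm]]))

by_position[[1]]  # counts from 'a' next to fpkm from 'b', with no warning
by_name$counts    # counts from 'a' next to counts from 'b'
```

Nothing in the positional result signals that two different quantities were combined, which is the concern raised above.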
> 
> On Wed, Mar 11, 2015 at 2:45 PM, Valerie Obenchain <vobencha at fredhutch.org>
> wrote:
> 
>> Hi,
>> 
>> After talking with others, the vote was against enforcing names on assays()
>> and in favor of positional matching when all names are NULL. A mixture of
>> names and NULL throws an error.
>> 
>> > example(SummarizedExperiment)
>> 
>> ## all named
>> > se2 = se1
>> > assays(cbind(se1, se2))
>> List of length 1
>> names(1): counts
>> 
>> ## mixture of names and NULL -> error
>> > names(assays(se1)) = NULL
>> > assays(cbind(se1, se2))
>> Error in assays(cbind(se1, se2)) :
>>  error in evaluating the argument 'x' in selecting a method for function
>> 'assays': Error in .bind.arrays(args, cbind, "assays") :
>>  elements in ‘assays’ must have the same names
>> 
>> ## all NULL -> positional matching
>> > names(assays(se2)) = NULL
>> > assays(cbind(se1, se2))
>> List of length 1
>> 
>> If we find common use cases where positional matching is needed with a
>> mixture of names and NULL we can always relax this constraint.
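
The three cases described above can be sketched in base R. This is only an illustration of the rule, not the package's `.bind.arrays()` code: `bind_assays_sketch` is a hypothetical helper, and plain lists of matrices stand in for the assays.

```r
# Hypothetical helper, not the package's .bind.arrays(): it only mirrors the
# rule described above, using plain lists of matrices as stand-ins for assays.
bind_assays_sketch <- function(x, y) {
  nx <- names(x)
  ny <- names(y)
  if (length(x) != length(y))
    stop("objects must have the same number of assays")
  if (is.null(nx) && is.null(ny)) {
    # All names are NULL: fall back to positional matching.
    return(Map(cbind, x, y))
  }
  if (is.null(nx) || is.null(ny) || !setequal(nx, ny)) {
    # A mixture of names and NULL (or differing names) is an error.
    stop("elements in 'assays' must have the same names")
  }
  # All named: match by name, so the order within 'y' does not matter.
  lapply(setNames(nx, nx), function(nm) cbind(x[[nm]], y[[nm]]))
}

m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(5:8, nrow = 2)

named   <- bind_assays_sketch(list(counts = m1), list(counts = m2))
unnamed <- bind_assays_sketch(list(m1), list(m2))
mixed   <- tryCatch(bind_assays_sketch(list(m1), list(counts = m2)),
                    error = function(e) conditionMessage(e))
```

Here `named` keeps the name `counts`, `unnamed` is bound by position and carries no names, and `mixed` holds the error message instead of a result.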
>> 
>> Changes are in 1.19.46.
>> 
>> Valerie
>> 
>> 
>> 
>> 
>>> On 03/06/2015 08:20 AM, Valerie Obenchain wrote:
>>> 
>>> Hi Aaron,
>>> 
>>> Thanks for catching this.
>>> 
>>> I favor enforcing names in 'assays'. Combining by position alone is too
>>> dangerous. I'm thinking of the VCF class where the genotype information is
>>> stored in 'assays' and the fields are rarely in the same order.
>>> 
>>> Looks like we also need a more informative error message when names
>>> don't match.
>>> 
>>> > assays(se1)
>>> List of length 1
>>> names(1): counts1
>>> 
>>> > assays(se2)
>>> List of length 1
>>> names(1): counts2
>>> 
>>> > cbind(se1, se2)
>>> Error in sQuote(accessorName) :
>>>   argument "accessorName" is missing, with no default
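
One possible shape for a more informative message is sketched below in base R. `check_assay_names` is a hypothetical helper, not part of GenomicRanges, and the wording of the message is an assumption. The idea is simply to report the mismatching names themselves.

```r
# Hypothetical check, not part of GenomicRanges: report which assay names
# differ, rather than failing while the error message is being built.
check_assay_names <- function(nx, ny) {
  if (!setequal(nx, ny)) {
    stop("assay names differ between objects: ",
         paste0("'", nx, "'", collapse = ", "), " vs ",
         paste0("'", ny, "'", collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Capture the message for the two names shown in the transcript above.
msg <- tryCatch(check_assay_names("counts1", "counts2"),
                error = function(e) conditionMessage(e))
msg
```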
>>> 
>>> 
>>> Valerie
>>> 
>>> 
>>>> On 03/05/2015 11:09 PM, Aaron Lun wrote:
>>>> 
>>>> Dear all,
>>>> 
>>>> I stumbled upon some unexpected behaviour with cbind'ing
>>>> SummarizedExperiment objects with unnamed assays:
>>>> 
>>>> > require(GenomicRanges)
>>>> > nrows <- 5; ncols <- 4
>>>> > counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
>>>> > rowData <- GRanges("chr1", IRanges(1:nrows, 1:nrows))
>>>> > colData <- DataFrame(Treatment=1:ncols, row.names=LETTERS[1:ncols])
>>>> > sset <- SummarizedExperiment(counts, rowData=rowData, colData=colData)
>>>> > sset
>>>> class: SummarizedExperiment
>>>> dim: 5 4
>>>> exptData(0):
>>>> assays(1): ''
>>>> rownames: NULL
>>>> rowData metadata column names(0):
>>>> colnames(4): A B C D
>>>> colData names(1): Treatment
>>>> 
>>>> > cbind(sset, sset)
>>>> dim: 5 8
>>>> exptData(0):
>>>> assays(0):
>>>> rownames: NULL
>>>> rowData metadata column names(0):
>>>> colnames(8): A B ... C1 D1
>>>> colData names(1): Treatment
>>>> 
>>>> Upon cbind'ing, the assays in the SE object are lost. I think this is
>>>> due to the fact that the cbind code matches up assays by their names.
>>>> Thus, if there are no names, the code assumes that there are no assays.
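
The suspected mechanism can be reproduced in base R. This is a sketch under the assumption that the binding code loops over the assay names. Plain lists stand in for the assays, and this is not the package's actual code.

```r
# One unnamed assay per object, as in the report above.
x <- list(matrix(1:4, nrow = 2))
y <- list(matrix(5:8, nrow = 2))

# A name-driven loop iterates over names(x). With no names there is nothing
# to iterate over, so the result is an empty list rather than an error.
bound <- lapply(names(x), function(nm) cbind(x[[nm]], y[[nm]]))

length(x)      # 1
length(bound)  # 0
```

The input holds one assay and the output holds none, which matches the `assays(0):` line in the `cbind(sset, sset)` output.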
>>>> 
>>>> I guess this could be prevented by enforcing naming of assays in the
>>>> SummarizedExperiment constructor. Or, the binding code could be modified
>>>> to work positionally when there are no assay names, e.g., by cbind'ing
>>>> the first assays across all SE objects, then the second assays, etc.
>>>> 
>>>> Any thoughts?
>>>> 
>>>> Regards,
>>>> 
>>>> Aaron
>>>> 
>>>> > sessionInfo()
>>>> R Under development (unstable) (2014-12-14 r67167)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>> 
>>>> locale:
>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>> 
>>>> attached base packages:
>>>> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
>>>> [8] methods   base
>>>> 
>>>> other attached packages:
>>>> [1] GenomicRanges_1.19.42 GenomeInfoDb_1.3.13   IRanges_2.1.41
>>>> [4] S4Vectors_0.5.21      BiocGenerics_0.13.6
>>>> 
>>>> loaded via a namespace (and not attached):
>>>> [1] XVector_0.7.4
>>>> 
>>>> 
>>> 
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> 
>> 
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, Seattle, WA 98109
>> 
>> Email: vobencha at fredhutch.org
>> Phone: (206) 667-3158
>> 
>> 