[Bioc-devel] requirement for named assays in SummarizedExperiment
Valerie Obenchain
vobencha at fredhutch.org
Wed Mar 11 19:45:12 CET 2015
Hi,
After talking with others, the vote was against enforcing names on
assays() and for positional matching if all names are NULL. A mixture of
names and NULL throws an error.
example(SummarizedExperiment)
## all named
> se2 = se1
> assays(cbind(se1, se2))
List of length 1
names(1): counts
## mixture of names and NULL -> error
> names(assays(se1)) = NULL
> assays(cbind(se1, se2))
Error in assays(cbind(se1, se2)) :
error in evaluating the argument 'x' in selecting a method for
function 'assays': Error in .bind.arrays(args, cbind, "assays") :
elements in ‘assays’ must have the same names
## all NULL -> positional matching
> names(assays(se2)) = NULL
> assays(cbind(se1, se2))
List of length 1
If we find common use cases where positional matching is needed with a
mixture of names and NULL, we can always relax this constraint.
Changes are in GenomicRanges 1.19.46.
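For anyone who wants the rule spelled out, here is a rough base-R sketch of
the matching logic described above. It is only an illustration under stated
assumptions, not the code in the package: bind_assays() is a hypothetical
helper, and each object's assays are stood in for by a plain list of
matrices.

```r
# Hypothetical helper illustrating the matching rule; NOT the package code.
# 'assay_lists' is a list with one element per object, each element being a
# plain list of matrices standing in for that object's assays.
bind_assays <- function(assay_lists, bind = cbind) {
  stopifnot(length(assay_lists) > 0L)
  lens <- lengths(assay_lists)
  if (length(unique(lens)) != 1L)
    stop("all objects must have the same number of assays")

  # TRUE for objects whose assays carry names, FALSE where names are NULL
  named <- vapply(assay_lists, function(a) !is.null(names(a)), logical(1))

  # Mixture of names and NULL -> error
  if (any(named) && !all(named))
    stop("elements in 'assays' must have the same names")

  # All NULL -> positional matching: first assays together, then second, ...
  if (!any(named)) {
    return(lapply(seq_len(lens[1L]), function(i)
      do.call(bind, lapply(assay_lists, `[[`, i))))
  }

  # All named -> match by name; every object must have the same set of names
  nms <- names(assay_lists[[1L]])
  same <- vapply(assay_lists, function(a) setequal(names(a), nms), logical(1))
  if (!all(same))
    stop("elements in 'assays' must have the same names")
  out <- lapply(nms, function(n)
    do.call(bind, lapply(assay_lists, `[[`, n)))
  names(out) <- nms
  out
}

m <- matrix(1:20, 5)
named_assays   <- list(counts = m)
unnamed_assays <- list(m)

names(bind_assays(list(named_assays, named_assays)))        # by name
dim(bind_assays(list(unnamed_assays, unnamed_assays))[[1]]) # by position
```

Calling bind_assays(list(named_assays, unnamed_assays)) stops with the
"must have the same names" error, mirroring the mixed case above.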
Valerie
On 03/06/2015 08:20 AM, Valerie Obenchain wrote:
> Hi Aaron,
>
> Thanks for catching this.
>
> I favor enforcing names in 'assays'. Combining by position alone is too
> dangerous. I'm thinking of the VCF class where the genotype information is
> stored in 'assays' and the fields are rarely in the same order.
>
> Looks like we also need a more informative error message when names
> don't match.
>
> > assays(se1)
> List of length 1
> names(1): counts1
>
> > assays(se2)
> List of length 1
> names(1): counts2
>
> > cbind(se1, se2)
> Error in sQuote(accessorName) :
> argument "accessorName" is missing, with no default
>
>
> Valerie
>
>
> On 03/05/2015 11:09 PM, Aaron Lun wrote:
>> Dear all,
>>
>> I stumbled upon some unexpected behaviour with cbind'ing
>> SummarizedExperiment objects with unnamed assays:
>>
>> > require(GenomicRanges)
>> > nrows <- 5; ncols <- 4
>> > counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
>> > rowData <- GRanges("chr1", IRanges(1:nrows, 1:nrows))
>> > colData <- DataFrame(Treatment=1:ncols, row.names=LETTERS[1:ncols])
>> > sset <- SummarizedExperiment(counts, rowData=rowData, colData=colData)
>> > sset
>> class: SummarizedExperiment
>> dim: 5 4
>> exptData(0):
>> assays(1): ''
>> rownames: NULL
>> rowData metadata column names(0):
>> colnames(4): A B C D
>> colData names(1): Treatment
>> >
>> > cbind(sset, sset)
>> dim: 5 8
>> exptData(0):
>> assays(0):
>> rownames: NULL
>> rowData metadata column names(0):
>> colnames(8): A B ... C1 D1
>> colData names(1): Treatment
>>
>> Upon cbind'ing, the assays in the SE object are lost. I think this is
>> due to the fact that the cbind code matches up assays by their names.
>> Thus, if there are no names, the code assumes that there are no assays.
>>
>> I guess this could be prevented by enforcing naming of assays in the
>> SummarizedExperiment constructor. Or, the binding code could be modified
>> to work positionally when there are no assay names, e.g., by cbind'ing
>> the first assays across all SE objects, then the second assays, etc.
>>
>> Any thoughts?
>>
>> Regards,
>>
>> Aaron
>>
>> > sessionInfo()
>> R Under development (unstable) (2014-12-14 r67167)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats4 parallel stats graphics grDevices utils datasets
>> [8] methods base
>>
>> other attached packages:
>> [1] GenomicRanges_1.19.42 GenomeInfoDb_1.3.13 IRanges_2.1.41
>> [4] S4Vectors_0.5.21 BiocGenerics_0.13.6
>>
>> loaded via a namespace (and not attached):
>> [1] XVector_0.7.4
>>
>>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109
Email: vobencha at fredhutch.org
Phone: (206) 667-3158