[Bioc-devel] SummarizedExperiments

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Fri Sep 14 20:46:30 CEST 2012


On Fri, Sep 14, 2012 at 2:25 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
> For what it's worth I already wrote a CombineSEwithNAs() function to do this
> on the disjoint ranges for RRBS.  It assumes that there isn't any additional
> colData or elementMetadata of interest (for reasons that will become clear)
> and further assumes that the user will want to smooth.

I have one in bsseq as well (as I said earlier), but it would still
be nice to think about this in the most general case possible.
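
To make the general case a bit more concrete, here is a minimal base R
sketch of the core idea: take the union of loci across objects and pad
whatever a given sample set did not cover with NA. It works on plain
matrices keyed by row name rather than on SummarizedExperiments, and
combineWithNAs() and the locus labels are hypothetical names made up for
this illustration; it is not the bsseq or CombineSEwithNAs() code.

```r
# Hypothetical helper (illustration only): combine assay matrices whose
# rows are keyed by a locus label, padding uncovered loci with NA.
combineWithNAs <- function(mats) {
  stopifnot(length(mats) > 0L)
  # Fast path: every matrix has the same rows in the same order,
  # so a plain cbind is enough.
  first <- rownames(mats[[1L]])
  same <- all(vapply(mats, function(m) identical(rownames(m), first),
                     logical(1L)))
  if (same) return(do.call(cbind, mats))
  # General path: expand each matrix to the union of rows, NA elsewhere.
  allRows <- unique(unlist(lapply(mats, rownames)))
  padded <- lapply(mats, function(m) {
    out <- matrix(NA_real_, nrow = length(allRows), ncol = ncol(m),
                  dimnames = list(allRows, colnames(m)))
    out[rownames(m), ] <- m
    out
  })
  do.call(cbind, padded)
}

# Two toy sample sets that overlap at one locus only.
a <- matrix(c(0.1, 0.9), nrow = 2,
            dimnames = list(c("chr1:100", "chr1:200"), "s1"))
b <- matrix(c(0.5, 0.4), nrow = 2,
            dimnames = list(c("chr1:200", "chr1:300"), "s2"))
combined <- combineWithNAs(list(a, b))
```

The fast path is where handling the whole list at once should pay off
compared to Reduce(combine, LIST): the row check happens once per object
and nothing gets re-expanded at every pairwise step.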

> seqlevels, seqlengths, genome are implemented via seqinfo as of the latest
> GenomicRanges package (went in some time ago, thanks to MM):

There is something I am missing here.  It clearly works.  But
showMethods("genome") tells me that methods are defined for ANY and
Seqinfo, but if I use
  example(SummarizedExperiment)
to get sset defined, I still get
  is(sset, "Seqinfo")
to be FALSE.  I thought this would check for inheritance.
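
One possible explanation (an assumption, not checked against the
GenomicRanges source): the ANY method may simply delegate to the seqinfo
of the object, in which case no inheritance from Seqinfo is needed. A
self-contained sketch with toy classes; ToySeqinfo, ToySE, toySeqinfo and
toyGenome are hypothetical stand-ins, not the real classes or generics:

```r
library(methods)

# Toy stand-ins for illustration; not the GenomicRanges classes.
setClass("ToySeqinfo", representation(genome = "character"))
setClass("ToySE", representation(seqinfo = "ToySeqinfo"))

setGeneric("toySeqinfo", function(x) standardGeneric("toySeqinfo"))
setMethod("toySeqinfo", "ToySE", function(x) x@seqinfo)

setGeneric("toyGenome", function(x) standardGeneric("toyGenome"))
# Method for the container class itself.
setMethod("toyGenome", "ToySeqinfo", function(x) x@genome)
# Fallback for every other class: go through the seqinfo accessor.
setMethod("toyGenome", "ANY", function(x) toyGenome(toySeqinfo(x)))

si <- new("ToySeqinfo", genome = c(chr1 = "hg19", chr2 = "hg19"))
se <- new("ToySE", seqinfo = si)

is(se, "ToySeqinfo")  # FALSE: ToySE contains a ToySeqinfo, it does not extend it
toyGenome(se)         # dispatches to the ANY method, which delegates
```

In this setup is() does check inheritance, but there is none to find: the
object has a seqinfo rather than being one, and the call succeeds because
ANY matches every class.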

> R> head(genome(LAML))
>   chr1  chr10  chr11  chr12  chr13  chr14
> "hg19" "hg19" "hg19" "hg19" "hg19" "hg19"
> R> head(seqlengths(LAML))
>      chr1     chr10     chr11     chr12     chr13     chr14
> 249250621 135534747 135006516 133851895 115169878 107349540
>
> I wrote a trivial addSeqinfo(x) function that, given a genome, will populate
> an SE's seqinfo automatically from a BSgenome (if there is one).  The
> function calls
> rtracklayer:::SeqinfoForBSGenome(unique(na.omit(genome(x))))[seqlevels(x)]
> to get the correct information.
>
> I hate the fact that there can be NA or differing genomes specified
> per-chromosome for SummarizedExperiments.  It makes me sad.

I don't like that you can mix hg18/hg19, but on the other hand we
routinely spike in lambda phage, and that is not really part of the
human genome.

>
>
>
> On Fri, Sep 14, 2012 at 10:54 AM, Kasper Daniel Hansen
> <kasperdanielhansen at gmail.com> wrote:
>>
>> Thanks for all the additional methods.  I still miss
>>   seqlevels, seqlengths, genome
>>
>> Below,
>>
>> On Wed, Sep 12, 2012 at 3:33 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> > On 09/12/2012 12:15 PM, Kasper Daniel Hansen wrote:
>>
>> >> One thing I have in my package that I find indispensable is combine
>> >> and (my own) combineList.  The latter is for combining > 2 objects,
>> >> which has a lot of possibilities for speed up, especially if (very
>> >> common) all the objects have the same rowData, as opposed to
>> >> Reduce(combine, LIST).  Use case: you need to add additional samples
>> >> to your SummarizedExperiment.
>> >
>> >
>> > I found it difficult in Biobase to write combine methods for eSet, where
>> > you're really requiring a lot from the user (about the phenoData /
>> > featureData structured in the same way) or going through contortions
>> > to make it the same in a reasonable-but-ad-hoc way (e.g., when two
>> > columns are factors with the same set of levels but encoded
>> > differently). Maybe the effort required is proportional to the utility
>> > of the function provided... I'll give it some more thought.
>>
>> In the abstract case it is hard to imagine combining different
>> SummarizedExperiments.  My use case is almost always "additional
>> samples from the same experiment", and for that situation it is a lot
>> easier to imagine combining them.  You still need to check that the
>> granges are similar (and if not, expand some of the assayData with
>> zeros or NAs), since the new samples may have coverage in locations
>> not assayed earlier.  Clearly factors are hard to handle, and I assume
>> there are other hard-to-handle cases.  Nevertheless, I find such a
>> function incredibly useful.
>>
>> I think it is entirely ok to assume that the user knows what (s)he is
>> doing.
>>
>> Kasper
>
>
>
>
> --
> A model is a lie that helps you see the truth.
>
> Howard Skipper
>


