Wait!  No!  I am a moron!  It works fine, I just forgot to indicate which
list element to retrieve from:

R> head(assays(SE.by.chr$chr1)$weights)
6 x 35855 sparse Matrix of class "dgCMatrix"
   [[ suppressing 82 column names ‘feat1’, ‘feat2’, ‘feat3’ ... ]]



feat1 1 . .         .         .          .          . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . .
feat2 . 1 .         .         .          .          . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . .
feat3 . . 1.0000000 0.5737488 .          .          . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . .
feat4 . . 0.5737488 1.0000000 .          .          . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . .
feat5 . . .         .         1.00000000 0.04300373 . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . .
feat6 . . .         .         0.04300373 1.00000000 . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . .

feat1 . . . . . . . . . . . . . . ......
feat2 . . . . . . . . . . . . . . ......
feat3 . . . . . . . . . . . . . . ......
feat4 . . . . . . . . . . . . . . ......
feat5 . . . . . . . . . . . . . . ......
feat6 . . . . . . . . . . . . . . ......

 .....suppressing columns in show(); maybe adjust 'options(max.print= *)'
 ..............................



On Wed, Sep 5, 2012 at 10:31 AM, Tim Triche, Jr. <tim.triche@gmail.com>wrote:

> It seems to work with a Matrix in the assays:
>
> R> packageVersion('GenomicRanges')
> [1] ‘1.9.61’
>
> R> SE <- SummarizedExperiment(assays=wts, rowData=wts.locs)
> Error in function (classes, fdef, mtable)  :
>   unable to find an inherited method for function "SummarizedExperiment",
> for signature "dgCMatrix"
>
> R> SE <- SummarizedExperiment(assays=SimpleList(weights=wts),
> rowData=wts.locs)
> dimnames(.) <- NULL:  translated to
> dimnames(.) <- list(NULL,NULL)  <==>  unname(.)
>
> R> show(SE)
> class: SummarizedExperiment
> dim: 35855 35855
> exptData(0):
> assays(1): weights
> rownames(35855): feat1 feat2 ... feat35854 feat35855
> rowData metadata column names(10): egid signed ... DHS_DGF chromHMM_state
> colnames(35855): feat1 feat2 ... feat35854 feat35855
> colData names(0):
>
> R> class(assays(SE, 'weights'))
> [1] "SimpleList"
> attr(,"package")
> [1] "IRanges"
>
> R> class(assays(SE)$weights)
> [1] "dgCMatrix"
> attr(,"package")
> [1] "Matrix"
>
>
> Very cool!  Albeit a tad disorienting.  I was going to patch this up and
> then realized, hey, if someone's going to stuff a Matrix into their assays,
> they ought to give it a name and know what's going on under the hood, at
> least for now.  It seems to work correctly, too.
>
> Oh oh, I spoke too soon.  If I split by chromosome, the assay falls out:
>
> R> head(assays(SE)$weights)
> 6 x 35855 sparse Matrix of class "dgCMatrix"
>    [[ suppressing 82 column names ‘feat1’, ‘feat2’, ‘feat3’ ... ]]
>
>
>
> feat1 1 . .         .         .          .          . . . . . . . . . . .
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> . . . . . . . . . . . . .
> feat2 . 1 .         .         .          .          . . . . . . . . . . .
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> . . . . . . . . . . . . .
> feat3 . . 1.0000000 0.5737488 .          .          . . . . . . . . . . .
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> . . . . . . . . . . . . .
> feat4 . . 0.5737488 1.0000000 .          .          . . . . . . . . . . .
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> . . . . . . . . . . . . .
> feat5 . . .         .         1.00000000 0.04300373 . . . . . . . . . . .
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> . . . . . . . . . . . . .
> feat6 . . .         .         0.04300373 1.00000000 . . . . . . . . . . .
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> . . . . . . . . . . . . .
>
> feat1 . . . . . . . . . . . . . . ......
> feat2 . . . . . . . . . . . . . . ......
> feat3 . . . . . . . . . . . . . . ......
> feat4 . . . . . . . . . . . . . . ......
> feat5 . . . . . . . . . . . . . . ......
> feat6 . . . . . . . . . . . . . . ......
>
>  .....suppressing columns in show(); maybe adjust 'options(max.print= *)'
>  ..............................
> R> SE.by.chr <- split(SE, as.vector(seqnames(SE)))
> R> head(assays(SE.by.chr)$weights)
> Error in head(assays(SE.by.chr)$weights) :
>   error in evaluating the argument 'x' in selecting a method for function
> 'head': Error in function (classes, fdef, mtable)  :
>   unable to find an inherited method for function "assays", for signature
> "list"
>
> Ideas?  I would love to do something useful here instead of just fussing.
>  It's really a great structure.
>
> I love the use of a reference class to reference the internal pieces of an
> object, it's such a clever hack.  Thanks much for adding it.
>
> --t
>
>
>
> On Tue, Sep 4, 2012 at 7:58 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote:
>
>> On 09/04/2012 07:02 PM, Tim Triche, Jr. wrote:
>>
>>> This is really cool.
>>>
>>> R> head(lsos(), 1)
>>>                              Type     Size  Rows Columns
>>> LAML.RNAseq SummarizedExperiment 38666528 23529     173
>>> R> LAML.RNAseq.updated <- updateObject(LAML.RNAseq)
>>> R> head(lsos(), 2)
>>>                                      Type     Size  Rows Columns
>>> LAML.RNAseq         SummarizedExperiment 38666528 23529     173
>>> LAML.RNAseq.updated SummarizedExperiment  6101488 23529     173
>>>
>>
>> I think either the sizes are misleading (missing the content of the
>> environment that the reference classes contain, probably
>> LAML.RNAseq.updated@assays$**data has size 38666528 - 6101488) or the
>> data has not been updated correctly :\. Nonetheless manipulation of
>> especially non-assay but also assay data should be faster.
>>
>> I added a brief section (1.9.60) to ?SummarizedExperiment describing what
>> was done. Also, I tried (w/out actually testing...) to add support for
>> Matrix instances.
>>
>> As a miscellaneous 'tip', when working with large data I find it useful
>> to start R with --min-vsize=2048M --min-nsize=20M. These are documented in
>> RShowDoc("R-intro") as 'for expert use only' and influence how much memory
>> R allocates for vectors ('vsize'; vsize can be very large) and for
>> S-expressions ('nsize' 50000000) and help get up to large memory
>> allocations without innumerable garbage collections. Of course having big
>> memory is of primary importance.
>>
>> Martin
>>
>>  R> str(LAML.RNAseq.updated@assays**)
>>> Reference class 'ShallowSimpleListAssays' [package "GenomicRanges"] with
>>> 1 fields
>>>   $ data:Formal class 'SimpleList' [package "IRanges"] with 4 slots
>>>    .. ..@ listData       :List of 1
>>>    .. .. ..$ rpm: num [1:23529, 1:173] 0 0 8.12 10.84 27.73 ...
>>>    .. ..@ elementType    : chr "ANY"
>>>    .. ..@ elementMetadata: NULL
>>>    .. ..@ metadata       : list()
>>>   and 11 methods,R>
>>> R> packageVersion('GenomicRanges'**)
>>> [1] ‘1.9.59’
>>>
>>> A slightly bigger one:
>>>
>>> R> head(lsos(),2)
>>>                               Type       Size   Rows Columns
>>> LAML         SummarizedExperiment 1950606472 485577     192
>>> LAML.updated SummarizedExperiment  458912760 485577     192
>>> R> system.time(LAML.updated$foo <- rep('bar', 192))
>>>     user  system elapsed
>>>    0.132   0.116   0.248
>>> R> system.time(LAML$foo <- rep('bar', 192))
>>>     user  system elapsed
>>>    2.728   2.144   7.519
>>>
>>>
>>> That is a really clever hack you added.  I think I will have to figure
>>> out how it works :-)
>>>
>>> Thanks!!!
>>>
>>>
>>> On Sun, Sep 2, 2012 at 2:47 PM, Martin Morgan <mtmorgan@fhcrc.org
>>> <mailto:mtmorgan@fhcrc.org>> wrote:
>>>
>>>     On 08/30/2012 08:47 AM, Vincent Carey wrote:
>>>
>>>         I am all in favor of the dialogue and the aim.  I reproduced
>>>         your first
>>>         timing, and note
>>>
>>>             unix.time(ss2@colData$tx <- 1:1000)
>>>
>>>              user  system elapsed
>>>             5.963   4.520  10.483
>>>
>>>             unix.time(colData(ss2)$tx2 <- 1:1000)
>>>
>>>              user  system elapsed
>>>            17.937  13.074  31.016
>>>
>>>         BTW did you really mean to put the meth/unmeth in exptData, or
>>>         should it be
>>>         in assays?
>>>
>>>              From an experimentation perspective, I would want to redo
>>> these
>>>
>>>         computations above with an environment holding the assay data
>>>         and see how
>>>         it changes.  I suppose one can infer from first principles that
>>>
>>>             ss4@assays[[1]]$meth[1:4,1:4]
>>>
>>>                [,1] [,2] [,3] [,4]
>>>         [1,]  0.2  0.2  0.2  0.2
>>>         [2,]  0.2  0.2  0.2  0.2
>>>         [3,]  0.2  0.2  0.2  0.2
>>>         [4,]  0.2  0.2  0.2  0.2
>>>
>>>             ss4@assays[[2]]$unmeth[1:4,1:_**_4]
>>>
>>>
>>>                [,1] [,2] [,3] [,4]
>>>         [1,]  0.2  0.2  0.2  0.2
>>>         [2,]  0.2  0.2  0.2  0.2
>>>         [3,]  0.2  0.2  0.2  0.2
>>>         [4,]  0.2  0.2  0.2  0.2
>>>
>>>             unix.time(colData(ss4)$tx2 <- 1:1000)
>>>
>>>              user  system elapsed
>>>             0.010   0.002   0.012
>>>
>>>         where ss4 has a SimpleList of environments holding the assay
>>> data.
>>>
>>>             ss4
>>>
>>>         class: SummarizedExperiment
>>>         dim: 450000 1000
>>>         exptData(0):
>>>         assays(2): '' ''
>>>         rownames: NULL
>>>         rowData values names(0):
>>>         colnames: NULL
>>>         colData names(4): sampleID treatment tx tx2
>>>
>>>             ss4@assays[[1]]
>>>
>>>         <environment: 0x1f4e0ba0>
>>>
>>>         This can't be attempted without disabling validity checking, and
>>>         the stack
>>>         trace from the attempt is instructive.
>>>
>>>         I don't think it will be too difficult to establish a customized
>>>         SummarizedExperiment that permits this, using the
>>> lockedEnvironment
>>>         approach of ExpressionSets.  I also do not know the down sides
>>>         but it seems
>>>         you/we could get some mileage on the approach and see.
>>>
>>>
>>>     GenomicRanges v. 1.9.59 has some preliminary changes that address
>>>     the performance issue. Serialized (saved) objects 'x' need to be
>>>     updated with updateObject(x) (and be careful if these are your only
>>>     instances of some hard work). There are some loose ends to tidy up.
>>>
>>>     Martin
>>>
>>>
>>>
>>>
>>>         On Thu, Aug 30, 2012 at 10:53 AM, Kasper Daniel Hansen <
>>>         kasperdanielhansen@gmail.com
>>>         <mailto:kasperdanielhansen@**gmail.com<kasperdanielhansen@gmail.com>>>
>>> wrote:
>>>
>>>             My perspective (and the reason for me email) is pretty
>>> simple.
>>>
>>>             First off, let me say that we will always be working on
>>>             trading off
>>>             memory and IO and development time and all the other stuff.
>>>               I mean,
>>>             clearly the environment stuff for ExpressionSets did not
>>>             help all
>>>             people, because we have xps, aroma.affymetrix and oligo all
>>>             using some
>>>             kind of file based backend.
>>>
>>>             I am not trying to solve something "big".  Putting
>>>             methylation data
>>>             inside an environment is not suddenly going to make my
>>>             algorithms go
>>>             _significantly_ faster or the memory usage to go
>>>             _significantly_ down.
>>>                But it makes life faster when you explore data - I
>>> believe.
>>>
>>>             Now for two realistic examples (one minfi, one bsseq) that I
>>>             encounter
>>>             all the time:
>>>             library(GenomicRanges)
>>>             nRows <- 450000
>>>             nCols <- 1000
>>>             SSet <- SummarizedExperiment(rowData = GRanges(seqnames =
>>>             rep("chr1",
>>>             nRows),
>>>                                            ranges = IRanges(1:nRows,
>>>             width = 1)),
>>>                                            colData = DataFrame(sampleID =
>>>             paste0("person", 1:nCols)),
>>>                                            exptData = SimpleList(meth =
>>>             matrix(0.2,
>>>             ncol = nCols, nrow = nRows),
>>>                                            unmeth = matrix(0.2, ncol =
>>>             nCols, nrow =
>>>             nRows)))
>>>             print(object.size(SSet), units = "auto")  # 6.7GB
>>>             system.time({
>>>                   colData(SSet)$treatment = rep(c("a", "b"), nCols / 2)
>>>             }) # around 26 sec
>>>
>>>
>>>             nRows <- 28000000
>>>             nCols <- 10
>>>             SSet <- SummarizedExperiment(rowData = GRanges(seqnames =
>>>             rep("chr1",
>>>             nRows),
>>>                                            ranges = IRanges(1:nRows,
>>>             width = 1)),
>>>                                            colData = DataFrame(sampleID =
>>>             paste0("person", 1:nCols)),
>>>                                            exptData = SimpleList(meth =
>>>             matrix(0.2,
>>>             ncol = nCols, nrow = nRows),
>>>                                            unmeth = matrix(0.2, ncol =
>>>             nCols, nrow =
>>>             nRows)))
>>>             print(object.size(SSet), units = "auto")  # 4.4GB
>>>             system.time({
>>>                   colData(SSet)$treatment = rep(c("a", "b"), nCols / 2)
>>>             }) # around 20 sec
>>>
>>>             This is of course not that slow, especially not when
>>>             compared to the
>>>             massive amount of computation I had to do in order to arrive
>>>             at the
>>>             SSet in the first place.  But the difference is that I
>>>             staring at the
>>>             screen when I work interactively, but I am out partying when
>>>             I run my
>>>             computations (overnight).  And I do stuff like the above
>>>             surprisingly
>>>             many times after I have generated the objects.
>>>
>>>             My perspective is that this was essentially "solved" with the
>>>             ExpressionSet class, and I feel we are regressing a bit.  I
>>> also
>>>             believe that it should be "easy" to re-use some of the
>>>             ideas/code from
>>>             ExpressionSet and that it is worthwhile to do so, for
>>>             something that
>>>             may very well be an extremely important core class.
>>>             SummarizedExperiment is such a nice class that I essentially
>>>             have a
>>>             parallel implementation in bsseq and Pete has one in genoset.
>>>
>>>             So essentially, I was interested in fixing a pretty small
>>>             problem, but
>>>             something that I am a bit irritated about.  And I feel we
>>>             almost have
>>>             a ready to use solution.
>>>
>>>             The reason I ask about it here, and not just do it myself,
>>>             is twofold
>>>             (1) it is tricky to get the environment right, and I do not
>>> have
>>>             enough experience on the environment part of ExpressionSet,
>>>             that I
>>>             feel comfortable doing it
>>>             (2) I really think it would be beneficial to the community,
>>> and
>>>             experience tells us, that what happens in these core
>>>             infrastructure
>>>             packages are much more likely to be widely used.
>>>
>>>             While I think that it may be worthwhile for people to play
>>> with
>>>             reference classes, I think that going down that route is
>>>             unexplored
>>>             and may or may not be worth it.  There is something to be
>>>             said for
>>>             building on something we have experience with.  A reference
>>>             class idea
>>>             can be developed in parallel, if people are willing to
>>>             commit the
>>>             time.
>>>
>>>             Kasper
>>>
>>>
>>>             On Thu, Aug 30, 2012 at 10:12 AM, Tim Triche, Jr.
>>>             <tim.triche@gmail.com <mailto:tim.triche@gmail.com>>
>>>
>>>             wrote:
>>>
>>>                    I am interested in how best to benchmark such
>>>                 operations (and another
>>>                 thought that I recently had, "why can't I put a Doug
>>>                 Bates style 'Matrix'
>>>                 in the SimpleList of assay data)?  Kasper is the person
>>>                 who convinced me
>>>                 that ff, rhdf5, etc.should be a last resort, since RAM
>>>                 is cheap.
>>>
>>>                 For large data, the TCGA BRCA samples are a hair under
>>>                 800 (last I
>>>
>>>             looked)
>>>
>>>                 450k arrays with matching RNAseq and DNAseq, and that's
>>>                 not even a big
>>>                 study by breast cancer standards (IIRC, Gordon Mills was
>>>                 rounding up 5000
>>>                 tumors) Both minfi and methylumi can work off of the raw
>>>                 data in these
>>>                 studies as a "practical" example, but when it really
>>>                 gets interesting is
>>>                 when 3-4 ragged assays exist for each patient.
>>>
>>>                 Exon-wise (or splice-graph-wise) RNAseq data with splice
>>>                 junction read
>>>                 counts is another place to find big SE-centric data with
>>>                 ambiguities.
>>>
>>>             One
>>>
>>>                 of the reasons I keep coming back to the idea of
>>>                 subsetting an SE by a
>>>                 GRanges[List], e.g. SE.totalRNA[ UCSC.lincRNAs, ], is
>>>                 its speed. In any
>>>                 event, after my obligations for the day are dispatched,
>>>                 I'll write up an
>>>                 example of doing this off of TCGA data, perhaps from
>>>                 colorectal (only
>>>
>>>             about
>>>
>>>                 400 subjects but at least it is published already).  You
>>>                 are right, a
>>>                 single use case is often worth 1000 specifications.
>>>
>>>                 --t
>>>
>>>
>>>
>>>                 On Thu, Aug 30, 2012 at 6:42 AM, Vincent Carey
>>>                 <stvjc@channing.harvard.edu
>>>                 <mailto:stvjc@channing.**harvard.edu<stvjc@channing.harvard.edu>
>>> >>__wrote:
>>>
>>>
>>>
>>>
>>>                     On Thu, Aug 30, 2012 at 9:16 AM, Sean Davis
>>>                     <sdavis2@mail.nih.gov <mailto:sdavis2@mail.nih.gov>>
>>>
>>>
>>>             wrote:
>>>
>>>
>>>
>>>
>>>                         On Thu, Aug 30, 2012 at 8:59 AM, Martin Morgan
>>>                         <mtmorgan@fhcrc.org <mailto:mtmorgan@fhcrc.org>
>>>
>>>
>>>                 wrote:
>>>
>>>
>>>                             On 08/30/2012 04:42 AM, Vincent Carey wrote:
>>>
>>>                                 On Thu, Aug 30, 2012 at 6:27 AM, Tim
>>>                                 Triche, Jr. <
>>>
>>>             tim.triche@gmail.com <mailto:tim.triche@gmail.com>
>>>
>>>
>>>                                     wrote:
>>>
>>>
>>>                                    nb.  one of the reasons for the
>>>                                 existence of the MergedDataSet
>>>
>>>             class in
>>>
>>>                                     regulatoR (to be submitted for
>>>                                     review shortly) is that, while SEs
>>>
>>>             are
>>>
>>>                                     absolutely fantastic for managing
>>>                                     data matrices that are stapled to
>>>
>>>             a
>>>
>>>                                     GRanges, what is less awesome is
>>>                                     having a relatively light-weight
>>>                                     DataFrame
>>>                                     for phenotypic data that requires
>>>                                     the entire memory footprint be
>>>                                     recreated
>>>                                     upon writing a new column into said
>>>                                     DataFrame.
>>>
>>>                                     If R5 classes didn't spook me a
>>>                                     little, I would already have done
>>>                                     something
>>>
>>>
>>>                                 We don't often use this R5 terminology
>>>                                 but I see Hadley has made an
>>>                                 accessible document referring to
>>>                                 reference classes in this way.
>>>
>>>
>>>                             To me the challenge is more conceptual --
>>>                             pass-by-reference and the
>>>
>>>             way
>>>
>>>                             that two variables pointing to the instance
>>>                             are updated at the same
>>>
>>>             time --
>>>
>>>                             and I had been thinking of a
>>>                             LockedEnvironment-style implementation
>>>
>>>             where
>>>
>>>                             some operations were free ('copying') but
>>>                             others weren't (subset,
>>>
>>>             subset
>>>
>>>                             assign). But maybe there are some more
>>>                             direct approaches...
>>>
>>>
>>>                                My 2c: This is a situation where some
>>>                             experimental data would be
>>>
>>>                                 helpful.
>>>
>>>
>>>
>>>                         Perhaps Tim or Kasper can share a largish
>>>                         (unpublished) dataset and a
>>>                         typical workflow with the Seattle folks.  Even
>>>                         that level of detail
>>>
>>>             would
>>>
>>>                         give some sense of the scope of the problem.
>>>
>>>
>>>
>>>                     For clarification, I did not mean large data,
>>>                     although that would be
>>>                     welcome.  I meant data on computational experiments
>>>                     with different
>>>                     approaches -- that is, performance statistics that
>>>                     would indicate where
>>>                     scalability fails, what steps have been taken to
>>>                     recover it, and how
>>>
>>>             costly
>>>
>>>                     those steps are in terms of complexity, reliability,
>>>                     maintainability,
>>>
>>>             risk
>>>
>>>                     at the user/application end.
>>>
>>>
>>>
>>>                         Sean
>>>
>>>
>>>                             Yes, for instance where in the interactive
>>>                             use is time being spent? Is
>>>                             it copying the assays, or validity, or
>>>                             actually updating the row
>>>
>>>             data? Is
>>>
>>>                             500000 x 800 an appropriate scale to be
>>>                             thinking about?
>>>
>>>
>>>                                  The main avenues for a developer seem
>>>                             to be a) use environments or
>>>
>>>                                 reference classes; there are some costs
>>>                                 and we should understand
>>>
>>>             them,
>>>
>>>                                 and
>>>                                 b) use an out-of-memory approach like
>>>                                 rhdf5 or ff.  Again there will
>>>
>>>             be
>>>
>>>                                 some costs.  It should be relatively
>>>                                 easy to experiment with these.
>>>
>>>                One
>>>
>>>                                 thing I just learned about is
>>>                                 setValidity2 and disableValidity
>>>
>>>             (defined
>>>
>>>                                 in
>>>                                 IRanges IIRC) ... these allow you to
>>>                                 construct certain variations on
>>>                                 SummarizedExperiment with less attention
>>>                                 to deeper infrastructure.
>>>
>>>
>>>                             probably I can make better use of the
>>>                             insights the IRanges guys have
>>>
>>>             had
>>>
>>>                             in their careful development and application
>>>                             of validity methods,
>>>
>>>             though I
>>>
>>>                             feel a bit like these are 'attractive
>>>                             hazards' that tempt us to do
>>>
>>>             unsafe
>>>
>>>                             things and then pay the price later. This is
>>>                             likely the first
>>>
>>>             direction
>>>
>>>                             I'll explore.
>>>
>>>                             Exploring a little I already see that there
>>>                             are some pretty dumb
>>>
>>>             things
>>>
>>>                             being done in assignment.
>>>
>>>                             Martin
>>>
>>>
>>>                                whereby the assays/libraries for a given
>>>                             study subject are all
>>>
>>>             pointed
>>>
>>>                                     to
>>>                                     as SEs (i.e. RNAseq, BSseq,
>>>                                     expression/methylation arrays,
>>> CNV/SNP
>>>                                     arrays,
>>>                                     WGS or exomic DNAseq) and the column
>>>                                     (phenotype) data can avoid
>>>
>>>             being
>>>
>>>                                     subject to these constraints.  Truth
>>>                                     be told I *still* want to do
>>>
>>>             that
>>>
>>>                                     because, most of the time, updates
>>>                                     to the latter are independent of,
>>>                                     and
>>>                                     happen subsequently to, loading the
>>>                                     former.
>>>
>>>                                     Suggestions would be welcome,
>>>                                     because other than these minor
>>>
>>>             niggles,
>>>
>>>                                     the
>>>                                     SummarizedExperiment class is almost
>>>                                     perfect for many tasks.
>>>
>>>
>>>
>>>
>>>                                     On Wed, Aug 29, 2012 at 9:57 PM, Tim
>>>                                     Triche, Jr. <
>>>
>>>             tim.triche@gmail.com <mailto:tim.triche@gmail.com>
>>>
>>>
>>>
>>>                                         wrote:
>>>
>>>
>>>                                        assigning new colData columns, or
>>>                                     overwriting old ones, in a
>>>
>>>             sizable
>>>
>>>                                         (say
>>>                                         500000 row x 800 column) SE is
>>>                                         nauseatingly slow.
>>>
>>>
>>>
>>>
>>>                                         There has to be a better way --
>>>                                         I'm willing to write it if
>>> someone
>>>
>>>             can
>>>
>>>                                         point out an obvious way to do it
>>>
>>>
>>>
>>>                                         On Wed, Aug 29, 2012 at 9:52 PM,
>>>                                         Kasper Daniel Hansen <
>>>                                         kasperdanielhansen@gmail.com
>>>                                         <mailto:kasperdanielhansen@**
>>> gmail.com <kasperdanielhansen@gmail.com>>>
>>>
>>>                                         wrote:
>>>
>>>                                            On Thu, Aug 30, 2012 at 12:44
>>>                                         AM, Martin Morgan <
>>>
>>>             mtmorgan@fhcrc.org <mailto:mtmorgan@fhcrc.org>>
>>>
>>>
>>>                                             wrote:
>>>
>>>                                                 On 08/29/2012 06:46 PM,
>>>                                                 Kasper Daniel Hansen
>>> wrote:
>>>
>>>
>>>                                                     There is a lot of
>>>                                                     good stuff to say
>>>                                                     about
>>>
>>> SummarizedExperiments,
>>>                                                     and
>>>                                                     from a certain point
>>>                                                     of view I have a
>>>                                                     parallel
>>>                                                     implementation in
>>>
>>>                                                 bsseq
>>>
>>>
>>>                                            (and there is also one in
>>>                                         genoset).
>>>
>>>
>>>                                                     However, I really
>>>                                                     like having the
>>>                                                     assayData inside an
>>>
>>>             environment.
>>>
>>>                                                     This helps some on
>>>                                                     memory and - equally
>>>                                                     important - speed at
>>> the
>>>                                                     command line.  I
>>>                                                     certainly need to
>>>                                                     very heavily
>>>                                                     consider using
>>>
>>>             an
>>>
>>>                                                     environment in bsseq.
>>>
>>>                                                     After some
>>>                                                     discussion with Tim
>>>                                                     (Triche) we have
>>>                                                     agreed that
>>>                                                     something
>>>                                                     like
>>>
>>> SummarizedExperiments is
>>>                                                     the way to go at
>>>                                                     least for the
>>>                                                     methylation arrays.
>>>                                                       We need to be able
>>>                                                     to easily handle
>>> 1000s
>>>
>>>             of
>>>
>>>                                                     samples.
>>>
>>>                                                     What is the chance
>>>                                                     that we can get the
>>>                                                     option of having the
>>>                                                     assayData
>>>                                                     inside an
>>>                                                     environment, perhaps
>>> by
>>>                                                           Making a class
>>>                                                     that is an
>>>                                                     environment and
>>>                                                     inherits from
>>>
>>>                                                 SimpleList.
>>>
>>>
>>>                                                Using a classUnion
>>>                                         between the existing class of
>>>                                         the assayData
>>>
>>>                                                     and
>>>                                                     an environment.
>>>                                                           Third option
>>>                                                     that is probably
>>>                                                     better than the
>>>                                                     proceeding
>>>
>>>             two,
>>>
>>>                                                     but
>>>                                                     which I cannot come
>>>                                                     up with right now.
>>>
>>>
>>>
>>>                                                 Probably something can /
>>>                                                 will be done. I guess
>>>                                                 the slowness
>>>
>>>             you're
>>>
>>>
>>>                                             talking
>>>
>>>                                                 about is when rowData /
>>>                                                 colData columns are
>>>                                                 manipulated; any
>>>
>>>             kind of
>>>
>>>                                                 subsetting would mean a
>>>                                                 'deep' copy. Martin
>>>
>>>
>>>                                             Yes, for example
>>>                                             manipulating colData -
>>>                                             something that
>>>
>>>             conceptually
>>>
>>>                                             should be quick and easy.
>>>                                               Of course, this will not
>>>                                             affect any
>>>
>>>             real
>>>
>>>                                             computation on the assayData
>>>                                             matrices, but it will make
>>>                                             life at
>>>
>>>             the
>>>
>>>                                             command prompt more pleasant.
>>>
>>>                                             Kasper
>>>
>>>
>>>
>>>
>>>                                                     This would - in my
>>>                                                     opinion - be very
>>>                                                     nice and worthwhile.
>>>
>>>                                                     Kasper
>>>
>>>
>>> ______________________________**__**_________________
>>>
>>> Bioc-devel@r-project.org
>>>                                                     <mailto:
>>> Bioc-devel@r-project.**org <Bioc-devel@r-project.org>>
>>>                                                     mailing list
>>>
>>> https://stat.ethz.ch/mailman/***__*listinfo/bioc-devel<https://stat.ethz.ch/mailman/*__*listinfo/bioc-devel>
>>>                                                     <
>>> https://stat.ethz.ch/mailman/****listinfo/bioc-devel<https://stat.ethz.ch/mailman/**listinfo/bioc-devel>
>>> ><
>>>
>>>             https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>>>
>>>             <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>> >>
>>>
>>>
>>>
>>>
>>>                                                 --
>>>                                                 Computational Biology /
>>>                                                 Fred Hutchinson Cancer
>>>                                                 Research Center
>>>                                                 1100 Fairview Ave. N.
>>>                                                 PO Box 19024 Seattle, WA
>>>                                                 98109
>>>
>>>                                                 Location: Arnold
>>>                                                 Building M1 B861
>>>                                                 Phone: (206) 667-2793
>>>
>>> <tel:%28206%29%20667-2793>
>>>
>>>
>>>
>>> ______________________________**__**_________________
>>>                                             Bioc-devel@r-project.org
>>>                                             <mailto:
>>> Bioc-devel@r-project.**org <Bioc-devel@r-project.org>>
>>>                                             mailing list
>>>
>>> https://stat.ethz.ch/mailman/***__*listinfo/bioc-devel<https://stat.ethz.ch/mailman/*__*listinfo/bioc-devel>
>>>                                             <
>>> https://stat.ethz.ch/mailman/****listinfo/bioc-devel<https://stat.ethz.ch/mailman/**listinfo/bioc-devel>
>>> ><
>>>
>>>             https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>>>
>>>             <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>> >>
>>>
>>>
>>>
>>>
>>>
>>>                                         --
>>>                                         *A model is a lie that helps you
>>>                                         see the truth.*
>>>                                         *
>>>                                         *
>>>                                         Howard Skipper<
>>>
>>>                                     http://cancerres.aacrjournals.**
>>> __**org/content/31/9/1173.**full.__pdf<
>>>
>>>             http://cancerres.aacrjournals.**
>>> __org/content/31/9/1173.full.**pdf
>>>             <http://cancerres.**aacrjournals.org/content/31/9/**
>>> 1173.full.pdf<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>> >__>
>>>
>>>
>>>                                     **>
>>>
>>>
>>>
>>>
>>>
>>>                                     --
>>>                                     *A model is a lie that helps you see
>>>                                     the truth.*
>>>                                     *
>>>                                     *
>>>                                     Howard Skipper<
>>>                                     http://cancerres.aacrjournals.**
>>> __**org/content/31/9/1173.**full.__pdf<
>>>
>>>             http://cancerres.aacrjournals.**
>>> __org/content/31/9/1173.full.**pdf
>>>             <http://cancerres.**aacrjournals.org/content/31/9/**
>>> 1173.full.pdf<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>> >__>
>>>
>>>
>>>                                     **>
>>>
>>>                                                [[alternative HTML
>>>                                     version deleted]]
>>>
>>>                                     ______________________________**
>>> __**_________________
>>>                                     Bioc-devel@r-project.org
>>>                                     <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org>
>>> >
>>>                                     mailing list
>>>                                     https://stat.ethz.ch/mailman/***
>>> __*listinfo/bioc-devel<https://stat.ethz.ch/mailman/*__*listinfo/bioc-devel>
>>>                                     <https://stat.ethz.ch/mailman/**
>>> **listinfo/bioc-devel<https://stat.ethz.ch/mailman/**listinfo/bioc-devel>
>>> ><
>>>
>>>             https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>>>
>>>             <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>> >>
>>>
>>>
>>>
>>>                                           [[alternative HTML version
>>>                                 deleted]]
>>>
>>>                                 ______________________________**
>>> __**_________________
>>>                                 Bioc-devel@r-project.org
>>>                                 <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org>
>>> >
>>>                                 mailing list
>>>                                 https://stat.ethz.ch/mailman/***
>>> __*listinfo/bioc-devel<https://stat.ethz.ch/mailman/*__*listinfo/bioc-devel>
>>>                                 <https://stat.ethz.ch/mailman/**
>>> **listinfo/bioc-devel<https://stat.ethz.ch/mailman/**listinfo/bioc-devel>
>>> ><
>>>
>>>             https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>>>
>>>             <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>> >>
>>>
>>>
>>>
>>>
>>>                             --
>>>                             Computational Biology / Fred Hutchinson
>>>                             Cancer Research Center
>>>                             1100 Fairview Ave. N.
>>>                             PO Box 19024 Seattle, WA 98109
>>>
>>>                             Location: Arnold Building M1 B861
>>>                             Phone: (206) 667-2793<tel:%28206%29%20667-2793>
>>>
>>>                             ______________________________**
>>> __**_________________
>>>                             Bioc-devel@r-project.org
>>>                             <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org>>
>>> mailing list
>>>                             https://stat.ethz.ch/mailman/***
>>> __*listinfo/bioc-devel<https://stat.ethz.ch/mailman/*__*listinfo/bioc-devel>
>>>                             <https://stat.ethz.ch/mailman/**
>>> **listinfo/bioc-devel<https://stat.ethz.ch/mailman/**listinfo/bioc-devel>
>>> ><
>>>
>>>             https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>>>
>>>             <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>> >>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>                 --
>>>                 *A model is a lie that helps you see the truth.*
>>>                 *
>>>                 *
>>>                 Howard Skipper<
>>>
>>>             http://cancerres.aacrjournals.**
>>> __org/content/31/9/1173.full.**pdf
>>>             <http://cancerres.**aacrjournals.org/content/31/9/**
>>> 1173.full.pdf<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>> >__>
>>>
>>>
>>>                           [[alternative HTML version deleted]]
>>>
>>>                 ______________________________**___________________
>>>                 Bioc-devel@r-project.org
>>>                 <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org>>
>>> mailing list
>>>                 https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>>>
>>>                 <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>> >
>>>
>>>
>>>
>>>                  [[alternative HTML version deleted]]
>>>
>>>         ______________________________**___________________
>>>         Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org>
>>> >
>>>         mailing list
>>>         https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>>>
>>>         <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>> >
>>>
>>>
>>>
>>>     --
>>>     Computational Biology / Fred Hutchinson Cancer Research Center
>>>     1100 Fairview Ave. N.
>>>     PO Box 19024 Seattle, WA 98109
>>>
>>>     Location: Arnold Building M1 B861
>>>     Phone: (206) 667-2793 <tel:%28206%29%20667-2793>
>>>
>>>
>>>
>>>
>>> --
>>> /A model is a lie that helps you see the truth./
>>> /
>>> /
>>> Howard Skipper
>>> <http://cancerres.**aacrjournals.org/content/31/9/**1173.full.pdf<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>> >
>>>
>>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>
>
>
> --
> *A model is a lie that helps you see the truth.*
> *
> *
> Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>
>


-- 
*A model is a lie that helps you see the truth.*
*
*
Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

	[[alternative HTML version deleted]]

