[Bioc-devel] 'semantically rich' subsetting of SummarizedExperiments

Michael Lawrence lawrence.michael at gene.com
Sat Oct 11 23:17:00 CEST 2014


But what would it do, exactly?

We would probably want to be able to extract a gene list from a TxDb, then
extract the desired type of structure (transcripts, exons, etc.) from the TxDb.

That is not too bad right now, but it would be nice to leverage the
identifier-type information carried by the gene list object.

Currently:
tx <- transcripts(txdb, vals=list(gene_id=genes))

Proposed:
tx <- transcripts(txdb[GeneList])
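A minimal base-R sketch of how the proposed `txdb[GeneList]` could use the identifier type, assuming nothing from GenomicFeatures: the classes `EntrezGeneList` and `TxNameList`, the generic `keyColumn()`, the container `ToyTxDb` and the accessor `toyTranscripts()` are all hypothetical stand-ins, not existing Bioconductor API. The point it illustrates is that the class of the IDs, rather than an explicit `vals=list(gene_id=...)` argument, decides which column is filtered.

```r
library(methods)

## Hypothetical typed identifier vectors; underneath they are plain character.
setClass("EntrezGeneList", contains = "character")
setClass("TxNameList", contains = "character")
setClassUnion("TypedIds", c("EntrezGeneList", "TxNameList"))

## Hypothetical generic: each identifier class names the column it keys on.
setGeneric("keyColumn", function(ids) standardGeneric("keyColumn"))
setMethod("keyColumn", "EntrezGeneList", function(ids) "gene_id")
setMethod("keyColumn", "TxNameList", function(ids) "tx_name")

## Hypothetical stand-in for a TxDb: one row per transcript.
setClass("ToyTxDb", representation(tx = "data.frame"))

## Subsetting with typed IDs filters on the column implied by their class.
setMethod("[", signature(x = "ToyTxDb", i = "TypedIds"),
          function(x, i, j, ..., drop = TRUE) {
              keep <- x@tx[[keyColumn(i)]] %in% i@.Data
              new("ToyTxDb", tx = x@tx[keep, , drop = FALSE])
          })

## Hypothetical accessor standing in for transcripts().
toyTranscripts <- function(txdb) txdb@tx

txdb <- new("ToyTxDb", tx = data.frame(
    tx_name = c("tx1", "tx2", "tx3"),
    gene_id = c("673", "673", "7157"),
    stringsAsFactors = FALSE))

toyTranscripts(txdb[new("EntrezGeneList", "673")])  # rows for tx1 and tx2
toyTranscripts(txdb[new("TxNameList", "tx3")])      # row for tx3
```

Adding a new vocabulary in this sketch means adding one class and one `keyColumn()` method; the subsetting method itself does not change.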



On Sat, Oct 11, 2014 at 10:49 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:

> On 10/11/2014 08:41 AM, Vincent Carey wrote:
>
>> Is there anything on the order of as([GeneSet], "GRanges") around?
>>
>
> No, I don't think so; it would obviously be useful, and it follows a common
> theme. Martin
>
>
>
>> On Sat, Sep 20, 2014 at 11:34 PM, Gabe Becker <becker.gabe at gene.com> wrote:
>>
>>> Sean and Vincent,
>>>
>>> What we are doing builds on what Martin has in GSEABase. We wanted to
>>> see how much benefit we could get from something lighter-weight that sits
>>> between indistinguishable character vectors and the full machinery of
>>> GeneSets.
>>>
>>> Either way, formalizing the semantic information seems like a way to do
>>> what you want. Furthermore, these classed ID objects can be created
>>> automatically when contextual information is available, e.g. during
>>> queries to databases (or db-like objects), and then simply added to
>>> metadata DataFrames and reused.
>>>
>>> ~G
>>>
>>>
>>>
>>>
>>> On Sat, Sep 20, 2014 at 12:19 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>>
>>>> On Sat, Sep 20, 2014 at 3:11 PM, Gabe Becker <becker.gabe at gene.com> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> We are in the (very) early stages of experimenting with something that
>>>>> seems relevant here: classed identifiers. We are using them for
>>>>> database/mart queries, but I think the same concept could be useful for
>>>>> the cases you're describing.
>>>>>
>>>>> E.g.
>>>>>
>>>>> > mysyms = GeneSymbol(c("BRAF", "BRCA1"))
>>>>> > mysyms
>>>>> An object of class "GeneSymbol"
>>>>> [1] "BRAF"  "BRCA1"
>>>>>
>>>>> > yourSE[mysyms, ]
>>>>> ...
>>>>>
>>>>>
>>>> This approach has the flavor of some of the functionality that Martin
>>>> put together for the GSEABase package (EntrezIdentifier, etc.).
>>>>
>>>> Sean
>>>>
>>>>
>>>>
>>>>
>>>>> This approach has the benefit of being declarative instead of heuristic
>>>>> (people won't be able to invoke it accidentally), while still giving
>>>>> most of the convenience I believe you are looking for.
>>>>>
>>>>> The object classes inherit directly from character, so they should
>>>>> "just work" most of the time, but as I said it's early days; much more
>>>>> testing of functionality and usefulness is needed.
>>>>>
>>>>> ~G
>>>>>
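Gabe's `yourSE[mysyms, ]` idea can be sketched in base R with S4 classes. This is a minimal sketch, not Gabe's actual implementation: `GeneSymbol` here is an illustrative class, and `ToyExperiment` is a hypothetical stand-in for a SummarizedExperiment, with a plain `symbol` slot in place of real row annotation. It shows the declarative behaviour Gabe describes: the same strings select different rows depending on their class, while an unclassed character vector keeps its ordinary meaning.

```r
library(methods)

## Illustrative classed identifier: a character vector tagged as gene symbols.
setClass("GeneSymbol", contains = "character")
GeneSymbol <- function(x) new("GeneSymbol", x)

## Hypothetical stand-in for a SummarizedExperiment: an assay matrix whose
## row names are Entrez-style IDs, plus one gene symbol per row.
setClass("ToyExperiment",
         representation(assay = "matrix", symbol = "character"))

## Shared helper: keep the given rows (all columns are kept in this sketch).
.keepRows <- function(x, rows)
    new("ToyExperiment", assay = x@assay[rows, , drop = FALSE],
        symbol = x@symbol[rows])

## Plain character indices keep their usual meaning: match row names.
setMethod("[", signature(x = "ToyExperiment", i = "character"),
          function(x, i, j, ..., drop = TRUE)
              .keepRows(x, which(rownames(x@assay) %in% i)))

## GeneSymbol indices are matched against the symbol annotation instead;
## S4 dispatch prefers this method because GeneSymbol is more specific.
setMethod("[", signature(x = "ToyExperiment", i = "GeneSymbol"),
          function(x, i, j, ..., drop = TRUE)
              .keepRows(x, which(x@symbol %in% i@.Data)))

m <- matrix(1:6, nrow = 3,
            dimnames = list(c("673", "672", "7157"), c("s1", "s2")))
te <- new("ToyExperiment", assay = m, symbol = c("BRAF", "BRCA1", "TP53"))

rownames(te[GeneSymbol(c("BRAF", "BRCA1")), ]@assay)  # "673" "672"
rownames(te["7157", ]@assay)                          # "7157"
```

Passing an unclassed `c("BRAF", "BRCA1")` goes through the character method and matches no row names, so the symbol lookup is never triggered by accident.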
>>>>>
>>>>> On Sat, Sep 20, 2014 at 11:38 AM, Vincent Carey <stvjc at channing.harvard.edu>
>>>>> wrote:
>>>>>
>>>>>> OK by me to leave [ alone. We could start with subsetByEntrez,
>>>>>> subsetByKEGG, subsetBySymbol, subsetByGOTERM, and subsetByGOID.
>>>>>>
>>>>>> Utilities to generate GRanges for queries in each of these vocabularies
>>>>>> should, perhaps, be in the OrganismDb space? Once those are in place,
>>>>>> no additional infrastructure is necessary?
>>>>>>
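For contrast, the explicit subsetByXXX route needs no new classes: the function name itself states the vocabulary. The sketch below is hypothetical and base-R only (`subsetByColumn` is an illustrative helper, and only two of the proposed wrappers are shown); it assumes the row annotation sits in a data.frame with `symbol` and `entrez` columns and subsets a plain matrix, whereas a SummarizedExperiment would take the annotation from its own row data.

```r
## Hypothetical helper: keep rows whose annotation in `column` is one of
## `values`. `x` is any row-indexable object (a plain matrix here).
subsetByColumn <- function(x, annot, column, values) {
    stopifnot(nrow(x) == nrow(annot), column %in% names(annot))
    x[annot[[column]] %in% values, , drop = FALSE]
}

## One thin, explicitly named wrapper per vocabulary.
subsetBySymbol <- function(x, annot, symbols)
    subsetByColumn(x, annot, "symbol", symbols)
subsetByEntrez <- function(x, annot, ids)
    subsetByColumn(x, annot, "entrez", ids)

m <- matrix(1:6, nrow = 3,
            dimnames = list(c("f1", "f2", "f3"), c("s1", "s2")))
annot <- data.frame(symbol = c("BRAF", "BRCA1", "TP53"),
                    entrez = c("673", "672", "7157"),
                    stringsAsFactors = FALSE)

rownames(subsetBySymbol(m, annot, c("BRCA1", "TP53")))  # "f2" "f3"
rownames(subsetByEntrez(m, annot, "673"))               # "f1"
```

Because `[` is left untouched, nothing changes for existing callers; the cost is one wrapper per vocabulary.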
>>>>>> On Sat, Sep 20, 2014 at 12:49 PM, Tim Triche, Jr. <tim.triche at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Agreed with Sean, having tried implementing the "magical" alternative.
>>>>>>>
>>>>>>> --t
>>>>>>>
>>>>>>> On Sep 20, 2014, at 9:31 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>>>>>>
>>>>>>>> Hi, Vince.
>>>>>>>>
>>>>>>>> I'm coming a little late to the party, but I agree with Kasper's
>>>>>>>> sentiment that the less "magical" approach of using subsetByXXX might
>>>>>>>> be the cleaner way to go for the time being.
>>>>>>>>
>>>>>>>> Sean
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Sep 20, 2014 at 10:42 AM, Vincent Carey <stvjc at channing.harvard.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> https://github.com/vjcitn/biocMultiAssay/blob/master/vignettes/SEresolver.Rnw
>>>>>>>>> shows some modifications to [ that allow subsetting of SE by
>>>>>>>>> gene or pathway name.
>>>>>>>>>
>>>>>>>>> It may be premature to work at the [ level. Kasper suggested
>>>>>>>>> defining a suite of subsetBy operations that would accomplish this.
>>>>>>>>>
>>>>>>>>> I think we could get something along these lines into the release
>>>>>>>>> without too much more work. Votes?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Computational Biologist
>>>>> Genentech Research
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
