[Bioc-devel] Genomics analysis results class

Luigi Marchionni marchion at jhu.edu
Wed Oct 7 22:58:59 CEST 2009


Thanks Vincent,
that was very helpful. I just installed GeneAnswers and will see
whether it can help.
I am also pondering your observations on genomic representation and
will make use of rtracklayer classes.

For this package I wrote a lot of functions for reporting my GSEA-like
results in HTML, using htmlpage() from the annotate package, and for
representing them in useful graphics, since what I want to 'brew' from
the analysis is biological understanding.
When I designed the code I always thought about the biologists
receiving the analysis results (I am an MD myself, so I have this
perspective clearly in mind). The only side effect is the large
amount of bundled HTML I output...

I will sit down now and recap all methods, the ones I already coded,  
the ones that are still needed, and the like, as you suggested.
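
To make the recap concrete, here is a minimal sketch of what the
results side could look like: an S4 container for the per-platform
data.frames, plus the kind of subsetting method Vincent mentioned.
It uses only the methods package; the class name GenomicsResults, the
generic subsetByIds(), and the toy values are hypothetical
placeholders, not code from my package.

```r
library(methods)

# Hypothetical container: one data.frame of ranking statistics per platform.
setClass("GenomicsResults",
         representation(stats = "list"),
         validity = function(object) {
           ok <- vapply(object@stats,
                        function(d) is.data.frame(d) && "ID" %in% names(d),
                        logical(1))
           if (all(ok)) TRUE
           else "each element must be a data.frame with an 'ID' column"
         })

# Hypothetical generic: keep only the features whose identifiers are in
# 'ids' (for example, the members of one pathway or gene set).
setGeneric("subsetByIds", function(x, ids) standardGeneric("subsetByIds"))

setMethod("subsetByIds", "GenomicsResults", function(x, ids) {
  x@stats <- lapply(x@stats, function(d) d[d$ID %in% ids, , drop = FALSE])
  validObject(x)
  x
})

# Toy values for illustration only (not real analysis output).
res <- new("GenomicsResults", stats = list(
  platformA = data.frame(ID = c("a1", "a2", "a3"), t = c(-3.2, 2.5, 0.4),
                         stringsAsFactors = FALSE),
  platformB = data.frame(ID = c("b1", "b2"), t = c(1.8, -2.1),
                         stringsAsFactors = FALSE)))

sub <- subsetByIds(res, c("a1", "a3", "b2"))
```

Keeping one data.frame per platform means microarray and SAGE results
can sit side by side without sharing a feature set, and the validity
check only insists on the 'ID' column that the subsetting relies on.
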
Finally, I do base the "gene set side" of the package on GSEABase
classes.
In this respect I only needed to extend the GeneSetCollection class,
since I work with ~50,000 gene sets organized in what I call
'functional scopes' (i.e. GO, KEGG, and so on).

I'll try to make up my mind and get back in touch for further
suggestions.
If there is a more appropriate way to do so than this mailing list,
please let me know.

Thanks in advance,

Luigi

On Oct 7, 2009, at 4:00 PM, Vincent Carey wrote:

> The things that I remember doing that capture analysis results are
> some classes in MLInterfaces for dealing with classification and
> clustering, and in GGtools for managing eQTL search results.   One
> package that seems relevant to Luigi's concern is GeneAnswers, which
> is at 0.99, from the Northwestern U group.
>
> Some other things to think about -- you seem to be collecting
> quantitative information on genomic elements.  Will it be relevant to
> be able to display these in a genome browser?  If so, designing
> software that transforms this information into classes used for
> browsing with rtracklayer seems to make sense.  annaffy is a nice and
> fairly general approach to throwing textual information and links
> relevant to genes into HTML for interactive browsing.
>
> Finally, class design/reuse decisions are probably best conducted by
> spelling out the methods that need to be present.  Clearly a gene set
> structure is relevant, and sets and translations among nomenclatures
> are well-handled by GSEABase classes.  If you feel you will often need
> to filter results to those relevant to a pathway, a subsetting
> mechanism will be important to have.
>
> On Wed, Oct 7, 2009 at 3:06 PM, Luigi Marchionni <marchion at jhu.edu>  
> wrote:
>> Dear developers,
>> my name is Luigi Marchionni, I am an Instructor in Oncology at Johns
>> Hopkins.
>> I have ongoing research on various biological problems and I make use
>> of GSEA-like methods, for which I wrote my own package.
>> I'd like to submit it to Bioconductor, and for this reason I am
>> updating my code to reuse already implemented S4 classes as much as
>> possible.
>>
>> For the gene sets I am using GSEABase classes; however, I did not find
>> an existing class for results from genomics analyses.
>> I first thought of extending the MultiSet class from Biobase, but this
>> does not seem to be completely appropriate (I have results from
>> different platforms and technologies, i.e. microarray and SAGE).
>>
>> Right now I use a list of data.frames, each one storing the feature
>> identifiers and the ranking statistics from the genomics analyses
>> (additional information could obviously be added in additional slots).
>> See the object ex.Stats below as an example:
>>
>> > class(ex.Stats)
>> [1] "list"
>> > str(ex.Stats)
>> List of 2
>>  $ ex.hgu133a :'data.frame':    100 obs. of  6 variables:
>>  ..$ ID       : chr [1:100] "200763_s_at" "200062_s_at" "208834_x_at" "214631_at" ...
>>  ..$ logFC    : num [1:100] -8.09 -6.71 -5.92 5.97 5.82 ...
>>  ..$ t        : num [1:100] -43.8 -31 -30.5 28.8 27.3 ...
>>  ..$ P.Value  : num [1:100] 1.89e-96 2.85e-73 3.25e-72 1.49e-68 3.18e-65 ...
>>  ..$ adj.P.Val: num [1:100] 1.27e-92 9.57e-70 7.26e-69 2.50e-65 4.26e-62 ...
>>  ..$ B        : num [1:100] 204 154 152 144 136 ...
>>  $ ex.hgu95av2:'data.frame':    100 obs. of  6 variables:
>>  ..$ ID       : chr [1:100] "31957_r_at" "35777_at" "32412_at" "32539_at" ...
>>  ..$ logFC    : num [1:100] -7.76 -6.93 -5.66 5.72 5.68 ...
>>  ..$ t        : num [1:100] -41 -31.3 -28.5 27 26 ...
>>  ..$ P.Value  : num [1:100] 7.05e-92 8.77e-74 8.49e-68 2.16e-64 3.29e-62 ...
>>  ..$ adj.P.Val: num [1:100] 4.73e-88 2.94e-70 1.90e-64 3.63e-61 4.41e-59 ...
>>  ..$ B        : num [1:100] 194 155 142 134 130 ...
>>
>> This works just fine; however, I do not want to reinvent the wheel...
>> I was discussing this with Rafa, and he suggested that before I go any
>> further with coding I ask you whether anything like this already
>> exists.
>> He said that Vincent Carey might already have something for that.
>> Is that true? Do you have any suggestions?
>>
>> I thank all of you in advance,
>>
>> Luigi Marchionni
>>
>> --
>> Ulisse: "Considerate la vostra semenza:
>> fatti non foste a viver come bruti,
>> ma per seguir virtute e canoscenza".
>> (Dante, Divina Commedia, Canto XXVI)
>> --
>>     G-C
>>    T---A    Luigi Marchionni, M.D., Ph.D.
>>   C----G    The Sidney Kimmel Comprehensive Cancer Center
>> G-------C    Johns Hopkins University -  School of Medicine
>>  A------T     1550 Orleans St., CRB2, Rm 554
>>    C----G    Baltimore, MD, 21231, USA
>>    G--C    Tel: (001) 410-502-8179
>>     C-G    Fax: (001) 410-502-5742
>>    T---A    e-mail: marchion at jhmi.edu
>>   G-----C    URL: http://astor.som.jhmi.edu/~marchion/
>>  A-------T
>>
>> _______________________________________________
>> Bioc-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>


