[Bioc-devel] Genomics analysis results class
Luigi Marchionni
marchion at jhu.edu
Wed Oct 7 22:58:59 CEST 2009
Thanks Vincent,
that was very helpful. I just installed GeneAnswers and will see
whether it can help.
I am also pondering your observations on genomic representation and
will make use of rtracklayer classes.
For this package I wrote a lot of functions for reporting my GSEA-like
results in HTML using htmlpage() from the annotate package, and for
representing them in useful graphics, since what I want to 'brew' from
the analysis is biological understanding.
When I designed the code I always thought about the biologists
receiving the analysis results (I am an MD myself, so I have this
perspective clearly in mind). The only side effect is the large
amount of bundled HTML I output.
I will sit down now and recap all methods, the ones I already coded,
the ones that are still needed, and the like, as you suggested.
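As a starting point for that recap, here is a minimal sketch, using only
the base methods package, of how the list of data.frames could be wrapped
in an S4 class with a subsetting method of the kind Vincent describes.
The class name RankedResults, its slot names, and the subsetByIds()
generic are hypothetical and do not come from any existing package; the
toy values are the first rows of the str() output quoted below, and the
"pathway" ID vector is made up for illustration.

```r
library(methods)

# Hypothetical container: one data.frame of ranking statistics per platform.
setClass(
  "RankedResults",
  representation(stats = "list", idColumn = "character"),
  prototype(stats = list(), idColumn = "ID"),
  validity = function(object) {
    if (length(object@stats) > 0 && is.null(names(object@stats)))
      return("'stats' must be a named list, one element per platform")
    ok <- vapply(
      object@stats,
      function(x) is.data.frame(x) && object@idColumn %in% names(x),
      logical(1)
    )
    if (!all(ok))
      return("every element of 'stats' must be a data.frame with the ID column")
    TRUE
  }
)

# Hypothetical generic: keep only the features listed in 'ids'.
setGeneric("subsetByIds",
           function(object, ids) standardGeneric("subsetByIds"))

setMethod("subsetByIds", "RankedResults", function(object, ids) {
  kept <- lapply(object@stats, function(x) {
    x[x[[object@idColumn]] %in% ids, , drop = FALSE]
  })
  new("RankedResults", stats = kept, idColumn = object@idColumn)
})

# Toy object built from the first rows of ex.Stats.
ex <- new("RankedResults", stats = list(
  ex.hgu133a = data.frame(
    ID = c("200763_s_at", "200062_s_at", "208834_x_at", "214631_at"),
    logFC = c(-8.09, -6.71, -5.92, 5.97),
    t = c(-43.8, -31, -30.5, 28.8),
    stringsAsFactors = FALSE
  ),
  ex.hgu95av2 = data.frame(
    ID = c("31957_r_at", "35777_at", "32412_at", "32539_at"),
    logFC = c(-7.76, -6.93, -5.66, 5.72),
    t = c(-41, -31.3, -28.5, 27),
    stringsAsFactors = FALSE
  )
))

# Made-up set of feature IDs standing in for a pathway.
pathwayIds <- c("200763_s_at", "214631_at", "35777_at")
sub <- subsetByIds(ex, pathwayIds)

# Number of features retained per platform.
vapply(sub@stats, nrow, integer(1))
```

Keeping the per-platform data.frames in a named list slot means results
from different technologies can sit side by side without forcing them
into a common feature space, which was my concern with MultiSet.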
Finally, I do base the "gene set side" of the package on GSEABase
classes.
From this perspective I only needed to extend the GeneSetCollection
class, since I work with ~50,000 gene sets organized in what I call
'functional scopes' (i.e., GO, KEGG, and so on).
I'll try to make up my mind and get back in touch for further
suggestions.
If there is a more appropriate way to do so than this mailing list,
please let me know.
Thanks in advance,
Luigi
On Oct 7, 2009, at 4:00 PM, Vincent Carey wrote:
> The things that I remember doing that capture analysis results are
> some classes in MLInterfaces for dealing with classification and
> clustering, and in GGtools for managing eQTL search results. One
> package that seems relevant to Luigi's concern is GeneAnswers, which
> is at 0.99, from the Northwestern U group.
>
> Some other things to think about -- you seem to be collecting
> quantitative information on genomic elements. Will it be relevant to
> be able to display these in a genome browser? If so, designing
> software that transforms this information into classes used for
> browsing with rtracklayer seems to make sense. annaffy is a nice and
> fairly general approach to throwing textual information and links
> relevant to genes into HTML for interactive browsing.
>
> Finally, class design/reuse decisions are probably best conducted by
> spelling out the methods that need to be present. Clearly a gene set
> structure is relevant, and sets and translations among nomenclatures
> are well-handled by GSEABase classes. If you feel you will often need
> to filter results to those relevant to a pathway, a subsetting
> mechanism will be important to have.
>
> On Wed, Oct 7, 2009 at 3:06 PM, Luigi Marchionni <marchion at jhu.edu>
> wrote:
>> Dear developers,
>> my name is Luigi Marchionni, I am Instructor in Oncology at Johns
>> Hopkins.
>> I have ongoing research on various biological problems and I make use
>> of GSEA like methods for which I wrote my own package.
>> I'd like to upload that in Bioconductor and for this reason I am
>> updating my code to reuse as much as possible already implemented S4
>> classes.
>>
>> For the gene sets I am using GSEABase classes, however I did not find
>> an existing class for results from genomics analyses.
>> I thought in the first place to extend the MultiSet class from
>> Biobase, but this does not seem to be completely appropriate (I have
>> results from different platforms and technologies, i.e. microarray and
>> SAGE).
>>
>> Right now I use a list of data.frames, each one storing the features
>> identifiers and the ranking statistics from the genomics analyses
>> (additional information could be obviously added in additional
>> slots). See the object ex.Stats below as an example:
>>
>>> class(ex.Stats)
>> [1] "list"
>>> str(ex.Stats)
>> List of 2
>>  $ ex.hgu133a :'data.frame': 100 obs. of 6 variables:
>>   ..$ ID       : chr [1:100] "200763_s_at" "200062_s_at" "208834_x_at" "214631_at" ...
>>   ..$ logFC    : num [1:100] -8.09 -6.71 -5.92 5.97 5.82 ...
>>   ..$ t        : num [1:100] -43.8 -31 -30.5 28.8 27.3 ...
>>   ..$ P.Value  : num [1:100] 1.89e-96 2.85e-73 3.25e-72 1.49e-68 3.18e-65 ...
>>   ..$ adj.P.Val: num [1:100] 1.27e-92 9.57e-70 7.26e-69 2.50e-65 4.26e-62 ...
>>   ..$ B        : num [1:100] 204 154 152 144 136 ...
>>  $ ex.hgu95av2:'data.frame': 100 obs. of 6 variables:
>>   ..$ ID       : chr [1:100] "31957_r_at" "35777_at" "32412_at" "32539_at" ...
>>   ..$ logFC    : num [1:100] -7.76 -6.93 -5.66 5.72 5.68 ...
>>   ..$ t        : num [1:100] -41 -31.3 -28.5 27 26 ...
>>   ..$ P.Value  : num [1:100] 7.05e-92 8.77e-74 8.49e-68 2.16e-64 3.29e-62 ...
>>   ..$ adj.P.Val: num [1:100] 4.73e-88 2.94e-70 1.90e-64 3.63e-61 4.41e-59 ...
>>   ..$ B        : num [1:100] 194 155 142 134 130 ...
>>
>> This works just fine, however, I do not want to reinvent the wheel...
>> I was discussing with Rafa and he suggested that before I go any
>> further with coding I inquire with you guys if anything like this
>> already exists.
>> He said that Vincent Carey might have already something for that.
>> Is that true? Do you have any suggestion?
>>
>> I thank all of you in advance,
>>
>> Luigi Marchionni
>>
>> --
>> Ulisse: "Considerate la vostra semenza:
>> fatti non foste a viver come bruti,
>> ma per seguir virtute e canoscenza".
>> (Dante, Divina Commedia, Canto XXVI)
>> --
>> G-C
>> T---A Luigi Marchionni, M.D., Ph.D.
>> C----G The Sidney Kimmel Comprehensive Cancer Center
>> G-------C Johns Hopkins University - School of Medicine
>> A------T 1550 Orleans St., CRB2, Rm 554
>> C----G Baltimore, MD, 21231, USA
>> G--C Tel: (001) 410-502-8179
>> C-G Fax: (001) 410-502-5742
>> T---A e-mail: marchion at jhmi.edu
>> G-----C URL: http://astor.som.jhmi.edu/~marchion/
>> A-------T
>>
>> _______________________________________________
>> Bioc-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>