[BioC] Using GOstats for a non-model organism

Maureen J. Donlin donlinmj at slu.edu
Mon Feb 14 23:50:18 CET 2011


James,

Thanks for the reply.  I figured out how to get the data into a data frame.
I was doing two things wrong, but here is the code that worked.

 > CneoGO <- read.table("Cneo_GOannot.txt", header=TRUE)
 > head(CneoGO)
       Goterm Evidence     GeneID
1 GO:0015893      IEA CNAG_00003
2 GO:0043231      IEA CNAG_00003
3 GO:0015203      IEA CNAG_00003
4 GO:0044425      IEA CNAG_00003
5 GO:0044444      IEA CNAG_00003
6 GO:0015846      IEA CNAG_00003

 > goframeData = data.frame(CneoGO$Goterm, CneoGO$Evidence, CneoGO$GeneID)
 > head(goframeData)
   CneoGO.Goterm CneoGO.Evidence CneoGO.GeneID
1    GO:0015893             IEA    CNAG_00003
2    GO:0043231             IEA    CNAG_00003
3    GO:0015203             IEA    CNAG_00003
4    GO:0044425             IEA    CNAG_00003
5    GO:0044444             IEA    CNAG_00003
6    GO:0015846             IEA    CNAG_00003
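A slightly more direct version of the same step, offered only as a sketch: read the file with stringsAsFactors = FALSE (in R versions before 4.0.0, read.table() turns text columns into factors by default), then pick the three columns by name in the order GOFrame() expects: GO ID, evidence code, gene ID. The inline text stands in for Cneo_GOannot.txt. It is read through textConnection() only so the snippet runs without the file.

```r
# Stand-in for the first lines of Cneo_GOannot.txt; with the real file use
# read.table("Cneo_GOannot.txt", header = TRUE, stringsAsFactors = FALSE)
annot_txt <- "Goterm Evidence GeneID
GO:0015893 IEA CNAG_00003
GO:0043231 IEA CNAG_00003
GO:0015203 IEA CNAG_00003"

con <- textConnection(annot_txt)
CneoGO <- read.table(con, header = TRUE, stringsAsFactors = FALSE)
close(con)

# GOFrame() takes three columns in this order: GO ID, evidence code, gene ID
goframeData <- CneoGO[, c("Goterm", "Evidence", "GeneID")]
str(goframeData)
```

Selecting columns by name also keeps the original column names, so the "CneoGO." prefixes do not appear.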

So continuing with the tutorial guide, I executed the following:

 > library("GSEABase")
Loading required package: annotate

 > goFrame = GOFrame(goframeData, organism = "Cryptococcus neoformans")
Loading required package: GO.db

 > goFrame
An object of class "GOFrame"
Slot "data":
       CneoGO.Goterm CneoGO.Evidence CneoGO.GeneID
1        GO:0015893             IEA    CNAG_00003
2        GO:0043231             IEA    CNAG_00003
...
Slot "organism":
[1] "Cryptococcus neoformans"

 > goAllFrame = GOAllFrame(goFrame)

 > goAllFrame
An object of class "GOAllFrame"
Slot "data":
             go_id evidence    gene_id
1      GO:0000001      IEA CNAG_00006
2      GO:0000001      IEA CNAG_00088
...
Slot "organism":
[1] "Cryptococcus neoformans"


 > gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())
 > gsc
GeneSetCollection
   names: GO:0000001, GO:0000002, ..., GO:2000045 (6658 total)
   unique identifiers: CNAG_00006, CNAG_00088, ..., CNAG_06995 (4822 total)
   types in collection:
     geneIdType: GOAllFrameIdentifier (1 total)
     collectionType: GOCollection (1 total)
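Conceptually, gsc is a set of gene lists keyed by GO term. The base-R sketch below builds a rough analogue from direct annotations with split(). It is simpler than GOAllFrame(), which (as I understand it) also maps each gene to the ancestors of its annotated terms using GO.db. The object names are hypothetical.

```r
# Toy direct annotations in the same three-column layout
annot <- data.frame(
  Goterm   = c("GO:0015893", "GO:0043231", "GO:0043231", "GO:0015203"),
  Evidence = c("IEA", "IEA", "IEA", "IEA"),
  GeneID   = c("CNAG_00003", "CNAG_00003", "CNAG_00006", "CNAG_00006"),
  stringsAsFactors = FALSE
)

# One character vector of gene IDs per GO term (no ancestor propagation)
gene_sets <- lapply(split(annot$GeneID, annot$Goterm), unique)
gene_sets

length(gene_sets)                  # number of sets, analogous to "names: ... total"
length(unique(unlist(gene_sets)))  # distinct genes, analogous to "unique identifiers"
```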

 > universe = Lkeys(CneoGO)
Error in function (classes, fdef, mtable)  :
   unable to find an inherited method for function "Lkeys", for signature "data.frame"

Am I missing some data that is found in org.Hs.egGO (from 
library("org.Hs.eg.db"))?  I can run the same commands with it, and the 
structure of the goFrame, goAllFrame and gsc objects seems to be the same.
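Lkeys() is a method for AnnotationDbi Bimap objects such as org.Hs.egGO, which is why the tutorial can call it there. It has no method for a plain data.frame, hence the error. With a data.frame, a reasonable universe is simply the distinct gene IDs in the annotation table. A minimal sketch, assuming the GeneID column shown above:

```r
# Toy version of the CneoGO table read earlier
CneoGO <- data.frame(
  Goterm   = c("GO:0015893", "GO:0043231", "GO:0015203", "GO:0043231"),
  Evidence = c("IEA", "IEA", "IEA", "IEA"),
  GeneID   = c("CNAG_00003", "CNAG_00003", "CNAG_00003", "CNAG_00006"),
  stringsAsFactors = FALSE
)

# Every annotated gene once; as.character() guards against factor columns
universe <- unique(as.character(CneoGO$GeneID))
universe
```

In the tutorial's next step this vector should go in as universeGeneIds to GSEAGOHyperGParams(), with geneSetCollection = gsc and a cluster's gene IDs as geneIds. If the array carries genes with no GO annotation, intersecting the array's gene IDs with this vector is one alternative choice of universe.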

Here's what I am trying to do.  I have a microarray data set from a time 
course experiment done with a fungal genome, C. neoformans.  I have 
clusters of genes which are associated based on how their expression 
changed in relation to the other genes on the array.  So what I have are 
gene lists, with no expression data or fold changes.  For each list of 
genes, I want to know what GO terms are over-represented.
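GOstats answers this question with a hypergeometric test per GO term. The sketch below illustrates only that calculation in base R. It is not GOstats itself: it does not walk the GO graph, and it makes no multiple-testing correction. For each term, with N genes in the universe, n genes in the cluster, m genes annotated to the term and k of those in the cluster, the over-representation p-value is P(X >= k) = phyper(k - 1, m, N - m, n, lower.tail = FALSE). The helper go_overrep() and the gene IDs are hypothetical.

```r
# Hypothetical helper: one-sided hypergeometric test for every GO term
go_overrep <- function(annot, genes, universe) {
  genes <- intersect(unique(genes), universe)          # cluster genes in universe
  sets  <- lapply(split(annot$GeneID, annot$Goterm),
                  function(g) intersect(unique(g), universe))
  N <- length(universe)                                # universe size
  n <- length(genes)                                   # cluster size
  res <- data.frame(
    term  = names(sets),
    size  = vapply(sets, length, integer(1)),          # m: term size
    count = vapply(sets, function(g) sum(g %in% genes), integer(1)),  # k
    stringsAsFactors = FALSE
  )
  # P(X >= k) for X ~ Hypergeometric(m annotated, N - m not, n drawn)
  res$pvalue <- phyper(res$count - 1, res$size, N - res$size, n,
                       lower.tail = FALSE)
  res <- res[order(res$pvalue), ]
  rownames(res) <- NULL
  res
}

# Toy data: 10 genes, one term on genes 1-4, another on genes 5-10
ids <- sprintf("CNAG_%05d", 1:10)
annot <- data.frame(
  Goterm   = c(rep("GO:0015893", 4), rep("GO:0043231", 6)),
  Evidence = "IEA",
  GeneID   = ids,
  stringsAsFactors = FALSE
)
universe <- unique(annot$GeneID)
cluster  <- ids[1:3]                                   # one gene list

res <- go_overrep(annot, cluster, universe)
res
```

For several clusters, lapply(clusters, go_overrep, annot = annot, universe = universe) would return one table per list, and p.adjust(res$pvalue, method = "BH") is one common correction. hyperGTest() in GOstats also offers a conditional test that accounts for the GO structure, which this sketch does not attempt.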

I apologize if these questions are too basic.  It's just that most of 
the software out there for gene enrichment analysis is designed for 
model organisms.

Again, any help is greatly appreciated.

Regards,
Maureen





On 2/14/11 3:23 PM, James W. MacDonald wrote:
> Hi Maureen,
>
> On 2/14/2011 3:27 PM, Maureen J. Donlin wrote:
>> Hi all,
>>
>> I'm new to R and have some very basic questions about using GOstats with
>> a non-model organism.
>> I'm trying to use the tutorial by Marc Carlson "How to Use GOstats
>> and...with unsupported model organisms."
>>
>> I've created a GO to gene mapping file with the following 3 columns of
>> data:
>> Goterm Evidence GeneID
>> GO:0015893 IEA CNAG_00003
>> GO:0043231 IEA CNAG_00003
>> GO:0015203 IEA CNAG_00003
>> GO:0044425 IEA CNAG_00003
>> ...
>>
>> I can import it using read.table, but I don't seem to be able to invoke
>> the data frame correctly.
>
> When you read it in using read.table(), you automatically have a 
> data.frame.
>
>>
>> The tutorial reads:
>> library("org.Hs.eg.db")
>> frame = toTable(org.Hs.egGO)
>> goFrameData = data.frame(frame$go_id, frame$Evidence, frame$gene_id)
>
> Yep, this is just some code that Marc uses to create a data.frame so 
> he can give an example.
>
>>
>> I imported the data into an object using read.table
>> >CneoGOanno <- read.table("Cneo_GOannot.txt")
>>
>> I tried to create a frame using:
>> > frame = toTable(CneoGOannot)
>> Error in function (classes, fdef, mtable) :
>> unable to find an inherited method for function "toTable", for signature "data.frame"
>>
>> Do I have to create some sort of database for this organism first? If
>> so, what is its format?
>>
>> Any suggestions would be most appreciated.
>
> Just go to the next step, which will be something like
>
> goFrame <- GOFrame(CneoGOanno, organism = "Cryptococcus neoformans")
> goAllFrame <- GOAllFrame(goFrame)
>
>
> Best,
>
> Jim
>
>
>
>>
>> Regards,
>> Maureen Donlin
>>
>> At the risk of too long an email, here's the session info:
>> > sessionInfo()
>> R version 2.12.1 (2010-12-16)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5
>> graph_1.28.0 Category_2.16.0 AnnotationDbi_1.12.0
>> [8] Biobase_2.10.0
>>
>> loaded via a namespace (and not attached):
>> [1] annotate_1.28.0 genefilter_1.32.0 GO.db_2.4.5 GSEABase_1.12.2
>> RBGL_1.26.0 splines_2.12.1 survival_2.36-2 tools_2.12.1
>> [9] XML_3.2-0 xtable_1.5-6
>>
>>
>

-- 
Maureen J. Donlin, Ph.D.
Research Associate Professor

Dept. of Molecular Microbiology & Immunology
Dept. of Biochemistry & Molecular Biology
Saint Louis University School of Medicine
507 Doisy Research Center
1100 S. Grand
St. Louis, MO  63104
Phone: 314-977-8858


