[BioC] topGO question

Sun Sep 14 17:07:50 CEST 2008

Hi Heike,

I just looked at your data, and the problem seems to be with geneList.
More precisely, geneList and go_list don't match at all: the gene
identifiers used as names of go_list (TM00000001, ...) are completely
different from the gene identifiers in geneList (MZ00000001, ...)!

library(topGO)

## load the data
load("myGOdata.RData")

## the list of all gene IDs
geneNames <- names(go_list)
str(geneNames)

length(intersect(names(geneList), geneNames))   ## this is 0!
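To see at a glance why the intersection is empty, it can help to print a few identifiers from each side. A minimal base-R sketch follows; the helper checkIdOverlap() and the toy objects are hypothetical, with IDs modelled on the TM... and MZ... names that appear in this thread.

```r
## Hypothetical toy data: a gene-to-GO list keyed by "TM" IDs and an
## indicator factor keyed by "MZ" IDs, mimicking the mismatch above.
toyGoList <- list(TM00000001 = c("GO:0009058", "GO:0016757"),
                  TM00000002 = c("GO:0003700", "GO:0005634"))
toyGeneList <- factor(c("0", "1", "0"), levels = c("0", "1"))
names(toyGeneList) <- c("MZ00000001", "MZ00000002", "MZ00000003")

## Hypothetical helper: report how the gene IDs of the two objects overlap
## and, if they do not overlap at all, show a few IDs from each side.
checkIdOverlap <- function(geneList, gene2GO, n = 3) {
  ids    <- names(geneList)
  annot  <- names(gene2GO)
  shared <- intersect(ids, annot)
  cat("genes in geneList:", length(ids), "\n")
  cat("genes in gene2GO: ", length(annot), "\n")
  cat("shared IDs:       ", length(shared), "\n")
  if (length(shared) == 0) {
    cat("first geneList IDs:", head(ids, n), "\n")
    cat("first gene2GO IDs: ", head(annot, n), "\n")
  }
  invisible(shared)
}

shared <- checkIdOverlap(toyGeneList, toyGoList)
```

With the real objects, the same call would be checkIdOverlap(geneList, go_list).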

I think you mixed up the annotations, or something went wrong while building
the list of interesting genes. There needs to be an overlap between the
identifiers in the GO-to-gene mapping and the list of interesting
genes. Below I generated a random set of interesting genes just to
test whether one can build the topGOdata object from go_list:

## generate a random list of interesting genes
## select (or define) the list of interesting genes
myInterestedGenes <- sample(geneNames, 100)

## make an indicator vector showing which genes are interesting
myGeneList <- factor(as.integer(geneNames %in% myInterestedGenes))
names(myGeneList) <- geneNames

str(myGeneList)
sum(as.integer(myGeneList) == 2) ## should be 100
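The check above works because as.integer() on a factor returns the internal level codes (1 for level "0", 2 for level "1"), not the labels. A small sketch with made-up values showing this, and an arguably more readable equivalent that compares against the label directly:

```r
## Toy indicator factor built the same way as myGeneList (made-up values);
## levels are sorted, so they are "0" and "1"
toyFlags <- factor(as.integer(c(FALSE, TRUE, FALSE, TRUE, TRUE)))
levels(toyFlags)                 # the labels "0" and "1"
as.integer(toyFlags)             # internal codes 1 and 2, not 0 and 1
sum(as.integer(toyFlags) == 2)   # counts genes at level "1"
nSig <- sum(toyFlags == "1")     # same count, comparing against the label
```

Note that if no gene were flagged, factor() would create only the level "0"; passing levels = c("0", "1") explicitly keeps both levels present.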

## build the topGOdata class
## there are three annotation functions available:
##      1. annFUN.db  -- used for Bioconductor annotation chips
##      2. annFUN.gene2GO  -- used when you have mappings from each gene to GOs
##      3. annFUN.GO2genes -- used when you have mappings from each GO to genes
##

GOdata <- new("topGOdata",
              ontology = "MF",
              allGenes = myGeneList,
              annot = annFUN.gene2GO,  ## the new annotation function
              gene2GO = go_list)      ## the gene ID to GO dataset

## display the GOdata object
GOdata

------------------------- topGOdata object -------------------------

 Description:
   -

 Ontology:
   -  MF

 20623 available genes (all genes from the array):
   - symbol:  TM00000001 TM00000002 TM00000003 TM00000004 TM00000005  ...
   - 100  significant genes.

 18098 feasible genes (genes that can be used in the analysis):
   - symbol:  TM00000001 TM00000002 TM00000003 TM00000004 TM00000005  ...
   - 90  significant genes.

 GO graph (nodes with at least  0  genes):
   - a graph with directed edges
   - number of nodes = 1556
   - number of edges = 1853

------------------------- topGOdata object -------------------------
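annFUN.gene2GO and annFUN.GO2genes (listed in the comments above) take the same information in opposite directions. Below is a small base-R sketch of turning a gene-to-GO list into a GO-to-genes list; the helper invertMapping() and the toy data are hypothetical.

```r
## Toy gene -> GO mapping (made-up IDs), in the named-list format
## that is passed as gene2GO to annFUN.gene2GO
toyGene2GO <- list(geneA = c("GO:0003700", "GO:0005634"),
                   geneB = "GO:0005634",
                   geneC = c("GO:0003700", "GO:0016757"))

## Hypothetical helper: invert the mapping to GO -> genes
invertMapping <- function(gene2GO) {
  nTerms <- vapply(gene2GO, length, integer(1))     # GO terms per gene
  genes  <- rep(names(gene2GO), times = nTerms)     # repeat each gene ID
  terms  <- unlist(gene2GO, use.names = FALSE)      # matching GO IDs
  split(genes, terms)   # one character vector of gene IDs per GO term
}

toyGO2genes <- invertMapping(toyGene2GO)
str(toyGO2genes)
```

Either direction carries the same annotation; which one is convenient depends on how the mapping file was produced.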

sessionInfo()
R version 2.7.1 (2008-06-23)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_US.ISO-8859-15;LC_NUMERIC=C;LC_TIME=en_US.ISO-8859-15;LC_COLLATE=en_US.ISO-8859-15;LC_MONETARY=C;LC_MESSAGES=en_US.ISO-8859-15;LC_PAPER=en_US.ISO-8859-15;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.ISO-8859-15;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] topGO_1.8.1         SparseM_0.78        GO.db_2.2.0
[4] AnnotationDbi_1.2.2 RSQLite_0.6-9       DBI_0.2-4
[7] Biobase_2.0.1       graph_1.18.1

loaded via a namespace (and not attached):
[1] cluster_1.11.11
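A more granular version of the go_list checks from earlier in this thread could look like the sketch below. One detail: sum(is.null(allGO)) can only be 0 or 1, because is.null() tests the whole vector, and unlist() silently drops NULL elements anyway, so checking each list element is more informative. The helper checkGene2GO() and the toy lists are hypothetical.

```r
## Hypothetical helper: per-element sanity checks on a gene -> GO list
checkGene2GO <- function(gene2GO) {
  ids <- names(gene2GO)
  c(isList       = is.list(gene2GO),
    missingNames = if (is.null(ids)) length(gene2GO) else sum(is.na(ids) | ids == ""),
    nullEntries  = sum(vapply(gene2GO, is.null, logical(1))),
    nonCharacter = sum(!vapply(gene2GO, is.character, logical(1))),
    naTerms      = sum(is.na(unlist(gene2GO, use.names = FALSE))))
}

## Made-up examples: a well-formed list and a broken one
good <- list(g1 = c("GO:0009058", "GO:0016757"), g2 = "GO:0003700")
bad  <- list(g1 = c("GO:0009058", NA), g2 = NULL, g3 = 42)

checkGene2GO(good)  # isList = 1, all other counts 0
checkGene2GO(bad)   # one NULL entry, two non-character entries, one NA term
```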

Hope that this helps,
Adrian

On Sun, Sep 14, 2008 at 8:02 AM, Heike Pospisil
<pospisil at zbh.uni-hamburg.de> wrote:
> Hi Adrian,
>
> thanks for your reply.
>
> I ran annFUN.gene2GO and got no errors. allGO contains no NA and
> no NULL.
> go_list is a list of character vectors:
>
>> str(go_list2)
> List of 9
> $ TM00000001: chr [1:2] "GO:0009058" "GO:0016757"
> $ TM00000002: chr [1:38] "GO:0003700" "GO:0007275" "GO:0005634" "GO:0009414" ...
> $ TM00000003: chr [1:38] "GO:0003700" "GO:0007275" "GO:0045449" "GO:0005634" ...
> $ TM00000004: chr [1:5] "GO:0005634" "GO:0045449" "GO:0009943" "GO:0009947" ...
> $ TM00000005: chr [1:13] "GO:0016165" "GO:0040007" "GO:0006952" "GO:0009695" ...
> $ TM00000006: chr [1:38] "GO:0003700" "GO:0007275" "GO:0045449" "GO:0005634" ...
> $ TM00000007: chr [1:38] "GO:0003700" "GO:0007275" "GO:0045449" "GO:0005634" ...
> $ TM00000009: chr [1:42] "GO:0016301" "GO:0004672" "GO:0004674" "GO:0009733" ...
> $ TM00000010: chr [1:8] "GO:0009736" "GO:0005886" "GO:0004673" "GO:0009884" ...
>
> And, here is my sessionInfo:
>
>> sessionInfo()
> R version 2.7.2 (2008-08-25)
> i486-pc-linux-gnu
>
> locale:
> LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_DE.UTF-8;LC_MONETARY=de_DE.UTF-8;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC_NAME=de_DE.UTF-8;LC_ADDRESS=de_DE.UTF-8;LC_TELEPHONE=de_DE.UTF-8;LC_MEASUREMENT=de_DE.UTF-8;LC_IDENTIFICATION=de_DE.UTF-8
>
> attached base packages:
> [1] splines   grid      tcltk     tools     stats     graphics  grDevices
> [8] utils     datasets  methods   base
> other attached packages:
>  [1] maizeprobe_2.2.0     matchprobes_1.12.1   maizecdf_2.2.0
>  [4] GO_2.2.0             topGO_1.8.1          SparseM_0.78
>  [7] biomaRt_1.14.1       RCurl_0.9-4          GOstats_2.6.0
> [10] Category_2.6.0       genefilter_1.20.0    survival_2.34-1
> [13] RBGL_1.16.0          annotate_1.18.0      xtable_1.5-3
> [16] GO.db_2.2.0          AnnotationDbi_1.2.2  RSQLite_0.7-0
> [19] DBI_0.2-4            graph_1.18.1         qvalue_1.14.0
> [22] maanova_1.10.0       arrayQuality_1.18.0  RColorBrewer_1.0-2
> [25] gridBase_0.4-3       hexbin_1.14.0        colorspace_0.95
> [28] convert_1.16.0       marray_1.18.0        tkWidgets_1.18.0
> [31] DynDoc_1.18.0        widgetTools_1.16.0   statmod_1.3.6
> [34] vsn_3.6.0            lattice_0.17-14      affy_1.18.2
> [37] preprocessCore_1.2.1 affyio_1.8.1         Biobase_2.0.1
> [40] limma_2.14.6         rkward_0.4.9
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 XML_1.96-0
>
> I would be very glad if you have any idea what went wrong. Thanks,
> Heike
>
>
> Adrian Alexa schrieb:
>>
>> Hi Heike,
>>
>> it seems that there is a problem with the go_list object. It should be
>> a list of character vectors. However, it is hard to tell what is wrong
>> from the information you provided alone. Please also post your session
>> info so that we know which versions of the software you are using.
>>
>> The error comes from the annFUN.gene2GO() function. If go_list is
>> correct, then the following line should run without error:
>>
>> go2genes <- annFUN.gene2GO(whichOnto = "MF", gene2GO = go_list)
>>
>> If you get an error here, can you post the results of the following lines:
>>
>> allGO = unlist(go_list, use.names = FALSE)
>> str(allGO)
>> sum(is.na(allGO))
>> sum(is.null(allGO))
>>
>>
>> Regards,
>> Adrian
>>
>> On Fri, Sep 12, 2008 at 11:42 AM, Heike Pospisil
>> <pospisil at zbh.uni-hamburg.de> wrote:
>>
>>>
>>> Hello list,
>>>
>>> I am trying to use topGO for GO enrichment analysis. I have data from an
>>> array that is not yet supported by BioC (a maize array).
>>>
>>> I have a mapping of genes to GO terms named go_list:
>>>
>>> $TM00000001
>>> [1] "GO:0009058" "GO:0016757"
>>>
>>> $TM00000002
>>> [1] "GO:0003700" "GO:0007275" "GO:0005634" "GO:0009414" "GO:0016563"
>>> [6] "GO:0009737" "GO:0045449" "GO:0010072" "GO:0046982" "GO:0009651"
>>> [11] "GO:0009733" "GO:0009723" "GO:0009734" "GO:0048527" "GO:0042803"
>>> [16] "GO:0009867" "GO:0010150" "GO:0009825" "GO:0009908" "GO:0003713"
>>> [21] "GO:0051607" "GO:0009790" "GO:0010014" "GO:0048467" "GO:0030528"
>>> [26] "GO:0009741" "GO:0009735" "GO:0010089" "GO:0009834" "GO:0009901"
>>> [31] "GO:0009611" "GO:0008361" "GO:0009416" "GO:0009620" "GO:0009744"
>>> [36] "GO:0009753" "GO:0009751" "GO:0010199"
>>>
>>> Moreover, geneList is the named factor that indicates which genes are
>>> interesting:
>>>
>>> > str(geneList)
>>> Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
>>>  - attr(*, "names")= chr [1:56321] "MZ00000001" "MZ00000002" "MZ00000003" "MZ00000004" ...
>>>
>>> I used annFUN.gene2GO as the annotation function:
>>>
>>> GOdata <- new("topGOdata", ontology = "MF", allGenes = geneList,
>>>               annot = annFUN.gene2GO, gene2GO = go_list)
>>>
>>> Unfortunately, I got the following error message:
>>> Building most specific GOs .....Error in order(allGO) : argument 1 is not a vector
>>>
>>> Does anybody have an idea what is wrong in my code?
>>>
>>> Thanks and best,
>>> Heike
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>