[BioC] topGO question
Adrian Alexa
adrian.alexa at gmail.com
Sun Sep 14 17:07:50 CEST 2008
Hi Heike,
I just looked at your data, and the problem seems to be with geneList.
More precisely, geneList and go_list don't match at all: the gene
identifiers that index go_list are completely different from the
gene identifiers found in geneList!
library(topGO)
## load the data
load("myGOdata.RData")
## the list of all gene IDs
geneNames <- names(go_list)
str(geneNames)
length(intersect(names(geneList), geneNames)) ## this is 0!
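When the overlap is zero, it can help to see how the two sets of identifiers differ. The following is only a minimal sketch in base R: the toy go_list and geneList are made-up stand-ins for the real objects, and compareIDs() is a hypothetical helper, not part of topGO.

```r
## Toy stand-ins for the two objects (hypothetical values, not the real data)
go_list <- list(TM00000001 = c("GO:0009058", "GO:0016757"),
                TM00000002 = c("GO:0003700", "GO:0007275"))
geneList <- factor(c(0, 1, 0), levels = c(0, 1))
names(geneList) <- c("MZ00000001", "MZ00000002", "MZ00000003")

## Hypothetical helper: summarise two sets of identifiers by size,
## overlap, and the non-numeric prefix of each ID
compareIDs <- function(a, b) {
  prefix <- function(x) table(sub("[0-9]+$", "", x))
  list(nA = length(a), nB = length(b),
       nShared = length(intersect(a, b)),
       prefixA = prefix(a), prefixB = prefix(b))
}

res <- compareIDs(names(geneList), names(go_list))
str(res)
```

If the prefixes differ, as "MZ" and "TM" do here, the two objects most likely come from different ID systems, and one of them has to be translated before topGO can use them together.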
I think you mixed up the annotations, or something went wrong when
building the list of interesting genes. There needs to be an overlap
between the identifiers in the GO-to-gene mapping and the list of
interesting genes. Below I generated a random set of interesting genes
just to test whether the topGOdata object can be built from go_list:
## generate a random list of interesting genes
## (in a real analysis, select or define them from your own results)
myInterestedGenes <- sample(geneNames, 100)
## make an indicator vector showing which genes are interesting
myGeneList <- factor(as.integer(geneNames %in% myInterestedGenes))
names(myGeneList) <- geneNames
str(myGeneList)
sum(myGeneList == "1") ## should be 100
## build the topGOdata object
## there are three annotation functions available:
## 1. annFUN.db -- used for Bioconductor chip annotation packages
## 2. annFUN.gene2GO -- used when you have mappings from each gene to GOs
## 3. annFUN.GO2genes -- used when you have mappings from each GO to genes
##
GOdata <- new("topGOdata",
ontology = "MF",
allGenes = myGeneList,
annot = annFUN.gene2GO, ## the new annotation function
gene2GO = go_list) ## the gene ID to GO dataset
## display the GOdata object
GOdata
------------------------- topGOdata object -------------------------
Description:
-
Ontology:
- MF
20623 available genes (all genes from the array):
- symbol: TM00000001 TM00000002 TM00000003 TM00000004 TM00000005 ...
- 100 significant genes.
18098 feasible genes (genes that can be used in the analysis):
- symbol: TM00000001 TM00000002 TM00000003 TM00000004 TM00000005 ...
- 90 significant genes.
GO graph (nodes with at least 0 genes):
- a graph with directed edges
- number of nodes = 1556
- number of edges = 1853
------------------------- topGOdata object -------------------------
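A side note on the two custom-mapping options listed in the comments above: a gene-to-GO list and a GO-to-genes list carry the same information, so one can be derived from the other. The following is only a minimal sketch in base R, with made-up gene names, and it is not how topGO does the conversion internally.

```r
## Toy gene-to-GO mapping (hypothetical gene names)
gene2GO <- list(geneA = c("GO:0003700", "GO:0005634"),
                geneB = c("GO:0005634"),
                geneC = c("GO:0016301", "GO:0003700"))

## Repeat each gene name once per GO term it carries, then regroup by term
genes <- rep(names(gene2GO), times = lengths(gene2GO))
terms <- unlist(gene2GO, use.names = FALSE)
GO2genes <- split(genes, terms)
str(GO2genes)
```

Which direction to store is mostly a matter of how the annotation file is organised; pick the annotation function that matches the list you already have.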
sessionInfo()
R version 2.7.1 (2008-06-23)
i686-pc-linux-gnu
locale:
LC_CTYPE=en_US.ISO-8859-15;LC_NUMERIC=C;LC_TIME=en_US.ISO-8859-15;LC_COLLATE=en_US.ISO-8859-15;LC_MONETARY=C;LC_MESSAGES=en_US.ISO-8859-15;LC_PAPER=en_US.ISO-8859-15;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.ISO-8859-15;LC_IDENTIFICATION=C
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] topGO_1.8.1 SparseM_0.78 GO.db_2.2.0
[4] AnnotationDbi_1.2.2 RSQLite_0.6-9 DBI_0.2-4
[7] Biobase_2.0.1 graph_1.18.1
loaded via a namespace (and not attached):
[1] cluster_1.11.11
Hope that this helps,
Adrian
On Sun, Sep 14, 2008 at 8:02 AM, Heike Pospisil
<pospisil at zbh.uni-hamburg.de> wrote:
> Hi Adrian,
>
> thanks for your reply.
>
> I ran the annFUN.gene2GO call and got no errors. allGO contains no NA and
> no NULL values.
> go_list is a list of character vectors:
>
>> str(go_list2)
> List of 9
> $ TM00000001: chr [1:2] "GO:0009058" "GO:0016757"
> $ TM00000002: chr [1:38] "GO:0003700" "GO:0007275" "GO:0005634" "GO:0009414"
> ...
> $ TM00000003: chr [1:38] "GO:0003700" "GO:0007275" "GO:0045449" "GO:0005634"
> ...
> $ TM00000004: chr [1:5] "GO:0005634" "GO:0045449" "GO:0009943" "GO:0009947"
> ...
> $ TM00000005: chr [1:13] "GO:0016165" "GO:0040007" "GO:0006952" "GO:0009695"
> ...
> $ TM00000006: chr [1:38] "GO:0003700" "GO:0007275" "GO:0045449" "GO:0005634"
> ...
> $ TM00000007: chr [1:38] "GO:0003700" "GO:0007275" "GO:0045449" "GO:0005634"
> ...
> $ TM00000009: chr [1:42] "GO:0016301" "GO:0004672" "GO:0004674" "GO:0009733"
> ...
> $ TM00000010: chr [1:8] "GO:0009736" "GO:0005886" "GO:0004673" "GO:0009884"
> ...
>
> And, here is my sessionInfo:
>
>> sessionInfo()
> R version 2.7.2 (2008-08-25)
> i486-pc-linux-gnu
>
> locale:
> LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_DE.UTF-8;LC_MONETARY=de_DE.UTF-8;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC_NAME=de_DE.UTF-8;LC_ADDRESS=de_DE.UTF-8;LC_TELEPHONE=de_DE.UTF-8;LC_MEASUREMENT=de_DE.UTF-8;LC_IDENTIFICATION=de_DE.UTF-8
>
> attached base packages:
> [1] splines grid tcltk tools stats graphics grDevices
> [8] utils datasets methods base
> other attached packages:
> [1] maizeprobe_2.2.0 matchprobes_1.12.1 maizecdf_2.2.0
> [4] GO_2.2.0 topGO_1.8.1 SparseM_0.78
> [7] biomaRt_1.14.1 RCurl_0.9-4 GOstats_2.6.0
> [10] Category_2.6.0 genefilter_1.20.0 survival_2.34-1
> [13] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-3
> [16] GO.db_2.2.0 AnnotationDbi_1.2.2 RSQLite_0.7-0
> [19] DBI_0.2-4 graph_1.18.1 qvalue_1.14.0
> [22] maanova_1.10.0 arrayQuality_1.18.0 RColorBrewer_1.0-2
> [25] gridBase_0.4-3 hexbin_1.14.0 colorspace_0.95
> [28] convert_1.16.0 marray_1.18.0 tkWidgets_1.18.0
> [31] DynDoc_1.18.0 widgetTools_1.16.0 statmod_1.3.6
> [34] vsn_3.6.0 lattice_0.17-14 affy_1.18.2
> [37] preprocessCore_1.2.1 affyio_1.8.1 Biobase_2.0.1
> [40] limma_2.14.6 rkward_0.4.9
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 XML_1.96-0
>
> I would be very glad if you had any idea what went wrong. Thanks,
> Heike
>
>
> Adrian Alexa schrieb:
>>
>> Hi Heike,
>>
>> it seems that there is a problem with the go_list object. It should be
>> a list of character vectors. However, it is hard to tell what is wrong
>> with just the information you provided. Please also post the session
>> info so that we know which versions of the software you are using.
>>
>> The error comes from the annFUN.gene2GO() function. If go_list is
>> correct, then the following line should pass without error:
>>
>> go2genes <- annFUN.gene2GO(whichOnto = "MF", gene2GO = go_list)
>>
>> If you get an error here, can you post the results of the following lines:
>>
>> allGO = unlist(go_list, use.names = FALSE)
>> str(allGO)
>> sum(is.na(allGO))
>> sum(is.null(allGO))
>>
>>
>> Regards,
>> Adrian
>>
>>
>>
>>
>>
>>
>> On Fri, Sep 12, 2008 at 11:42 AM, Heike Pospisil
>> <pospisil at zbh.uni-hamburg.de> wrote:
>>
>>>
>>> Hello list,
>>>
>>> I am trying to use topGO for GO enrichment analysis. I have data from an
>>> array which is not yet supported by BioC (maize array).
>>>
>>> I have a mapping of genes to GO terms named go_list:
>>>
>>> $TM00000001
>>> [1] "GO:0009058" "GO:0016757"
>>>
>>> $TM00000002
>>> [1] "GO:0003700" "GO:0007275" "GO:0005634" "GO:0009414" "GO:0016563"
>>> [6] "GO:0009737" "GO:0045449" "GO:0010072" "GO:0046982" "GO:0009651"
>>> [11] "GO:0009733" "GO:0009723" "GO:0009734" "GO:0048527" "GO:0042803"
>>> [16] "GO:0009867" "GO:0010150" "GO:0009825" "GO:0009908" "GO:0003713"
>>> [21] "GO:0051607" "GO:0009790" "GO:0010014" "GO:0048467" "GO:0030528"
>>> [26] "GO:0009741" "GO:0009735" "GO:0010089" "GO:0009834" "GO:0009901"
>>> [31] "GO:0009611" "GO:0008361" "GO:0009416" "GO:0009620" "GO:0009744"
>>> [36] "GO:0009753" "GO:0009751" "GO:0010199"
>>>
>>> Moreover, geneList is the named factor that indicates which genes are
>>> interesting:
>>>
>>> > str(geneList)
>>> Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
>>> - attr(*, "names")= chr [1:56321] "MZ00000001" "MZ00000002" "MZ00000003" "MZ00000004" ...
>>>
>>> I have used annFUN.gene2GO as an annotation function:
>>>
>>>
>>> GOdata<-new("topGOdata",ontology="MF",allGenes=geneList,annot=annFUN.gene2GO,gene2GO=go_list)
>>>
>>> Unfortunately, I got the following error message:
>>> Building most specific GOs .....Error in order(allGO) : argument 1 is not a vector
>>>
>>> Does anybody have an idea what is wrong in my code?
>>>
>>> Thanks and best,
>>> Heike
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>
>>
>
>