[BioC] problem with GO terms

James W. MacDonald jmacdon at med.umich.edu
Tue Nov 22 18:39:18 CET 2011


Hi Ina,

On 11/22/2011 12:19 PM, Ina Hoeschele wrote:
> Hi,
>    I have done a simple analysis associating GO terms with a gene list using GOstats. Then, when I try to retrieve all genes belonging to a significant GO category, I get zero genes! I use this code:
> library(biomaRt)
> mart<- useMart("ensembl", dataset="hsapiens_gene_ensembl")
> temp<- getBM(attributes="entrezgene", filters="go", values=GOID[g], mart=mart)

You don't include the output of sessionInfo(), so I can't tell for sure why
this is happening (remember to always supply it in the future!). However,
you don't need biomaRt for this.

 > library(GO.db)
Loading required package: AnnotationDbi
Loading required package: Biobase

Welcome to Bioconductor

   Vignettes contain introductory material. To view, type
   'browseVignettes()'. To cite Bioconductor, see
   'citation("Biobase")' and for packages 'citation("pkgname")'.

Loading required package: DBI

 > library(org.Hs.eg.db)
 > get("GO:0050864", org.Hs.egGO2EG)
Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
   value for "GO:0050864" not found

So in the current version of GO.db, this GO term no longer exists,
which is probably the problem you are having with biomaRt as well.
However, if you got this GO term from GOstats, then it does exist in your
version of these packages.
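If you have a whole vector of GO IDs from GOstats, a minimal sketch (assuming org.Hs.eg.db and AnnotationDbi are installed, and using a hypothetical vector called goIDs) is to look them all up with mget() and ifnotfound = NA. That returns NA for any ID missing from the map instead of stopping with the .checkKeys() error above, so you can see at once which of your terms have no genes in your installed version:

```r
## Sketch: look up several GO IDs at once without stopping on missing ones.
## Assumes org.Hs.eg.db (which loads AnnotationDbi) is installed.
library(org.Hs.eg.db)

## Hypothetical vector of GO IDs, e.g. taken from a GOstats result
goIDs <- c("GO:0050864", "GO:0007597")

## ifnotfound = NA gives NA for IDs with no entry in the map
## rather than raising an error
genes <- mget(goIDs, org.Hs.egGO2EG, ifnotfound = NA)

## TRUE for IDs that have no Entrez Gene IDs in this version of the map
noGenes <- sapply(genes, function(x) all(is.na(x)))
noGenes
```

The exact result depends on the annotation package version you have installed, which is why sessionInfo() matters here.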

As an example of what you should expect:

 > get("GO:0007597", org.Hs.egGO2EG)
    TAS    IDA    TAS    TAS    TAS    TAS    TAS    TAS    TAS     IC    TAS
    "2"  "350"  "708"  "710" "2147" "2157" "2158" "2159" "2160" "2161" "2161"
    TAS    TAS    TAS    TAS    TAS    TAS    TAS    TAS
 "2811" "2812" "2814" "2815" "3818" "3827" "5547" "7450"

 > sessionInfo()
R version 2.14.0 beta (2011-10-17 r57293)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] org.Hs.eg.db_2.6.4   GO.db_2.6.1          RSQLite_0.10.0
[4] DBI_0.2-5            AnnotationDbi_1.16.4 Biobase_2.13.12

loaded via a namespace (and not attached):
[1] IRanges_1.11.32

Best,

Jim



>
> length(temp$entrezgene) is zero!!
>
> GOID[g=1] = "GO:0050864", so as long as this is a valid GO ID (as returned from GOstats), length(temp$entrezgene) should not be zero!?
>
> This happens for multiple of my top 105 GO (BP, CC, MF) categories.
>
> Thanks for any hint ...
>
> Ina
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


