[BioC] GO annotation inconsistency

James W. MacDonald jmacdon at med.umich.edu
Fri Aug 4 23:37:41 CEST 2006


Hi Daniel,

Daniel Gatti wrote:
> O/S: Windows XP
> R: 2.3.1
> Bioconductor: 1.8
>
> I'm trying to get a list of all probes in a given GO category.  In the 
> Bioconductor annotation libraries there are mappings from GO category to 
> probe ID and from probe ID to GO category.  I'm finding that they do not 
> match in terms of annotation.  Here's a sample script:
>
>  library(hgu95av2)
>  library(GO)
>  
>  # Get list of probe -> GO mappings.
>  hgu95av2GO.list = as.list(hgu95av2GO)
>  hgu95av2GO.list = lapply(hgu95av2GO.list, names)
>  
>  # Work with GO category 7031.
>  GO.7031.probes = unique(get("GO:0007031", hgu95av2GO2ALLPROBES))
>   

The problem here is that you are using the wrong environment. If you 
look at the man page for hgu95av2GO2ALLPROBES, you will see that it maps 
the GO term in question _and_ all of its children (descendant terms) to 
the probe IDs. You actually want the hgu95av2GO2PROBE environment, which 
maps each GO term only to the probes annotated directly to it. In 
contrast, the hgu95av2GO environment maps each probe ID only to the GO 
terms it is directly annotated with, so a probe annotated to a child term 
does not show up under the parent. If you use the correct environment, 
the two mappings agree.

 > library(hgu95av2)
 > hgu95av2GO.list = as.list(hgu95av2GO)
 > hgu95av2GO.list = lapply(hgu95av2GO.list, names)
 > GO.7031.probes = unique(get("GO:0007031", hgu95av2GO2PROBE))
 > length(GO.7031.probes)
[1] 11
 >
 > probe2GO.7031 = hgu95av2GO.list[match(GO.7031.probes,
+ names(hgu95av2GO.list))]
 > length(grep("GO:0007031", probe2GO.7031))
[1] 11
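To see why the counts differ without loading any annotation packages, here 
is a toy sketch in plain R. The term names, probe IDs, and helper functions 
below are all made up for illustration; they only mimic the direct versus 
"term plus descendants" behaviour described above and are not the internals 
of the hgu95av2 package.

```r
# Hypothetical toy data: "GO:A" is a parent term with a single child "GO:B".
# Probe IDs p1..p3 are made up; they are not real hgu95av2 probes.
children <- list("GO:A" = "GO:B", "GO:B" = character(0))
direct   <- list("GO:A" = c("p1", "p2"), "GO:B" = c("p2", "p3"))

# All descendants (offspring) of a term, found by recursing over children.
offspring <- function(term) {
  kids <- children[[term]]
  unique(c(kids, unlist(lapply(kids, offspring))))
}

# Analogue of GO2PROBE: probes annotated directly to the term.
go2probe <- function(term) direct[[term]]

# Analogue of GO2ALLPROBES: probes annotated to the term or any descendant.
go2allprobes <- function(term) {
  unique(unlist(direct[c(term, offspring(term))], use.names = FALSE))
}

# Analogue of the probe -> GO map: terms a probe is directly annotated to.
probe2go <- function(probe) {
  names(direct)[vapply(direct, function(p) probe %in% p, logical(1))]
}

# Reverse lookup: probes whose direct annotations include "GO:A".
all_probes <- unique(unlist(direct, use.names = FALSE))
via_reverse <- all_probes[vapply(all_probes,
                                 function(p) "GO:A" %in% probe2go(p),
                                 logical(1))]

length(go2probe("GO:A"))      # 2
length(go2allprobes("GO:A"))  # 3
length(via_reverse)           # 2
```

The reverse (probe -> GO) lookup agrees with the direct map (2 probes), 
while the "all probes" map also picks up p3 through the child term, which 
is the same pattern as your 11 versus 16 counts.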

HTH,

Jim


>  length(GO.7031.probes)
> [1] 16
>  probe2GO.7031 = hgu95av2GO.list[match(GO.7031.probes, 
> names(hgu95av2GO.list))]
>  length(grep("GO:0007031", probe2GO.7031))
> [1] 11
>
> Note that the GO -> probe list gives me 16 probes in category 7031 while 
> the probe -> GO list gives me 11 probes.  This happens for a lot of 
> categories.  Am I missing some key concept or is there something else 
> going on?
>
> Thanks,
> Dan Gatti
> UNC-CH
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>   

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


