[BioC] "all" category in annotation data

Herve Pages hpages at fhcrc.org
Sat Aug 11 03:59:12 CEST 2007


Hi Michael,

Michael Newton wrote:
> I'm seeking advice on the use of the "all" component in various
> annotation data packages relative to GO.
> 
> Using R version 2.4.1 and (e.g.) hgu133plus version 1.14.0,

Thanks for reporting this problem!

Any reason why you don't use the current Bioconductor release? You need R-2.5.x
for this. Our metadata packages are updated every 6 months with each new release.
As Jim said, in more recent versions of hgu133plus2, the "all" entry has been
removed from the GO2ALLPROBES map.

One important property of hgu133plus2GO2ALLPROBES[["all"]] is that it should
_normally_ return all the probes that are mapped to at least 1 GO term.
This is because the GO term "all" is the root of the ontology, i.e. an ancestor
of every other GO term.
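
To make the parent/child argument concrete, here is a minimal sketch on a
made-up ontology (the term names, probe names and the helper
ancestors_and_self are invented for illustration, they don't come from any
annotation package). It pushes each direct annotation up to every ancestor,
which is how a GO2ALLPROBES-style map is expected to behave, and then checks
that the root collects every annotated probe:

```r
# Toy ontology: each term lists its direct parents. All names are hypothetical.
parents <- list(
  all = character(0),
  BP  = "all",
  MF  = "all",
  CC  = "all",
  bp1 = "BP",
  mf1 = "MF"
)

# Direct annotations: term -> probes annotated directly at that term.
direct <- list(
  bp1 = c("p1", "p2"),
  mf1 = c("p2", "p3"),
  CC  = "p4"
)

# Return a term together with all of its ancestors.
ancestors_and_self <- function(term) {
  seen <- character(0)
  todo <- term
  while (length(todo) > 0) {
    current <- todo[1]
    todo <- todo[-1]
    if (!(current %in% seen)) {
      seen <- c(seen, current)
      todo <- c(todo, parents[[current]])
    }
  }
  seen
}

# Build the term -> all probes map by pushing each probe up to every ancestor.
go2all <- setNames(vector("list", length(parents)), names(parents))
for (term in names(direct)) {
  for (a in ancestors_and_self(term)) {
    go2all[[a]] <- union(go2all[[a]], direct[[term]])
  }
}

# The root should hold every probe that has at least one annotation.
annotated <- unique(unlist(direct, use.names = FALSE))
setequal(go2all[["all"]], annotated)   # TRUE for this toy data
```

In this toy case the BP branch alone only reaches p1 and p2, so a root that
equals the BP set would be missing p3 and p4.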

I've looked at hgu133plus2 1.14.0, and something is obviously wrong with it:

> library(hgu133plus2)
> length(unique(hgu133plus2GO2ALLPROBES[["all"]]))
[1] 30223
> probe_is_unmapped <- eapply(hgu133plus2GO, function(x) isTRUE(is.na(x)))
> probes_hitting_GO <- names(probe_is_unmapped)[!unlist(probe_is_unmapped)]
> length(probes_hitting_GO)
[1] 35836

These two results (30223 vs. 35836 probes) should match :-/

Also hgu133plus2GO2ALLPROBES[["all"]] should contain the same probes as
the union of

  bp_probes <- hgu133plus2GO2ALLPROBES[["GO:0008150"]] # biological process
  mf_probes <- hgu133plus2GO2ALLPROBES[["GO:0003674"]] # molecular function
  cc_probes <- hgu133plus2GO2ALLPROBES[["GO:0005575"]] # cellular component

but it's apparently not the case:

> length(unique(c(bp_probes, mf_probes, cc_probes)))
[1] 35836

In fact this union contains the same probes as 'probes_hitting_GO' (which is
good):

> setequal(unique(c(bp_probes, mf_probes, cc_probes)), probes_hitting_GO)
[1] TRUE
> setequal(bp_probes, hgu133plus2GO2ALLPROBES[["all"]])
[1] TRUE

This only confirms what you've reported below: hgu133plus2GO2ALLPROBES[["all"]]
is incomplete. It only contains the "BP probes", i.e. the probes that are mapped
to at least 1 GO term under the biological process ontology.

Please consider using hgu133plus2 1.16.0 instead (included in our current release).
The "all" key has been removed from the hgu133plus2GO2ALLPROBES map so I won't say
that the problem has been fixed but at least it has disappeared ;-).
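
If you still need the set that "all" was meant to provide, one option is to
rebuild it yourself from the three ontology roots. This is only a sketch,
assuming the map has been converted to a named list with as.list(); the helper
name union_of_roots and the toy data are made up for illustration and are not
part of any package:

```r
# GO identifiers of the three ontology roots (same IDs as used above).
go_roots <- c(BP = "GO:0008150", MF = "GO:0003674", CC = "GO:0005575")

# Hypothetical helper: union of the probes found under the three roots.
# 'go2allprobes' is a named list, e.g. as.list(hgu133plus2GO2ALLPROBES).
union_of_roots <- function(go2allprobes) {
  per_root <- lapply(go_roots, function(id) go2allprobes[[id]])
  # Drop the element names and the duplicated probes.
  unique(unlist(per_root, use.names = FALSE))
}

# Toy stand-in for the real map; probe names are invented.
toy_map <- list(
  "GO:0008150" = c("p1", "p2", "p2"),
  "GO:0003674" = c("p2", "p3"),
  "GO:0005575" = "p4"
)
all_probes <- union_of_roots(toy_map)
all_probes
```

With the real package loaded, the call would be
union_of_roots(as.list(hgu133plus2GO2ALLPROBES)), and the result can be compared
against 'probes_hitting_GO' with setequal().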

We are currently reworking the way we produce our metadata packages, with the
ambitious goal of making them better. So any breakage in the current packages
that people report to us is of great value and will help us achieve that goal.

Thanks again for the feedback!

Cheers,
H.


> 
> library(hgu133plus2)   ## an Affy data package
> x <- as.list( hgu133plus2GO2ALLPROBES )  ##probe sets for each GO term
> 
> xa <- unique( x[["all"]] )    ## holds probe sets associated to "all"
> 
> xbp <- unique( x[["GO:0008150"]] )    # biological process
> xmf <- unique( x[["GO:0003674"]] )    # molecular function
> xcc <- unique( x[["GO:0005575"]] )    # cellular component
> 
> ## note that the following is true
> 
> all( xa == xbp )
> 
> But further checks show that the molecular function probe sets are not
> a subset of "all".
> 
> I was under the impression that "all" is the union of MF, BP, and CC,
> but in the few libraries I've checked, "all" equals BP.  I haven't
> found a discussion of the matter in the few vignettes that might be
> relevant.
> 
> Is "all" really "BP", or is it supposed to be the union?
> 
> thanks,
> 
> -Michael N.
>


