[BioC] Problem with invalid GO term in HyperGResult object - NOT solved

Herve Pages hpages at fhcrc.org
Fri Sep 21 00:20:52 CEST 2007


Hi Jenny,

Jenny Drnevich wrote:
> Hi Herve,
> 
> Thanks for letting me know the new Windows binaries were available.
> However, I traced the problem to the new annotation package for
> ath1121501 - it is USELESS for GOstats testing!! Arabidopsis doesn't use
> Entrez Gene IDs, but instead uses AGI locus identifiers, given in the
> ath1121501ACCNUM environment. This environment used to give unique IDs
> for almost all probe sets, but now all of the listings are either
> "multiple" or NA:
> 
>> probes <- ls(ath1121501ACCNUM)
>> probes[1:10]
>  [1] "244901_at"   "244902_at"   "244903_at"   "244904_at"   "244905_at"
>  [6] "244906_at"   "244907_at"   "244908_at"   "244909_at"   "244910_s_at"
>> length(probes)
> [1] 22810
>> locusList <- unique(unlist(mget(probes, ath1121501ACCNUM)))
>> length(locusList)
> [1] 2
>> locusList
> [1] "multiple" NA
> 
> 
> I realize this probably came from the Arabidopsis database used to
> create the annotation package, but is there any way to "fix" this, or to
> put up the older annotation package? The old one had a few hundred
> "multiple" entries, but I figured it wouldn't matter that much to throw
> them out. Any other ideas for ways around this problem?

Thanks for reporting this! The ath1121501ACCNUM map is clearly broken and I will
investigate. Note that there is a simple workaround for now. Here it is.

The ath1121501ACCNUM map is in fact derived from the ath1121501MULTIHIT map,
which is the original mapping between probes and AGI locus identifiers.
Because this mapping is many-to-many (hence the name "MULTIHIT"), someone
decided some time ago to also provide the ath1121501ACCNUM map: the same
mapping, except that probes hitting multiple AGI locus IDs are mapped to
the string "multiple".

I've checked the ath1121501MULTIHIT map and it looks OK:

> tt <- unlist(eapply(ath1121501MULTIHIT, length))
> table(tt)
tt
    1     2     3     4     5     6     7     8     9    10    12    14    16
21731   945    62    32    14    10     4     4     1     1     1     3     1
   17
    1
> tt[tt==17]
266769_s_at
         17
> ath1121501MULTIHIT[["266769_s_at"]]
 [1] "AT1G12935" "AT1G18930" "AT2G03080" "AT2G04130" "AT2G04140" "AT3G29440"
 [7] "AT3G42883" "AT3G44390" "AT3G47165" "AT3G50490" "AT4G12426" "AT4G17820"
[13] "AT4G20180" "AT5G26590" "AT5G28466" "AT5G33315" "AT5G56605"
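The table above also tells you what a correct ath1121501ACCNUM map should look
like: only the probes that hit more than one locus should end up as "multiple".
A quick check, using only the counts printed by table(tt):

```r
# Number of probes per number of AGI loci hit, copied from table(tt) above
hits <- c(`1` = 21731, `2` = 945, `3` = 62, `4` = 32, `5` = 14,
          `6` = 10, `7` = 4, `8` = 4, `9` = 1, `10` = 1, `12` = 1,
          `14` = 3, `16` = 1, `17` = 1)

total <- sum(hits)                          # all probes in the map
multiple <- sum(hits[names(hits) != "1"])   # probes hitting > 1 locus

cat("total probes:", total, "\n")
cat("probes that should map to \"multiple\":", multiple, "\n")
```

This gives 22810 probes in total (the same as length(probes) in your session)
and 1079 multi-hit probes, so the vast majority of probes should map to a
single locus ID (or NA), not to "multiple".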

You can compute the ath1121501ACCNUM map yourself with:

> ath1121501ACCNUM <- eapply(ath1121501MULTIHIT,
                             function(ids) if (length(ids) > 1) "multiple" else ids)
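Note that eapply() returns a named list, not an environment, so this sketch
indexes the result with [ instead of calling mget() on it. Below is a minimal,
self-contained sketch of the whole workaround, assuming a small toy environment
(the probe and locus IDs are hypothetical placeholders) in place of the real
ath1121501MULTIHIT. It then drops the "multiple" and NA entries to get a
vector of unique AGI locus IDs, which is presumably what you want to feed to
the GO analysis:

```r
# Toy stand-in for ath1121501MULTIHIT (hypothetical probe and locus IDs)
multihit <- new.env()
assign("probe1_at", "ATXG00001", envir = multihit)
assign("probe2_at", c("ATXG00002", "ATXG00003"), envir = multihit)
assign("probe3_at", NA_character_, envir = multihit)
assign("probe4_at", "ATXG00001", envir = multihit)

# Same rule as above: multi-hit probes collapse to "multiple"
accnum <- eapply(multihit,
                 function(ids) if (length(ids) > 1) "multiple" else ids)

# accnum is a named list, so subset it by name with [ and flatten it
probes <- sort(names(accnum))
acc <- unlist(accnum[probes])

# Keep only probes with a single, non-missing AGI locus ID
keep <- !is.na(acc) & acc != "multiple"
locusList <- unique(acc[keep])
print(locusList)
```

With the real package you would simply replace multihit with
ath1121501MULTIHIT.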

Let me know if you find any other problems.

Thanks!
H.


> 
> Thanks,
> Jenny
> 
>> sessionInfo()
> R version 2.6.0 Under development (unstable) (2007-08-28 r42679)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils     datasets
> [8] methods   base
> 
> other attached packages:
>  [1] limma_2.11.13        ath1121501_1.99.10   GOstats_2.3.17
>  [4] Category_2.3.39      genefilter_1.15.11   survival_2.32
>  [7] RBGL_1.13.6          annotate_1.15.11     xtable_1.5-1
> [10] GO.db_1.99.3         AnnotationDbi_0.99.1 RSQLite_0.6-2
> [13] DBI_0.2-3            graph_1.15.20        Biobase_1.15.34
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.11.7     simpleaffy_2.13.01
> 
> Jenny Drnevich, Ph.D.
> 
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
> 
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
> 
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at uiuc.edu
