[BioC] Problem with invalid GO term in HyperGResult object - NOT solved
Herve Pages
hpages at fhcrc.org
Fri Sep 21 00:20:52 CEST 2007
Hi Jenny,
Jenny Drnevich wrote:
> Hi Herve,
>
> Thanks for letting me know the new Windows binaries were available.
> However, I traced the problem to the new annotation package for
> ath1121501 - it is USELESS for GOstats testing!! Arabidopsis doesn't use
> EntrezIDs, but instead uses AGI locus identifier, given in the
> ath1121501ACCNUM environment. This environment used to give unique IDs
> for almost all probe sets, but now all of the listings are either
> "multiple" or NA:
>
>> probes <- ls(ath1121501ACCNUM)
>> probes[1:10]
> [1] "244901_at" "244902_at" "244903_at" "244904_at" "244905_at"
> [6] "244906_at" "244907_at" "244908_at" "244909_at" "244910_s_at"
>> length(probes)
> [1] 22810
>> locusList <- unique(unlist(mget(probes, ath1121501ACCNUM)))
>> length(locusList)
> [1] 2
>> locusList
> [1] "multiple" NA
>
>
> I realize this probably came from the Arabidopsis database used to
> create the annotation package, but is there any way to "fix" this, or to
> put up the older annotation package? The old one had a few hundred
> "multiple" entries, but I figured it wouldn't matter that much to throw
> them out. Any other ideas for ways around this problem?
Thanks for reporting this! The ath1121501ACCNUM map is clearly broken and I will
investigate this. Note that there is a simple workaround (for now). Here it
is.
The ath1121501ACCNUM map is in fact derived from the ath1121501MULTIHIT map,
which is the original mapping between probes and AGI locus identifiers.
Because that mapping is many-to-many (hence the name "MULTIHIT"), someone
decided some time ago to also provide the ath1121501ACCNUM map. It is meant
to be the same mapping, except for the probes that hit multiple AGI locus
ids: in ath1121501ACCNUM those probes are mapped to the string "multiple".
I've checked the ath1121501MULTIHIT map and it looks OK:
> tt <- unlist(eapply(ath1121501MULTIHIT, length))
> table(tt)
tt
    1     2     3     4     5     6     7     8     9    10    12    14    16
21731   945    62    32    14    10     4     4     1     1     1     3     1
   17
    1
> tt[tt==17]
266769_s_at
         17
> ath1121501MULTIHIT[["266769_s_at"]]
 [1] "AT1G12935" "AT1G18930" "AT2G03080" "AT2G04130" "AT2G04140" "AT3G29440"
 [7] "AT3G42883" "AT3G44390" "AT3G47165" "AT3G50490" "AT4G12426" "AT4G17820"
[13] "AT4G20180" "AT5G26590" "AT5G28466" "AT5G33315" "AT5G56605"
You can compute the ath1121501ACCNUM map yourself with:
> ath1121501ACCNUM <- eapply(ath1121501MULTIHIT,
      function(ids) if (length(ids) > 1) "multiple" else ids)
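
For anyone who wants to try the collapsing rule without the annotation package
loaded, here is a minimal sketch on a toy environment. The probe names and
locus IDs (probe1_at, LOCUS_A, ...) and the variable names (multihit, accnum,
usable) are made up for illustration; only the eapply() call mirrors the
workaround above. It also shows one possible way to drop the "multiple" and NA
entries afterwards, which is an assumption about how the result would be used,
not something the package does for you.

```r
# Toy stand-in for ath1121501MULTIHIT (hypothetical probes and locus IDs):
# an environment mapping each probe to a character vector of locus IDs.
multihit <- list2env(list(
  probe1_at = "LOCUS_A",                # hits exactly one locus
  probe2_at = c("LOCUS_B", "LOCUS_C"),  # hits two loci
  probe3_at = NA_character_             # no locus assigned
))

# Same rule as in the workaround: probes with more than one hit are
# collapsed to the string "multiple"; everything else is kept as is.
accnum <- eapply(multihit,
                 function(ids) if (length(ids) > 1) "multiple" else ids)

# eapply() returns a named list; every element now has length 1, so
# unlist() gives a named character vector (probe -> locus or "multiple").
accnum <- unlist(accnum)

# Keep only the probes with a single, known locus ID.
usable <- accnum[!is.na(accnum) & accnum != "multiple"]
```

Note that eapply() returns a plain named list rather than an environment, so
code that expects the original kind of object may need adjusting.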
Let me know if you find any other problem.
Thanks!
H.
>
> Thanks,
> Jenny
>
>> sessionInfo()
> R version 2.6.0 Under development (unstable) (2007-08-28 r42679)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] splines tools stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] limma_2.11.13 ath1121501_1.99.10 GOstats_2.3.17
> [4] Category_2.3.39 genefilter_1.15.11 survival_2.32
> [7] RBGL_1.13.6 annotate_1.15.11 xtable_1.5-1
> [10] GO.db_1.99.3 AnnotationDbi_0.99.1 RSQLite_0.6-2
> [13] DBI_0.2-3 graph_1.15.20 Biobase_1.15.34
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.7 simpleaffy_2.13.01
>
>
>
>
>
> Jenny Drnevich, Ph.D.
>
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
>
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
>
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at uiuc.edu