[BioC] Annotations dealing with "removed" refseq record
Francois Pepin
fpepin at cs.mcgill.ca
Fri Jun 8 22:31:30 CEST 2007
Hi,
I think the annotation system has problems dealing with RefSeq records
that have been removed.
I noticed this while looking at the Erbb2 gene in mouse (entrezID=13866)
on the whole genome mouse chip from Agilent (annotation package:
mgug4122a). According to the annotations provided by Agilent, 2 probes
map to it: A_52_P49250 and A_51_P216179.
Currently, the annotation package returns no gene symbol for either probe:
> library(mgug4122a)
> unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aSYMBOL))
 A_52_P49250 A_51_P216179
          NA           NA
The accession number given for both probes is indeed NM_010152:
> unlist(mget(c('A_52_P49250','A_51_P216179'),mgug4122aACCNUM))
 A_52_P49250 A_51_P216179
 "NM_010152"  "NM_010152"
Looking at it on the NCBI website, it does point to Erbb2, but it also
says: "This record was removed by RefSeq staff".
Not being entirely familiar with the process, I would point to this as a
likely reason for the lack of annotations for those two probes.
I have not done an extensive comparison between the Agilent annotations
and those in mgug4122a to see how many other probes might be affected.
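
A minimal sketch of how such a check could start: list every probe that
has an accession number but no gene symbol. The helper name
find_unannotated is hypothetical (it is not part of any package), and the
toy data below only mirrors the two probes above plus one invented,
fully annotated probe ("probe_x", "NM_000000", "GeneX").

```r
# Hypothetical helper, not part of mgug4122a or any Bioconductor package.
# `accnum` and `symbol` are named lists keyed by probe ID.
find_unannotated <- function(accnum, symbol) {
  probes <- intersect(names(accnum), names(symbol))
  # Use the first entry for each probe; an empty entry counts as NA.
  first <- function(x) {
    if (length(x) == 0) NA_character_ else as.character(x[[1]])
  }
  acc <- vapply(accnum[probes], first, character(1))
  sym <- vapply(symbol[probes], first, character(1))
  # Keep probes that have an accession number but no symbol.
  missing <- !is.na(acc) & is.na(sym)
  data.frame(probe = probes[missing],
             accnum = unname(acc[missing]),
             stringsAsFactors = FALSE)
}

# Toy data: the two probes above plus one invented annotated probe.
accnum <- list(A_52_P49250 = "NM_010152",
               A_51_P216179 = "NM_010152",
               probe_x = "NM_000000")
symbol <- list(A_52_P49250 = NA,
               A_51_P216179 = NA,
               probe_x = "GeneX")
hits <- find_unannotated(accnum, symbol)
print(hits)
```

On the real package this could be called as
find_unannotated(as.list(mgug4122aACCNUM), as.list(mgug4122aSYMBOL)),
assuming both objects can be converted with as.list(). A missing symbol
may have causes other than a removed RefSeq record, so the result would
only be a list of candidates to check by hand.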
> sessionInfo()
R version 2.5.0 (2007-04-23)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;
LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;
LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;
LC_IDENTIFICATION=C
attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets"
[6] "methods" "base"
other attached packages:
mgug4122a
"1.16.0"
If there is any more information I can provide, please let me know.
Francois