[BioC] annotation package for chicken affyprobes
Lina Hultin-Rosenberg
Lina.Hultin.Rosenberg at ebc.uu.se
Mon Aug 28 18:25:22 CEST 2006
Hi again!
I managed to build the annotation package for chicken - thanks for all your
help!
I was a bit surprised, though, by the low annotation coverage (see the QC data
below). I don't really know how the data are collected, but I would have
expected chromosome locations to be known for more of the probesets. The
description of the new chicken genome assembly (http://genome.ucsc.edu) says
that around 95% of the sequence has been anchored to chromosomes. I thought the
annotation process in R used this information?
What could be the reason for so few anchored probesets? I might be doing
something wrong, or perhaps it is a problem of mapping probe IDs to other
identifiers? I used the GenBank mappings as my base file and the UniGene and
Entrez mappings as other sources. Is there a way within R to increase
annotation coverage? I am especially interested in chromosome location
(number), but maybe this is a problem that is best solved outside R?
I would greatly appreciate some help!
Thank you,
Lina
=======================================================================
QC data:
Number of probes: 38535
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
chickenACCNUM found 25654 of 38535
chickenCHR found 9707 of 38535
chickenCHRLOC found 156 of 38535
chickenENZYME found 52 of 38535
chickenGENENAME found 0 of 38535
chickenGO found 4224 of 38535
chickenLOCUSID found 9722 of 38535
chickenMAP found 0 of 38535
chickenPATH found 87 of 38535
chickenPMID found 283 of 38535
chickenREFSEQ found 9709 of 38535
chickenSUMFUNC found 0 of 38535
chickenSYMBOL found 9722 of 38535
chickenUNIGENE found 289 of 38535
Mappings found for non-probe based rda files:
chickenENZYME2PROBE found 33
chickenGO2ALLPROBES found 1785
chickenGO2PROBE found 930
chickenORGANISM found 1
chickenPATH2PROBE found 31
chickenPFAM found 7418
chickenPMID2PROBE found 101
chickenPROSITE found 5490
==========================================================================
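The counts in the QC report are easier to compare as percentages. The following is only a rough sketch in base R: the numbers are copied from the report above, and the variable names (`n_probes`, `found`, `coverage`, `chr_given_gene`) are made up for illustration.

```r
# Counts copied from the QC report above (probe-based mappings only).
n_probes <- 38535
found <- c(ACCNUM = 25654, CHR = 9707, CHRLOC = 156, LOCUSID = 9722,
           REFSEQ = 9709, SYMBOL = 9722, UNIGENE = 289, GO = 4224)

# Percentage of probesets with a mapping, highest first.
coverage <- sort(round(100 * found / n_probes, 1), decreasing = TRUE)
print(coverage)

# Share of the probesets with an Entrez Gene ID (LOCUSID) that also
# have a chromosome assignment (CHR).
chr_given_gene <- found[["CHR"]] / found[["LOCUSID"]]
print(chr_given_gene)
```

Read this way, only about a quarter of the probesets have an Entrez Gene ID, and nearly all of those also have a chromosome. This suggests, though it does not prove, that the bottleneck is the step from probe ID to gene identifier rather than missing chromosome data.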
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Nianhua Li
Sent: 17 August 2006 03:15
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] annotation package for chicken affyprobes
Hi Lina,
We are in luck in the case of chicken. I just updated AnnBuilder (v1.11.7) to
support Gallus gallus (taxon ID 9031). You can get it from svn right
away, or wait two days and download it from the BioC website. Or, if you want
to try it now, here are the changes:
================================================================
--- IPI.R (new)
+++ IPI.R (old)
@@ -21,8 +21,7 @@
speciesNorganismTable <- rbind(
c("human", "Homo sapiens"),
c("mouse", "Mus musculus"),
- c("rat", "Rattus norvegicus"),
- c("chick", "Gallus gallus")
+ c("rat", "Rattus norvegicus")
)
colnames(speciesNorganismTable) <- c("species", "organism")
return(speciesNorganismTable)
=================================================================
The above change is in the function "speciesNorganism". It allows you
to get annotation for PFAM and PROSITE.
=================================================================
--- getSrcUrl.R (new)
+++ getSrcUrl.R (old)
@@ -65,7 +65,6 @@
"DANIO RERIO" = "Danio_Rerio",
"CAENORHABDITIS ELEGANS" = "Caenorhabditis_elegans",
"DROSOPHILA MELANOGASTER" = "Drosophila_melanogaster",
- "GALLUS GALLUS" = "Gallus_gallus",
NA)
if(is.na(key)) {
warning(paste("Organism", organism, "is not supported by
GoldenPath (GP)."))
@@ -170,7 +169,6 @@
Sma = "Schistosoma mansoni", Ssa = "Salmo salar",
Ssc = "Sus scrofa", Str = "Xenopus tropicalis",
Xl = "Xenopus laevis", At = "Arabidopsis thaliana",
- Gga = "Gallus gallus",
Gma = "Glycine max", Han = "Helianthus annus",
Hv = "Hordeum vulgare", Lsa = " Lactuca sativa",
Les = "Lycopersicon esculentum", Lco = "Lotus corniculatus",
==================================================================
The first change is in function "getUCSCUrl", for chromosome location.
The second is in function "UGSciNames" for UniGene.
I tested it with this script:
===============================
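library(AnnBuilder)

# Build a small test annotation package for chicken from mybase.txt.
mypkg <- function(pkgPath, version) {
    ABPkgBuilder(baseName="mybase.txt",     # base file of probe IDs and gene IDs
                 baseMapType="ll",          # base IDs are LocusLink (Entrez Gene) IDs
                 pkgName="mypkg",
                 pkgPath=pkgPath,
                 organism="Gallus gallus",
                 version=version,
                 author=list(
                     authors="Nianhua Li",
                     maintainer="Nianhua Li<email at email.org>"
                 )
    )
}

# Write the package into the current working directory.
mypkg(getwd(), "1.0.0")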
===============================
mybase.txt is
1 395929
2 395844
3 396017
4 415357
5 424377
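A base file like this can also be written from R. This is a minimal sketch that assumes the file is tab-delimited with two columns and no header row (an assumption about the layout, not something stated above); the column names `probe` and `gene_id` and the use of `tempdir()` are only for illustration.

```r
# Probe IDs and Entrez Gene IDs taken from the mybase.txt listing above.
base <- data.frame(
  probe   = as.character(1:5),
  gene_id = c("395929", "395844", "396017", "415357", "424377"),
  stringsAsFactors = FALSE
)

# Write to a temporary directory here; in practice the file would go
# wherever the package build script expects to find it.
base_file <- file.path(tempdir(), "mybase.txt")
write.table(base, file = base_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Show the file contents as written.
print(readLines(base_file))
```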
QC data is:
Number of probes: 5
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
mypkgACCNUM found 0 of 5
mypkgCHRLOC found 4 of 5
mypkgCHR found 5 of 5
mypkgENZYME found 4 of 5
mypkgGENENAME found 0 of 5
mypkgGO found 5 of 5
mypkgLOCUSID found 5 of 5
mypkgMAP found 0 of 5
mypkgPATH found 5 of 5
mypkgPMID found 3 of 5
mypkgREFSEQ found 5 of 5
mypkgSUMFUNC found 0 of 5
mypkgSYMBOL found 5 of 5
mypkgUNIGENE found 5 of 5
Mappings found for non-probe based rda files:
mypkgENZYME2PROBE found 5
mypkgGO2ALLPROBES found 107
mypkgGO2PROBE found 19
mypkgORGANISM found 1
mypkgPATH2PROBE found 18
mypkgPFAM found 5
mypkgPMID2PROBE found 4
mypkgPROSITE found 3
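The "found N of M" counts can be rechecked by hand once a package is loaded. The sketch below assumes the mappings are stored as R environments keyed by probe ID, with NA for unmapped probes; `count_mapped` and the `demo` environment are hypothetical stand-ins, so with a real package you would pass something like mypkgCHR instead.

```r
# Count the keys in an environment whose value is not entirely NA.
count_mapped <- function(env) {
  values <- mget(ls(env), envir = env)
  sum(vapply(values, function(v) !all(is.na(v)), logical(1)))
}

# Mock mapping standing in for a probe-to-chromosome environment.
demo <- new.env()
assign("1", "1", envir = demo)
assign("2", NA, envir = demo)   # an unmapped probe
assign("3", "Z", envir = demo)

mapped <- count_mapped(demo)
total  <- length(ls(demo))
cat("demo found", mapped, "of", total, "\n")
```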
Let me know if you have any questions or concerns. Cheers!
nianhua
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor