[BioC] HuGene annotation and htmls
Marc Carlson
mcarlson at fhcrc.org
Thu Apr 16 18:49:43 CEST 2009
Hi guys,
Something confusing has happened with the HuGene, MoGene and RaGene
platforms. With revision 4 of these platforms, Affymetrix has decided
to stop releasing the transcript.csv file, which identifies the
relationships between the transcript cluster IDs and the genes
represented on the platform, and has switched to releasing a
probeset.csv file instead, which relates probeset IDs to genes. This
results in a massive expansion in the number of identifiers used for
this platform. To make things more sensible, we have now forked this
package in the development branch, so that there is a
"transcriptcluster" version (based on version 3) and a "probeset"
version (based on version 4); which one you want depends on how you
need the probe identifiers to be arranged. And of course, if you need
yet another mapping, you can always make a new package using the
SQLForge code in AnnotationDbi.
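A minimal sketch of picking between the two packages, assuming they are
named by appending "transcriptcluster" or "probeset" to the old
hugene10st prefix. That naming and the helper pick_annotation() are
assumptions for illustration, not something confirmed in this thread.
The lookup only runs if annotate and the chosen .db package are
installed; the keys are the transcript cluster IDs from Jim's message
quoted below.

```r
# Hypothetical helper: build the annotation name for the ID type you need.
# ASSUMPTION: the forked packages are called <chip><id_type>.db.
pick_annotation <- function(id_type = c("transcriptcluster", "probeset"),
                            chip = "hugene10st") {
  id_type <- match.arg(id_type)  # rejects anything but the two ID types
  paste0(chip, id_type)
}

pkg <- pick_annotation("transcriptcluster")  # "hugene10sttranscriptcluster"
db  <- paste0(pkg, ".db")

# Only attempt the lookup when both packages are actually installed.
if (requireNamespace("annotate", quietly = TRUE) &&
    requireNamespace(db, quietly = TRUE)) {
  library(db, character.only = TRUE)
  keys <- c("7903188", "7903203")  # transcript cluster IDs
  print(annotate::getSYMBOL(keys, pkg))
} else {
  message("Install ", db, " and annotate to run the lookup.")
}
```

Keys produced by a transcript-level summarization would go with the
"transcriptcluster" name, and probeset-level keys with the "probeset"
name.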
cstrato wrote:
> Dear Mayte
> Everything is fine with your code, nothing to worry about.
> If you look at column "gene_assignment" of
> "HuGene-1_0-st-v1.na28.hg18.transcript.csv" you will see many NAs, e.g.:
> > getSYMBOL("7896740", "hugene10st")
> 7896740
> "OR4F17"
> > getSYMBOL("7896746", "hugene10st")
> 7896746
> NA
> Best regards
> Christian
> Mayte Suarez-Farinas wrote:
>> You are right James!!!
>> With the keys James sent, the hugene10st package works just fine,
>> so it looks like the "error" comes from my use of xps.
>> Here is my code:
>> library(xps)
>> ### define directories:
>> # directory containing Affymetrix library files
>> libdir <- "/Users/Mayte/Rlibrary/AffyDB/libraryfiles"
>> anndir <- "/Users/Mayte/Rlibrary/AffyDB/Annotation"
>> scmdir <- "/Users/Mayte/Rlibrary/AffyDB/ROOTSchemes"
>> scheme.hugene10stv1r4.na28 <- import.exon.scheme(
>>   "Scheme_HuGene10stv1r4_na28", filedir=scmdir,
>>   layoutfile=paste(libdir, "HuGene-1_0-st-v1.r4.clf", sep="/"),
>>   schemefile=paste(libdir, "HuGene-1_0-st-v1.r4.pgf", sep="/"),
>>   probeset=paste(anndir, "HuGene-1_0-st-v1.na28.hg18.probeset.csv", sep="/"),
>>   transcript=paste(anndir, "HuGene-1_0-st-v1.na28.hg18.transcript.csv", sep="/"))
>> scheme.hugene10stv1r4 <- root.scheme(paste(scmdir,
>>   "Scheme_HuGene10stv1r4_na28.root", sep="/"))
>> G1ST_data <- import.data(scheme.hugene10stv1r4, "Pamela_G1ST_dataxps",
>>   celdir=getwd(), celfiles=as.character(PD[1:8,'CELfile']),
>>   verbose=FALSE)
>> G1ST_rma_xps <- rma(G1ST_data, "Pamela_G1ST_rma_t",
>>   background="antigenomic", option="transcript",
>>   exonlevel="core+affx", normalize=TRUE)
>> The "featureNames" of the data (or keys) can be taken as:
>> keys<-as.character(exprs(G1ST_rma_xps)$UnitName)
>> but almost half of them do not have a symbol:
>> sum(!is.na(getSYMBOL(keys, "hugene10st")))
>> [1] 19899
>> sum(is.na(getSYMBOL(keys, "hugene10st")))
>> 9027
>> Is this OK, or is there a mistake in my code?
>> Thanks in advance for everybody's help!
>> And sorry for bothering you so many times!
>> Mayte
>> On Apr 10, 2009, at 10:55 AM, James W. MacDonald wrote:
>>> I wonder if this is a problem with how the package was built. The
>>> numbers that Marc supplied are the Exon Probeset IDs, but the Lkeys
>>> of the hugene10st.db package seem to be what Affy calls the
>>> Transcript Cluster ID.
>>>> keys <- c("7903188","7903203")
>>>> getSYMBOL(keys, "hugene10st")
>>> 7903188 7903203
>>> "PTBP2" "SNX7"
>>> Best,
>>> Jim
>>> Mayte Suarez-Farinas wrote:
>>>> I meant that the usual functions from annotate do not work.
>>>> When I ran your code, I get:
>>>> library("annotate")
>>>> > library("hugene10st.db")
>>>> > keys = c("7903193","7903204")
>>>> >
>>>> > getSYMBOL(keys, "hugene10st")
>>>> 7903193 7903204
>>>> NA NA
>>>> >
>>>> > lookUp(keys, "hugene10st" , "CHR")
>>>> $`7903193`
>>>> [1] NA
>>>> $`7903204`
>>>> [1] NA
>>>> > lookUp(keys, "hugene10st" , "ENTREZID")
>>>> $`7903193`
>>>> [1] NA
>>>> $`7903204`
>>>> [1] NA
>>>> sessionInfo()
>>>> R version 2.8.1 (2008-12-22)
>>>> i386-apple-darwin8.11.1
>>>> locale:
>>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>> attached base packages:
>>>> [1] splines tools stats graphics grDevices utils datasets methods base
>>>> other attached packages:
>>>> [1] hugene10st.db_1.0.2 statmod_1.3.8 beadarray_1.10.0 sma_0.5.15 hwriter_1.0
>>>> [6] affycoretools_1.14.1 annaffy_1.14.0 KEGG.db_2.2.5 biomaRt_1.16.0 GOstats_2.8.0
>>>> [11] Category_2.8.4 RBGL_1.18.0 GO.db_2.2.5 RSQLite_0.7-1 DBI_0.2-4
>>>> [16] graph_1.20.0 limma_2.16.4 affyQCReport_1.20.0 geneplotter_1.20.0 annotate_1.20.1
>>>> [21] AnnotationDbi_1.5.18 lattice_0.17-17 RColorBrewer_1.0-2 affyPLM_1.18.1 preprocessCore_1.4.0
>>>> [26] xtable_1.5-4 simpleaffy_2.18.0 gcrma_2.14.1 matchprobes_1.14.1 genefilter_1.22.0
>>>> [31] survival_2.34-1 affy_1.20.2 Biobase_2.2.2
>>>> loaded via a namespace (and not attached):
>>>> [1] GSEABase_1.4.0 KernSmooth_2.22-22 RCurl_0.94-1 XML_2.1-0 affyio_1.10.1
>>>> [6] cluster_1.11.11 grid_2.8.1 xps_1.2.8
>>>> On Apr 9, 2009, at 5:26 PM, Marc Carlson wrote:
>>>>> Hi Mayte,
>>>>> I can't tell from your post what you tried to do, or even what exactly
>>>>> you need to know. Please give us the code you were trying to use, along
>>>>> with an example that didn't behave the way you expected it to and
>>>>> the results of calling sessionInfo() after you did that. You can find
>>>>> other helpful tips in the posting guide:
>>>>> http://www.bioconductor.org/docs/postingGuide.html
>>>>> What little I can discern from your post I will try to answer. To use
>>>>> getSYMBOL() or lookUp(), you first need to make sure that you have
>>>>> loaded the annotate package. Then you need to call it correctly. Here
>>>>> is an example that I did using the very latest version of the
>>>>> hugene10st.db package.
>>>>> library("annotate")
>>>>> library("hugene10st.db")
>>>>> keys = c("7903193","7903204")
>>>>> getSYMBOL(keys, "hugene10st")
>>>>> lookUp(keys, "hugene10st" , "CHR")
>>>>> lookUp(keys, "hugene10st" , "ENTREZID")
>>>>> Hope this helps,
>>>>> Marc
>>>>> Mayte Suarez-Farinas wrote:
>>>>>> I am learning to work with the HuGene ST1 chips.
>>>>>> I was able to use xps to read and preprocess the files,
>>>>>> and then I converted to the ExpressionSet class to use limma
>>>>>> for modelling.
>>>>>> The next step is where I am stuck: the annotation.
>>>>>> I load library("hugene10st.db"), but the normal functions
>>>>>> to create html annotation do not seem to work on this chip.
>>>>>> I also tried to get each component using getSYMBOL and lookUp,
>>>>>> with no success.
>>>>>> What's the way to go?
>>>>>> Thanks
>>>>>> Mayte
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> Douglas Lab
>>> University of Michigan
>>> Department of Human Genetics
>>> 5912 Buhl
>>> 1241 E. Catherine St.
>>> Ann Arbor MI 48109-5618
>>> 734-615-7826