[BioC] Annotation for a GEO data set
Sean Davis
seandavi at gmail.com
Mon Mar 22 17:39:31 CET 2010
On Thu, Mar 4, 2010 at 12:30 PM, Joern Grame <gjormac at googlemail.com> wrote:
> Dear Bioconductor,
>
> I have a question concerning a GEO data set. I have downloaded the data into
> R using the GEOquery package. I'm trying to map the Affymetrix probe ids
> onto gene symbols, but can't find the appropriate annotation data. According to
> some of the tutorials, the annotate package should help, but all I get from
> the annotation() function is the GEO platform identifier:
>
>> library(GEOquery)
>> library(annotate)
>> data <- getGEO(GEO='GSE13639')[[1]]  # getGEO() returns a list of ExpressionSets
>> annotation(data)
> [1] "GPL570"
>
> I'd like to use functions like getSYMBOL, but I don't know which mapping
> package to install. Help will be much appreciated.
Hi, Joern. GPL570 is represented in Bioconductor as hgu133plus2.db.
You can get this the old-fashioned way by looking up GPL570 in GEO and
then going to the Bioconductor website to find the right package by
hand. Alternatively, you may use the GEOmetadb package to get the
information directly:
> library(GEOmetadb)
> sqlfile = getSQLiteFile()
> con = dbConnect(SQLite(), sqlfile)
> dbGetQuery(con,"select gpl,title,bioc_package from gpl where gpl='GPL570'")
Then you are off to the races.
> BiocManager::install('hgu133plus2.db')  # biocLite() in older Bioconductor releases
will get you the correct package.
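Once the package is installed, a minimal sketch of the probe-to-symbol mapping might look like this. It assumes annotate and hgu133plus2.db are installed from Bioconductor; 1007_s_at is the probe ID used later in this message, and AnnotationDbi's mapIds() is shown as a newer alternative to getSYMBOL().

```r
# Sketch: map Affymetrix probe IDs to gene symbols via hgu133plus2.db.
# Assumes annotate and hgu133plus2.db are installed from Bioconductor.
library(annotate)        # provides getSYMBOL()
library(hgu133plus2.db)  # annotation package corresponding to GPL570

probes <- c("1007_s_at")

# Classic annotate interface: the chip name is given without the ".db" suffix
print(getSYMBOL(probes, "hgu133plus2"))

# AnnotationDbi interface; multiVals = "first" keeps one symbol per probe
syms <- AnnotationDbi::mapIds(hgu133plus2.db, keys = probes,
                              column = "SYMBOL", keytype = "PROBEID",
                              multiVals = "first")
print(syms)
```

For a whole data set you would pass featureNames(data) instead of the single probe ID.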
However, your "data" object already has the annotation information
from NCBI GEO in it:
> colnames(fData(data))
[1] "ID" "GB_ACC"
[3] "SPOT_ID" "Species.Scientific.Name"
[5] "Annotation.Date" "Sequence.Type"
[7] "Sequence.Source" "Target.Description"
[9] "Representative.Public.ID" "Gene.Title"
[11] "Gene.Symbol" "ENTREZ_GENE_ID"
[13] "RefSeq.Transcript.ID" "Gene.Ontology.Biological.Process"
[15] "Gene.Ontology.Cellular.Component" "Gene.Ontology.Molecular.Function"
> fData(data)$Gene.Symbol[1:10]
[1] DDR1 RFC2 HSPA6 PAX8 GUCA1A UBA7 THRA PTPN21 CCL5 CYP2E1
20828 Levels: ADAM32 AFG3L1 ALG10 ARMCX4 ATP6V1E2 BEST4 C15orf40 ... FAM86B1
> fData(data)["1007_s_at",]$Gene.Symbol
[1] DDR1
20828 Levels: ADAM32 AFG3L1 ALG10 ARMCX4 ATP6V1E2 BEST4 C15orf40 ... FAM86B1
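If you would rather work from the GEO annotation directly, here is a rough sketch of turning the Gene.Symbol column into a named probe-to-symbol vector. It assumes multi-gene entries are separated by " /// ", as in GEO's Affymetrix platform tables, and keeps only the first symbol. probe2symbol() is a hypothetical helper, and the data frame is made up apart from the 1007_s_at/DDR1 pair shown above; in practice you would pass fData(data).

```r
# Hypothetical helper: build a probe -> gene-symbol lookup from a GEO
# feature-data table such as fData(data).
# Assumption: multi-gene entries use " /// " as the separator; only the
# first symbol is kept, and empty entries become NA.
probe2symbol <- function(fd, id_col = "ID", sym_col = "Gene.Symbol") {
  # Columns may be factors (stringsAsFactors = TRUE), so coerce to character
  ids  <- as.character(fd[[id_col]])
  syms <- as.character(fd[[sym_col]])
  # Keep the text before the first "///", with surrounding spaces trimmed
  first <- trimws(sub("///.*$", "", syms))
  first[which(first == "")] <- NA_character_
  setNames(first, ids)
}

# Made-up stand-in for fData(data); only the 1007_s_at/DDR1 row is real
fd <- data.frame(
  ID = c("1007_s_at", "hypothetical_1_at", "hypothetical_2_at"),
  Gene.Symbol = c("DDR1", "GENEA /// GENEB", ""),
  stringsAsFactors = TRUE
)
sym <- probe2symbol(fd)
print(sym)
```

Keeping only the first symbol is a simplification; for probes that map to several genes you may prefer to keep all of them, e.g. with strsplit().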
Hope that helps.
Sean