[BioC] topTable (fit) annotation

Wed Aug 31 19:24:46 CEST 2011

THANK YOU Sean. It is so much easier!!

Jing

-----Original Message-----
From: seandavi at gmail.com [mailto:seandavi at gmail.com] On Behalf Of Sean Davis
Sent: Wednesday, August 31, 2011 10:09 AM
To: Jing Huang
Cc: bioconductor at r-project.org
Subject: Re: [BioC] topTable (fit) annotation

On Wed, Aug 31, 2011 at 12:21 PM, Jing Huang <huangji at ohsu.edu> wrote:
> Thank YOU Sean for responding my question. I am not sure where I should
> add the platform annotation in.
>
> Here are what I observed:
>
> If I extract "GDS2162" by typing in
>
>> gds=getGEO("GDS2162")
>> eset=GDS2eSet(gds,do.log2=T)
>
>>eset
>
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 45101 features, 16 samples
>  element names: exprs
> protocolData: none
> phenoData
>  sampleNames: GSM67339 GSM67343 ... GSM67352 (16 total)
>  varLabels: sample genotype/variation agent description
>  varMetadata: labelDescription
> featureData
>  featureNames: 1415670_at 1415671_at ... AFFX-TrpnX-M_at (45101 total)
>  fvarLabels: ID Gene.title ... GO.Component.1 (21 total)
>
>  fvarMetadata: Column labelDescription
> experimentData: use 'experimentData(object)'
>  pubMedIds: 16237459
> Annotation:
>
> Most of annotation is in.
>
>
>
> If I extract "GSE16962" by typing in
>
>>gse=getGEO("GSE16962")
>
>> gse
> $GSE16962_series_matrix.txt.gz
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 54675 features, 12 samples
>  element names: exprs
> protocolData: none
> phenoData
>  sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total)
>  varLabels: title geo_accession ... data_row_count (34 total)
>  varMetadata: labelDescription
> featureData
>  featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total)
>  fvarLabels: ID GB_ACC ... Gene.Ontology.Molecular.Function (16 total)
>  fvarMetadata: Column Description labelDescription
> experimentData: use 'experimentData(object)'
> Annotation: GPL570

Hi, Jing.

gse above is a list.  You can use gse[[1]] to get back an
ExpressionSet, in this case the ExpressionSet that was generated using
GPL570 platform.  There is no need to do any of the stuff you note
below in the VAST majority of cases.

Sean

> It looks to me it includes annotation package such as GO term....
> But I can't use this eset data to do further analysis (such as fit table)
> what I need.
>
> If I extract ("GSE16962") by typing in
>
>>gse=getGEO("GSE16962", GSEMatrix=F)
>
> Then following GEOquery package, I can generate eset2, which I can use to
> do analysis what I need. But the eset2 looks like this:
>
>> eset2
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 54675 features, 12 samples
>  element names: exprs
> protocolData: none
> phenoData
>  sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total)
>  varLabels: samples
>  varMetadata: labelDescription
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation:
>
>
>
> Here are the scripts that I use to generate eset2:
>
>>probesets <- Table(GPLList(gse)[[1]])$ID
>> data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) {
> + tab <- Table(x)
> + mymatch <- match(probesets,tab$ID_REF)
> + return(tab$VALUE[mymatch])
> + }))
>> data.matrix <- apply(data.matrix, 2, function(x) {
> + as.numeric(as.character(x))
> + })
>> require(Biobase)
>> rownames(data.matrix) <- probesets
>> colnames(data.matrix) <- names(GSMList(gse))
>> pdata <- data.frame(samples=names(GSMList(gse)))
>> rownames(pdata) <- names(GSMList(gse))
>> pheno <- as(pdata,"AnnotatedDataFrame")
>> eset2 <- new('ExpressionSet',exprs=data.matrix,phenoData=pheno)
>
>
> At which step, I should add
>
>>gplannot = getGEO("GPL96", AnnotGPL=TRUE)
>
>
> Many Many thanks
>
> Jing
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On 8/30/11 6:01 PM, "Sean Davis" <sdavis2 at mail.nih.gov> wrote:
>
>>Hi, Jing.
>>
>>NCBI GEO maintains two types of GPL records.  The normal variant is
>>just supplied by the submitter.  However, when a GEO Series is curated
>>by NCBI GEO into a GEO DataSet (GDS), they create a so-called
>>"Annotation GPL".  These have a relatively standard set of columns.  I
>>have not made the change to GEOquery yet to grab this annotation GPL
>>when getting Series Matrix files.  But, you can get them yourself by
>>specifying:
>>
>>gplannot = getGEO("GPL96", AnnotGPL=TRUE)
>>
>>You can always replace the feature data of the ExpressionSets with the
>>information in the retrieved Annotation GPL.
>>
>>I hope that is clear.
>>
>>Sean
>>
>>
>>On Tue, Aug 30, 2011 at 5:01 PM, Jing Huang <huangji at ohsu.edu> wrote:
>>> Dear All members,
>>>
>>> I have been extracting data from GEO (GEO package) and do some analysis
>>>on them by using limma package. What I discover is the components of
>>>topTable(fit) are different from the dataset GDS and GSE.
>>>
>>> If the data is from GDS, then the colnames of topTable (fit) looks like
>>>this.
>>>
>>>> colnames(topTable(fit))
>>> [1] "ID"                    "Gene.title"            "Gene.symbol"
>>>  [4] "Gene.ID"               "UniGene.title"         "UniGene.symbol"
>>>  [7] "UniGene.ID"            "Nucleotide.Title"      "GI"
>>> [10] "GenBank.Accession"     "Platform_CLONEID"      "Platform_ORF"
>>> [13] "Platform_SPOTID"       "Chromosome.location"
>>>"Chromosome.annotation"
>>> [16] "GO.Function"           "GO.Process"            "GO.Component"
>>> [19] "GO.Function.1"         "GO.Process.1"          "GO.Component.1"
>>> [22] "CTRL"                  "HIF1a"                 "HIF2a"
>>> [25] "HIF1a2a"               "AveExpr"               "F"
>>> [28] "P.Value"               "adj.P.Val"
>>>
>>> If the data is from GSE, then the   colnames of topTable(fit) looks
>>>like this:
>>>
>>>>colnames(topTable(fit)
>>>
>>> [1] "ID"        "mir210"    "CTRL2"     "AveExpr"   "F"
>>>"P.Value"   "adj.P.Val"
>>>
>>> I am trying to add some term into this table by doing following one by
>>>one: the data is generated by Affymetrix human U133 platform:
>>>
>>>>Library(hgu133plus2.db)
>>>>x=hgu133plus2SYMBOL
>>>>y=topTable(fit)
>>>>y$SYMBOL=unlist(as.list(x[y$ID]))
>>>
>>> It works but I need to add ENTREZID,SYMBOL,CHR, CHRloc, and GO
>>>annotations as well.  I like to have the topTable more like the
>>>topTable(fit) generated at top by data GEO GDS data
>>>
>>> I am wondering if there is an easy way to annotate all once.
>>>
>>> In addition, I am having a trouble to annotate GO term.
>>>
>>> Many Thanks
>>>
>>> Jing
>>>
>>>
>>>
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>
>