[BioC] topTable (fit) annotation

Wed Aug 31 18:50:56 CEST 2011

Sorry. My problem is the colnames(topTable(fit)) that I generated by GEO
GSE dataset only provides a few items such as pValue...

I would like to have a full table just like GEO GDS dataset generated. I
describe the problem at my original email: Different dataset of GEO
provides different patten on topTable(fit).

This generated with GDS dataset

>colnames(topTable(fit))
[1] "ID"                    "Gene.title"            "Gene.symbol"
  [4] "Gene.ID"               "UniGene.title"         "UniGene.symbol"
  [7] "UniGene.ID"            "Nucleotide.Title"      "GI"
[10] "GenBank.Accession"     "Platform_CLONEID"      "Platform_ORF"
[13] "Platform_SPOTID"       "Chromosome.location"
"Chromosome.annotation"
[16] "GO.Function"           "GO.Process"            "GO.Component"
[19] "GO.Function.1"         "GO.Process.1"          "GO.Component.1"
[22] "CTRL"                  "HIF1a"                 "HIF2a"
[25] "HIF1a2a"               "AveExpr"               "F"
[28] "P.Value"               "adj.P.Val"

This generated with GSE dataset

>colnames(topTable(fit)

[1] "ID"        "mir210"    "CTRL2"     "AveExpr"   "F"
"P.Value"   "adj.P.Val"

I hope this is clear.

Jing

On 8/31/11 9:38 AM, "Freudenberg, Johannes (NIH/NIEHS) [E]"
<johannes.freudenberg at nih.gov> wrote:

>Hi Jing, 
>
>I'm not quite sure I understand your question.  Why can't you use the
>data for further analysis?  Maybe the issue is that getGEO() returns a
>list? Have you tried something like:
>
>> gse16962 <- getGEO("GSE16962")
>> eset <- gse16962[[1]]
>> head(fData(eset))
>                 ID GB_ACC SPOT_ID Species.Scientific.Name Annotation.Date
>1007_s_at 1007_s_at U48705    <NA>            Homo sapiens    Mar 11, 2009
>1053_at     1053_at M87338    <NA>            Homo sapiens    Mar 11, 2009
>117_at       117_at X51757    <NA>            Homo sapiens    Mar 11, 2009
>121_at       121_at X69699    <NA>            Homo sapiens    Mar 11, 2009
>1255_g_at 1255_g_at L36861    <NA>            Homo sapiens    Mar 11, 2009
>1294_at     1294_at L13852    <NA>            Homo sapiens    Mar 11, 2009
>              Sequence.Type                 Sequence.Source
>1007_s_at Exemplar sequence Affymetrix Proprietary Database
>1053_at   Exemplar sequence                         GenBank
>117_at    Exemplar sequence Affymetrix Proprietary Database
>121_at    Exemplar sequence                         GenBank
>1255_g_at Exemplar sequence Affymetrix Proprietary Database
>1294_at   Exemplar sequence                         GenBank
>
>Etc.
>
>--Johannes
>
>
>
>-----Original Message-----
>From: Jing Huang [mailto:huangji at ohsu.edu]
>Sent: Wednesday, August 31, 2011 12:21 PM
>To: Davis, Sean (NIH/NCI) [E]
>Cc: bioconductor at r-project.org
>Subject: Re: [BioC] topTable (fit) annotation
>
>Thank YOU Sean for responding my question. I am not sure where I should
>add the platform annotation in.
>
>Here are what I observed:
>
>If I extract "GDS2162" by typing in
>
>> gds=getGEO("GDS2162")
>> eset=GDS2eSet(gds,do.log2=T)
>
>>eset
>
>ExpressionSet (storageMode: lockedEnvironment)
>assayData: 45101 features, 16 samples
>  element names: exprs
>protocolData: none
>phenoData
>  sampleNames: GSM67339 GSM67343 ... GSM67352 (16 total)
>  varLabels: sample genotype/variation agent description
>  varMetadata: labelDescription
>featureData
>  featureNames: 1415670_at 1415671_at ... AFFX-TrpnX-M_at (45101 total)
>  fvarLabels: ID Gene.title ... GO.Component.1 (21 total)
>      
>  fvarMetadata: Column labelDescription
>experimentData: use 'experimentData(object)'
>  pubMedIds: 16237459
>Annotation:  
>
>Most of annotation is in.
>
>
>
>If I extract "GSE16962" by typing in
>
>>gse=getGEO("GSE16962")
>
>> gse
>$GSE16962_series_matrix.txt.gz
>ExpressionSet (storageMode: lockedEnvironment)
>assayData: 54675 features, 12 samples
>  element names: exprs
>protocolData: none
>phenoData
>  sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total)
>  varLabels: title geo_accession ... data_row_count (34 total)
>  varMetadata: labelDescription
>featureData
>  featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total)
>  fvarLabels: ID GB_ACC ... Gene.Ontology.Molecular.Function (16 total)
>  fvarMetadata: Column Description labelDescription
>experimentData: use 'experimentData(object)'
>Annotation: GPL570
>
>It looks to me it includes annotation package such as GO term....
>But I can't use this eset data to do further analysis (such as fit table)
>what I need.
>
>If I extract ("GSE16962") by typing in
>
>>gse=getGEO("GSE16962", GSEMatrix=F)
> 
>Then following GEOquery package, I can generate eset2, which I can use to
>do analysis what I need. But the eset2 looks like this:
>
>> eset2
>ExpressionSet (storageMode: lockedEnvironment)
>assayData: 54675 features, 12 samples
>  element names: exprs
>protocolData: none
>phenoData
>  sampleNames: GSM424759 GSM424760 ... GSM424770 (12 total)
>  varLabels: samples
>  varMetadata: labelDescription
>featureData: none
>experimentData: use 'experimentData(object)'
>Annotation:  
>
>
>
>Here are the scripts that I use to generate eset2:
>
>>probesets <- Table(GPLList(gse)[[1]])$ID  data.matrix <-
>>do.call("cbind", lapply(GSMList(gse), function(x) {
>+ tab <- Table(x)
>+ mymatch <- match(probesets,tab$ID_REF)
>+ return(tab$VALUE[mymatch])
>+ }))
>> data.matrix <- apply(data.matrix, 2, function(x) {
>+ as.numeric(as.character(x))
>+ })
>> require(Biobase)
>> rownames(data.matrix) <- probesets
>> colnames(data.matrix) <- names(GSMList(gse)) pdata <-
>> data.frame(samples=names(GSMList(gse)))
>> rownames(pdata) <- names(GSMList(gse)) pheno <-
>> as(pdata,"AnnotatedDataFrame")
>> eset2 <- new('ExpressionSet',exprs=data.matrix,phenoData=pheno)
>
>
>At which step, I should add
>
>>gplannot = getGEO("GPL96", AnnotGPL=TRUE)
>
>
>Many Many thanks
>
>Jing
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>On 8/30/11 6:01 PM, "Sean Davis" <sdavis2 at mail.nih.gov> wrote:
>
>>Hi, Jing.
>>
>>NCBI GEO maintains two types of GPL records.  The normal variant is
>>just supplied by the submitter.  However, when a GEO Series is curated
>>by NCBI GEO into a GEO DataSet (GDS), they create a so-called
>>"Annotation GPL".  These have a relatively standard set of columns.  I
>>have not made the change to GEOquery yet to grab this annotation GPL
>>when getting Series Matrix files.  But, you can get them yourself by
>>specifying:
>>
>>gplannot = getGEO("GPL96", AnnotGPL=TRUE)
>>
>>You can always replace the feature data of the ExpressionSets with the
>>information in the retrieved Annotation GPL.
>>
>>I hope that is clear.
>>
>>Sean
>>
>>
>>On Tue, Aug 30, 2011 at 5:01 PM, Jing Huang <huangji at ohsu.edu> wrote:
>>> Dear All members,
>>>
>>> I have been extracting data from GEO (GEO package) and do some
>>>analysis on them by using limma package. What I discover is the
>>>components of
>>>topTable(fit) are different from the dataset GDS and GSE.
>>>
>>> If the data is from GDS, then the colnames of topTable (fit) looks
>>>like this.
>>>
>>>> colnames(topTable(fit))
>>> [1] "ID"                    "Gene.title"            "Gene.symbol"
>>>  [4] "Gene.ID"               "UniGene.title"         "UniGene.symbol"
>>>  [7] "UniGene.ID"            "Nucleotide.Title"      "GI"
>>> [10] "GenBank.Accession"     "Platform_CLONEID"      "Platform_ORF"
>>> [13] "Platform_SPOTID"       "Chromosome.location"
>>>"Chromosome.annotation"
>>> [16] "GO.Function"           "GO.Process"            "GO.Component"
>>> [19] "GO.Function.1"         "GO.Process.1"          "GO.Component.1"
>>> [22] "CTRL"                  "HIF1a"                 "HIF2a"
>>> [25] "HIF1a2a"               "AveExpr"               "F"
>>> [28] "P.Value"               "adj.P.Val"
>>>
>>> If the data is from GSE, then the   colnames of topTable(fit) looks
>>>like this:
>>>
>>>>colnames(topTable(fit)
>>>
>>> [1] "ID"        "mir210"    "CTRL2"     "AveExpr"   "F"
>>>"P.Value"   "adj.P.Val"
>>>
>>> I am trying to add some term into this table by doing following one
>>>by
>>>one: the data is generated by Affymetrix human U133 platform:
>>>
>>>>Library(hgu133plus2.db)
>>>>x=hgu133plus2SYMBOL
>>>>y=topTable(fit)
>>>>y$SYMBOL=unlist(as.list(x[y$ID]))
>>>
>>> It works but I need to add ENTREZID,SYMBOL,CHR, CHRloc, and GO
>>>annotations as well.  I like to have the topTable more like the
>>>topTable(fit) generated at top by data GEO GDS data
>>>
>>> I am wondering if there is an easy way to annotate all once.
>>>
>>> In addition, I am having a trouble to annotate GO term.
>>>
>>> Many Thanks
>>>
>>> Jing
>>>
>>>
>>>
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor