[BioC] GEOquery package

Freudenberg, Johannes (NIH/NIEHS) [E] johannes.freudenberg at nih.gov
Tue Aug 30 17:50:32 CEST 2011


Hi Jing,

The values you show certainly look like they are already on the log scale. But just to be sure, you can quickly check the sample record on the GEO website:

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM424764

About halfway down the page it says something like:

"Data table header descriptions 
ID_REF  
VALUE log2 signal intensity, RMA"

So you probably don't want to log these again in this case ...
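
If you would rather check this from within R, here is a rough sketch of a quantile-based sanity check. Note that `needs_log2` is a hypothetical helper I made up for illustration (it is not part of GEOquery), and the cutoffs in it are rule-of-thumb assumptions: raw intensities usually run into the hundreds or thousands, whereas log2 RMA values rarely exceed about 16.

```r
# Hypothetical helper (not part of GEOquery): guess whether expression
# values are still on the raw intensity scale and would need log2().
# The cutoffs below are rule-of-thumb assumptions, not fixed rules.
needs_log2 <- function(x) {
  x <- as.numeric(x)
  qx <- quantile(x, c(0, 0.25, 0.5, 0.75, 0.99, 1),
                 na.rm = TRUE, names = FALSE)
  # Large upper quantiles or a very wide range suggest raw intensities.
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

# The five values posted in this thread
vals <- c(7.693888187, 8.571408272, 5.179812431, 7.468027592, 3.118550777)

needs_log2(vals)     # FALSE: these already look log-scaled
needs_log2(2^vals)   # TRUE: back-transformed intensities would need log2
```

On a full data set you would pass the whole matrix (or one column of it) rather than five values. Treat the result as a hint only; the VALUE description on the GEO record remains the authoritative source.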

--Johannes


-----Original Message-----
From: Jing Huang [mailto:huangji at ohsu.edu] 
Sent: Tuesday, August 30, 2011 11:36 AM
To: 'bioconductor at r-project.org'
Subject: [BioC] GEOquery package

Dear Sean and all members,

I am trying to extract GSE data from GEO and analyze it. I am wondering whether the GSE data have already been normalized and log2-transformed. My R script and its output are copied below. Can somebody help me with this?

> Table(GSMList(gse)[[1]])[1:5, ]
     ID_REF       VALUE
1 1007_s_at 7.693888187
2   1053_at 8.571408272
3    117_at 5.179812431
4    121_at 7.468027592
5 1255_g_at 3.118550777
> Columns(GSMList(gse)[[1]])[1:5, ]
     Column                Description
1    ID_REF
2     VALUE log2 signal intensity, RMA       <<<<< Does this mean that the values are log2-transformed and the data were normalized by RMA?
NA     <NA>                       <NA>
NA.1   <NA>                       <NA>
NA.2   <NA>                       <NA>

According to the GEOquery package, I should do the following steps in order to get the eset:

> probesets <- Table(GPLList(gse)[[1]])$ID
> data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) {
+ tab <- Table(x)
+ mymatch <- match(probesets, tab$ID_REF)
+ return(tab$VALUE[mymatch])
+ }))
> data.matrix <- apply(data.matrix, 2, function(x) {
+ as.numeric(as.character(x))
+ })
> data.matrix <- log2(data.matrix)
> data.matrix[1:5, ]

     GSM424759 GSM424760 GSM424761 GSM424762 GSM424763 GSM424764 GSM424765
[1,]  2.943713  2.917086  2.926155  2.983485  2.973219  2.962445  2.926030
[2,]  3.099532  3.136898  3.152696  3.217172  3.206948  3.198448  3.135146
[3,]  2.372900  2.309177  2.354380  2.373350  2.368464  2.381139  2.314555
[4,]  2.900727  2.873853  2.863911  2.879232  2.927384  2.913594  2.852870
[5,]  1.640876  1.645330  1.494274  1.792643  1.719597  1.648126  1.605055

Is the log2 transformation necessary for this dataset?
Many thanks

Jing

