[BioC] GEOquery package
Freudenberg, Johannes (NIH/NIEHS) [E]
johannes.freudenberg at nih.gov
Tue Aug 30 17:50:32 CEST 2011
Hi Jing,
The values you show certainly look like they are already on the log scale. But just to be sure, you can quickly check the GEO website:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM424764
About halfway down the page it says something like:
"Data table header descriptions
ID_REF
VALUE log2 signal intensity, RMA"
So you probably don't want to log these again in this case ...
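A quick numeric sanity check in R is also possible. The sketch below is only a heuristic: the helper name looks_log2 and the cutoff of 20 are assumptions made for illustration and are not part of GEOquery. It uses the five values quoted in the original message and shows that taking log2 of them again gives the small numbers (around 1.6 to 3.1) seen in the final matrix.

```r
# The five VALUE entries quoted in the original message
vals <- c(7.693888187, 8.571408272, 5.179812431, 7.468027592, 3.118550777)

# Hypothetical helper (not a GEOquery function): returns TRUE when no finite
# value exceeds the cutoff. The cutoff of 20 is an assumed, generous upper
# bound for log2 intensities; raw intensities usually run into the thousands.
looks_log2 <- function(x, cutoff = 20) {
  x <- x[is.finite(x)]
  length(x) > 0 && max(x) <= cutoff
}

looks_log2(vals)   # TRUE for these values

# Logging again compresses the values to roughly 2.94, 3.10, 2.37, 2.90, 1.64,
# which matches the first column of the double-logged matrix in the question.
log2(vals)
```

A range check like this cannot replace the column description on the GEO page, but it catches the obvious case before calling log2() a second time.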
--Johannes
-----Original Message-----
From: Jing Huang [mailto:huangji at ohsu.edu]
Sent: Tuesday, August 30, 2011 11:36 AM
To: 'bioconductor at r-project.org'
Subject: [BioC] GEOquery package
Dear Sean and all members,
I am trying to extract GSE data from GEO and analyze it. I am wondering whether the GSE data has already been normalized and log2-transformed. My R script and its output are copied below. Can somebody help me with this?
> Table(GSMList(gse)[[1]])[1:5, ]
     ID_REF       VALUE
1 1007_s_at 7.693888187
2   1053_at 8.571408272
3    117_at 5.179812431
4    121_at 7.468027592
5 1255_g_at 3.118550777
> Columns(GSMList(gse)[[1]])[1:5, ]
Column Description
1 ID_REF
2 VALUE log2 signal intensity, RMA <<<<< Does this mean that the values are log2-transformed and the data were normalized by RMA?
NA <NA> <NA>
NA.1 <NA> <NA>
NA.2 <NA> <NA>
According to the GEOquery package, I should do the following steps in order to get the eset:
> probesets <- Table(GPLList(gse)[[1]])$ID
> data.matrix <- do.call("cbind", lapply(GSMList(gse), function(x) {
+ tab <- Table(x)
+ mymatch <- match(probesets, tab$ID_REF)
+ return(tab$VALUE[mymatch])
+ }))
> data.matrix <- apply(data.matrix, 2, function(x) {
+ as.numeric(as.character(x))
+ })
> data.matrix <- log2(data.matrix)
> data.matrix[1:5, ]
     GSM424759 GSM424760 GSM424761 GSM424762 GSM424763 GSM424764 GSM424765
[1,]  2.943713  2.917086  2.926155  2.983485  2.973219  2.962445  2.926030
[2,]  3.099532  3.136898  3.152696  3.217172  3.206948  3.198448  3.135146
[3,]  2.372900  2.309177  2.354380  2.373350  2.368464  2.381139  2.314555
[4,]  2.900727  2.873853  2.863911  2.879232  2.927384  2.913594  2.852870
[5,]  1.640876  1.645330  1.494274  1.792643  1.719597  1.648126  1.605055
Is the log2 transformation necessary for this dataset?
Many thanks
Jing