[BioC] vsn and oligo (or xps?) packages for GeneST expression arrays

cstrato cstrato at aon.at
Mon Jan 4 20:29:04 CET 2010


Dear Tim,

Currently xps does not directly support vsn. However, it is possible to 
combine xps with vsn, as the following code for the Affymetrix HuGene 
dataset demonstrates:

1. Create the ROOT scheme file and import the HuGene CEL-files:
### new R session: load library xps
 > library(xps)

# directory containing Affymetrix library files
 > libdir <- "/Volumes/GigaDrive/Affy/libraryfiles"
# directory containing Affymetrix annotation files
 > anndir <- "/Volumes/GigaDrive/Affy/Annotation"
# directory to store ROOT scheme files
 > scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes"

# create ROOT scheme file for whole genome array HuGene
 > scheme.genome <- import.exon.scheme("Scheme_HuGene10stv1r4_na29_hg18", filedir=scmdir,
     paste(libdir,"HuGene-1_0-st-v1.r4.analysis-lib-files/HuGene-1_0-st-v1.r4.clf",sep="/"),
     paste(libdir,"HuGene-1_0-st-v1.r4.analysis-lib-files/HuGene-1_0-st-v1.r4.pgf",sep="/"),
     paste(anndir,"Version09Jul/HuGene-1_0-st-v1.na29.hg18.probeset.csv",sep="/"),
     paste(anndir,"Version09Jul/HuGene-1_0-st-v1.na29.hg18.transcript.csv",sep="/"))

# directory containing Tissues CEL files
 > celdir <- "/Volumes/GigaDrive/ChipData/Exon/HuGene"
# directory to store ROOT raw data files
 > datdir <- "/Volumes/GigaDrive/CRAN/Workspaces/ROOTData"

# import tissues from Affymetrix Exon Array Dataset for HuGene-1_0-st-v1
 > celfiles <- c("TisMap_Breast_01_v1_WTGene1.CEL", 
"TisMap_Breast_02_v1_WTGene1.CEL", "TisMap_Breast_03_v1_WTGene1.CEL", 
"TisMap_Prostate_01_v1_WTGene1.CEL", 
"TisMap_Prostate_02_v1_WTGene1.CEL", "TisMap_Prostate_03_v1_WTGene1.CEL")
# shorter names to assign to the imported CEL files
 > celnames <- c("Breast01", "Breast02", "Breast03", "Prostate01", 
"Prostate02", "Prostate03")
# import CEL files
 > data.genome <- import.data(scheme.genome, "HuTissuesGenome6", 
filedir=datdir, celdir=celdir, celfiles=celfiles, celnames=celnames)

Note: It is suggested to rename the CEL-files and/or save copies of the 
original CEL-files, since importing the results from "vsn" requires 
conversion to CEL-files which may overwrite the original CEL-files!
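
One way to follow this advice is to copy the original CEL-files to a 
backup directory before running step 2. The following is only a minimal 
sketch in base R: the helper backup_cel() and the directory names are 
hypothetical and not part of xps, and the demonstration uses dummy files 
in a temporary directory instead of real CEL-files.

```r
# Hypothetical helper (not part of xps): copy CEL-files to a backup directory
# without overwriting files that are already backed up.
backup_cel <- function(celdir, celfiles, backupdir) {
  dir.create(backupdir, recursive = TRUE, showWarnings = FALSE)
  src <- file.path(celdir, celfiles)
  absent <- src[!file.exists(src)]
  if (length(absent) > 0) {
    stop("CEL-files not found: ", paste(absent, collapse = ", "))
  }
  # file.copy returns FALSE for files that already exist in backupdir
  ok <- file.copy(src, file.path(backupdir, celfiles), overwrite = FALSE)
  invisible(setNames(ok, celfiles))
}

# Demonstration with dummy files in a temporary directory
demo.celdir <- file.path(tempdir(), "cel_demo")
demo.backup <- file.path(tempdir(), "cel_backup")
dir.create(demo.celdir, showWarnings = FALSE)
demo.files <- c("A.CEL", "B.CEL")
for (f in demo.files) writeLines("dummy", file.path(demo.celdir, f))
copied <- backup_cel(demo.celdir, demo.files, demo.backup)
```

With real data you would pass celdir and celfiles from step 1 and a 
backup directory of your choice.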

2. Convert raw data to expression levels using vsn and xps:
### new R session:
# need to load xps after vsn (to be able to use intensity<-)
 > library(vsn)
 > library(xps)

### first, load ROOT scheme file and ROOT data file
 > scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes"
 > scheme.genome <- 
root.scheme(paste(scmdir,"Scheme_HuGene10stv1r4_na29_hg18.root",sep="/"))
 > datdir <- "/Volumes/GigaDrive/CRAN/Workspaces/ROOTData"
 > data.genome <- root.data(scheme.genome, 
paste(datdir,"HuTissuesGenome6_cel.root",sep="/"))

# attach intensity to data
 > data.tmp <- attachInten(data.genome)
 > str(data.tmp)

# get intensities as data.frame (includes X,Y)
 > tmp <- intensity(data.tmp)
 > head(tmp)

# apply vsn2 to intensities only (w/o X,Y)
 > data.vsn2 <- vsn2(as.matrix(tmp[,3:ncol(tmp)]))

# vsn2 returns values on a generalized log2 (glog2) scale in slot hx;
# convert back from log2, since rma below takes log2 again
 > value <- as.data.frame(2^data.vsn2@hx)

# attach (X,Y) coordinates
 > value <- cbind(tmp[,1:2], value)
 > head(value)
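
The back-transformation with 2^ above is only an approximate inverse, 
because vsn works on a generalized log scale that behaves like log2 at 
high intensities but stays finite near zero. The following base R sketch 
illustrates the idea with a generic glog2 function. Note that this is an 
assumption for illustration: the constant c is made up, and the function 
is not vsn's exact parametrisation, whose parameters are estimated from 
the data.

```r
# Generic generalized log (base 2); 'c' is an illustrative constant,
# NOT a parameter estimated by vsn2.
glog2 <- function(x, c = 100) log2((x + sqrt(x^2 + c^2)) / 2)

x <- c(0, 10, 100, 1000, 10000, 65000)
cmp <- data.frame(intensity = x,
                  log2      = log2(x),      # -Inf at zero intensity
                  glog2     = glog2(x),     # finite everywhere
                  back      = 2^glog2(x))   # value after back-transformation
print(cmp)
```

For large intensities the back-transformed value is close to the 
original intensity, while low intensities are shifted upwards, so the 
subsequent log2 in the summarization step never sees zero or negative 
values.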

# replace data with value and save in ROOT file:
# note: this step creates modified CEL-files!
 > intensity(data.tmp, "tmp_DataVSN", TRUE) <- value
 > str(data.tmp)

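# apply rma for summarization only (median polish): background correction
# and quantile normalization are switched off, since vsn already did the
# calibration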
 > datdir <- getwd()
 > data.norm <- rma(data.tmp, "tmp_data_norm", filedir=datdir, 
tmpdir="", background="none", normalize=FALSE,  exonlevel="core+affx")
 > str(data.norm)
 > boxplot(data.norm)
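
With background="none" and normalize=FALSE, what remains of rma is the 
summarization of probe intensities to one value per probeset and array 
by median polish. As a rough illustration only (xps does this in 
compiled code on the ROOT files, and the numbers below are invented), 
the same idea can be sketched for a single probeset with medpolish() 
from the stats package:

```r
# Toy log2 intensities for one probeset: 4 probes (rows) x 3 arrays (columns).
# The numbers are invented for illustration.
probe.log2 <- matrix(c(8.1, 8.3, 9.0,
                       7.9, 8.2, 8.8,
                       8.6, 8.9, 9.5,
                       8.0, 8.1, 8.9),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste("probe", 1:4, sep = ""),
                                     c("Breast01", "Breast02", "Prostate01")))

# Tukey's median polish:
# log2 intensity = overall + probe effect + array effect + residual
fit <- medpolish(probe.log2, trace.iter = FALSE)

# Probeset expression level per array = overall + array (column) effect
expr.level <- fit$overall + fit$col
print(round(expr.level, 3))
```

The probe (row) effects absorb probe-specific affinities, so the 
expression level of each array is not dominated by a single bright or 
dim probe.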

# get data.frame
 > expr.norm <- validData(data.norm)
 > head(expr.norm)

I hope this example shows you how to use xps with vsn. In the long term 
I would like to implement vsn in xps; however, this requires converting 
the relevant code to C++, and it is not easy to find out which parts of 
package vsn I need to convert.

Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
V.i.e.n.n.a           A.u.s.t.r.i.a
e.m.a.i.l:        cstrato at aon.at
_._._._._._._._._._._._._._._._._._



Tim Rayner wrote:
> Hi,
>
> I'm curious to know whether there are any plans to support the use of
> the VSN algorithm (i.e. the vsn package) with the classes defined by
> the oligo package. The reason I ask is that it seems as though the
> latter package is now touted as one of the supported methods of
> handling Affymetrix GeneST expression arrays (e.g. HuGene, MoGene),
> but as far as I can see the only expression
> normalisation/summarisation method currently supported for objects of
> the oligo GeneFeatureSet class seems to be RMA. I'm not even sure if
> GCRMA is supported for GeneFeatureSet objects? I guess the same
> questions also apply to the xps package, although I suspect that VSN
> support would turn out to be more work to implement there given the
> rather different underlying object structures used by xps. Do please
> correct me if I'm wrong about any of this, but until now I've been
> forced to shoehorn our data into the old AffyBatch workflow using
> custom CDFs and I'd like to know if there's a better (i.e. supported)
> workflow I could be using.
>
> Best regards,
>
> Tim Rayner
>
>
>
> # Current setup for reference:
>
> > sessionInfo()
> R version 2.10.0 Patched (2009-11-03 r50305)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] oligo_1.10.0         preprocessCore_1.8.0 oligoClasses_1.8.0
> [4] gcrma_2.18.0         vsn_3.14.0           affy_1.24.2
> [7] Biobase_2.6.1
>
> loaded via a namespace (and not attached):
> [1] affxparser_1.18.0 affyio_1.14.0     Biostrings_2.14.8 DBI_0.2-4
> [5] grid_2.10.0       IRanges_1.4.9     lattice_0.17-26   limma_3.2.1
> [9] splines_2.10.0
>
>
>
>


