[BioC] Cannot read in CEL files with XPS
cstrato
cstrato at aon.at
Sun Dec 7 20:53:29 CET 2008
Dear Chris,
Maybe the following information can help you solve your problems:
This is my setup:
A dual-boot MacBook Pro, 2GB RAM, running Windows XP SP2 where I have
installed the following binary versions:
- R-2.8.0-win32.exe
- root_v5.18.00.win32.vc80.msi
- xps_1.2.1.zip
Note that root_v5.18.00 is necessary since Bioconductor has compiled xps
with this version.
You can run xps either from RGui or from Rterm:
When using RGui you should set "verbose=FALSE" in all functions, since
you will not see any messages anyway. I would recommend using Rterm with
"verbose=TRUE", at least initially, to get a feeling for what xps does;
see the examples below.
1. Import schemes:
Since xps uses the original Affymetrix CDF, PGF and annotation files,
you have to import these files first. Here is my Rterm session for doing
this for HG-U133_Plus_2:
> library(xps)
Welcome to xps version 1.2.1
an R wrapper for XPS - eXpression Profiling System
(c) Copyright 2001-2008 by Christian Stratowa
> libdir <- "C:/home/Affy/libraryfiles"
> anndir <- "C:/home/Affy/Annotation"
> scmdir <- "C:/home/Rabbitus/CRAN/Workspaces/Schemes"
> scheme.hgu133p2.na27 <-
import.expr.scheme("Scheme_HGU133p2_na27",filedir=scmdir,paste(libdir,"HG-U133_Plus_2.cdf",sep="/"),paste(libdir,"HG-U133-PLUS_probe.tab",sep="/"),paste(anndir,"Version08Nov/HG-U133_Plus_2.na27.annot.csv",sep="/"))
Creating new file
<C:/home/Rabbitus/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root>...
Importing <C:/home/Affy/libraryfiles/HG-U133_Plus_2.cdf> as
<HG-U133_Plus_2.scm>...
<1354896> records imported...Finished
PM/MM statistics:
5 cells with minimum number of PM/MM pairs: 8
1 cells with maximum number of PM/MM pairs: 69
New dataset <HG-U133_Plus_2> is added to Content...
Importing <C:/home/Affy/libraryfiles/HG-U133-PLUS_probe.tab> as
<HG-U133_Plus_2.prb>...
Warning: The following header columns are missing:
<Serial Order>
<604258> records read...Finished
<1354896> records imported...Finished
probe info:
GC content: minimum GC is <3> maximum GC is <22>
Melting Tm: minimum Tm is <51> maximum Tm is <89>
Importing
<C:/home/Affy/Annotation/Version08Nov/HG-U133_Plus_2.na27.annot.csv> as
<HG-U133_Plus_2.ann>...
Warning: The following header columns are missing:
<Protein Families>
<Protein Domains>
Number of annotated transcripts is <54675>.
Warning: Number of transcripts with ambigous annotation is <336>
<54675> records imported...Finished
>
I recommend importing all necessary schemes and saving them in a
common system directory. You need not save this R session, since you can
access every scheme in later R sessions with function root.scheme().
Note that with xps_1.2.1 it is no longer necessary to delete the first
12 lines from the annotation file. All warnings can be ignored; they are
caused by changes in the Affymetrix annotation files.
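As a minimal sketch of reopening a stored scheme in a later session:
the helper name scheme_path() below is made up for illustration, and
the directory is simply the one from the session above. It builds the
file path and only calls root.scheme() if the file exists and xps is
installed:

```r
# Hypothetical helper (not part of xps): build the path to a scheme file.
scheme_path <- function(scmdir, scheme_name) {
  file.path(scmdir, paste0(scheme_name, ".root"))
}

scmdir  <- "C:/home/Rabbitus/CRAN/Workspaces/Schemes"
scmfile <- scheme_path(scmdir, "Scheme_HGU133p2_na27")

# Open the scheme only if the ROOT file is present and xps can be loaded.
if (file.exists(scmfile) && requireNamespace("xps", quietly = TRUE)) {
  scheme.hgu133p2.na27 <- xps::root.scheme(scmfile)
} else {
  message("Cannot open scheme (file missing or xps not installed): ", scmfile)
}
```

Keeping all scheme files in one directory means only scmdir has to
change when you move to another machine.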
2. Import CEL-files:
To show you that xps can easily handle many CEL-files I have imported
all 53 CEL-files from the Affymetrix human tissue/mix dataset.
Here is the output for RGui:
> library(xps)
Welcome to xps version 1.2.1
an R wrapper for XPS - eXpression Profiling System
(c) Copyright 2001-2008 by Christian Stratowa
> scmdir <- "E:/CRAN/Workspaces/Schemes"
> celdir <- "E:/ChipData/Exon/HuMixture"
> datdir <- "E:/CRAN/Workspaces/ROOTData"
> scheme.u133p2 <-
root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
> Sys.time()
[1] "2008-12-07 14:47:20 CET"
> data.mix <- import.data(scheme.u133p2, "HuMixAllU133P2",
filedir=datdir, celdir=celdir, verbose=FALSE)
> Sys.time()
[1] "2008-12-07 14:53:45 CET"
>
As you can see, importing 53 CEL-files takes about 6.5 min.
Here is the (partial) output when using Rterm:
> library(xps)
Welcome to xps version 1.2.1
an R wrapper for XPS - eXpression Profiling System
(c) Copyright 2001-2008 by Christian Stratowa
> scmdir <- "E:/CRAN/Workspaces/Schemes"
> celdir <- "E:/ChipData/Exon/HuMixture"
> datdir <- "E:/CRAN/Workspaces/ROOTData"
> scheme.u133p2 <-
root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
> data.mix <- import.data(scheme.u133p2, "HuMixAllU133P2",
filedir=datdir, celdir=celdir, verbose=TRUE)
Opening file <E:/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root> in
<READ> mode...
Creating new file <E:/CRAN/Workspaces/ROOTData/HuTissuesU133P2_cel.root>...
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_breast_A.CEL> as
<u1332plus_ivt_breast_A.cel>...
<1354896> records imported...
hybridization statistics:
4 cells with minimal intensity 32
1 cells with maximal intensity 16261
New dataset <DataSet> is added to Content...
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_breast_B.CEL> as
<u1332plus_ivt_breast_B.cel>...
<1354896> records imported...
hybridization statistics:
1 cells with minimal intensity 24
1 cells with maximal intensity 20496
...
...
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_thyroid_B.CEL> as
<u1332plus_ivt_thyroid_B.cel>...
<1354896> records imported...
hybridization statistics:
1 cells with minimal intensity 29
1 cells with maximal intensity 47017
Importing <E:/ChipData/Exon/HuMixture/u1332plus_ivt_thyroid_C.CEL> as
<u1332plus_ivt_thyroid_C.cel>...
<1354896> records imported...
hybridization statistics:
1 cells with minimal intensity 24
2 cells with maximal intensity 65534
>
As you can see, Rterm shows the progress status and some statistical
information. Since CEL-files often have long and strange names, I would
recommend using the parameter "celnames" in function import.data() to
assign new names. Once again, you need not save the R session, since you
can access the data in later R sessions using function root.data().
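One possible way to derive shorter names is sketched below in base R.
The three file paths are examples taken from the output above, and the
prefix "u1332plus_ivt_" is an assumption about this particular dataset;
adapt the pattern to your own files. Please check ?import.data for how
"celnames" is matched to the CEL-files before passing the result on.

```r
# Example paths; in practice these might come from
# list.files(celdir, pattern = "\\.CEL$", full.names = TRUE).
celfiles <- c("E:/ChipData/Exon/HuMixture/u1332plus_ivt_breast_A.CEL",
              "E:/ChipData/Exon/HuMixture/u1332plus_ivt_breast_B.CEL",
              "E:/ChipData/Exon/HuMixture/u1332plus_ivt_thyroid_C.CEL")

# Drop the directory and the .CEL extension.
short <- sub("\\.CEL$", "", basename(celfiles), ignore.case = TRUE)

# Drop the common prefix (assumed naming pattern of this dataset).
celnames <- sub("^u1332plus_ivt_", "", short)

# The new names must still be unique, one per CEL-file.
stopifnot(!anyDuplicated(celnames), length(celnames) == length(celfiles))

print(celnames)
```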
3. RMA normalization:
RMA normalization of all 53 CEL-files takes about 1 hr.
Here is the RGui session:
> library(xps)
Welcome to xps version 1.2.1
an R wrapper for XPS - eXpression Profiling System
(c) Copyright 2001-2008 by Christian Stratowa
> scmdir <- "E:/CRAN/Workspaces/Schemes"
> scheme.u133p2 <-
root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
> datdir <- "E:/CRAN/Workspaces/ROOTData"
> data.u133p2 <- root.data(scheme.u133p2,
paste(datdir,"HuMixAllU133P2_cel.root",sep="/"))
> Sys.time()
[1] "2008-12-07 14:59:12 CET"
> data.rma <-
rma(data.u133p2,"MixAllU133P2RMA",tmpdir="",background="pmonly",normalize=TRUE,verbose=FALSE)
> Sys.time()
[1] "2008-12-07 15:55:25 CET"
>
In comparison, here is the (partial) Rterm session:
> library(xps)
Welcome to xps version 1.2.1
an R wrapper for XPS - eXpression Profiling System
(c) Copyright 2001-2008 by Christian Stratowa
> scmdir <- "E:/CRAN/Workspaces/Schemes"
> scheme.u133p2 <-
root.scheme(paste(scmdir,"Scheme_HGU133p2_na27.root",sep="/"))
> datdir <- "E:/CRAN/Workspaces/ROOTData"
> data.u133p2 <- root.data(scheme.u133p2,
paste(datdir,"HuMixAllU133P2_cel.root",sep="/"))
> Sys.time()
[1] "2008-12-07 13:32:35 CET"
> data.rma <-
rma(data.u133p2,"MixAllU133P2RMA",tmpdir="",background="pmonly",normalize=TRUE,verbose=TRUE)
Creating new file
<E:/CRAN/Workspaces/Exon/hutissues/u133p2/MixAllU133P2RMA.root>...
Opening file <E:/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root> in
<READ> mode...
Opening file <E:/CRAN/Workspaces/ROOTData/HuMixAllU133P2_cel.root> in
<READ> mode...
Preprocessing data using method <preprocess>...
Background correcting raw data...
calculating background for <u1332plus_ivt_breast_A.cel>...
background statistics:
750638 cells with minimal intensity 0
1468 cells with maximal intensity 69.3196
calculating background for <u1332plus_ivt_breast_B.cel>...
background statistics:
750638 cells with minimal intensity 0
1334 cells with maximal intensity 68.3009
...
...
calculating background for <u1332plus_ivt_thyroid_B.cel>...
background statistics:
750638 cells with minimal intensity 0
295 cells with maximal intensity 65.6557
calculating background for <u1332plus_ivt_thyroid_C.cel>...
background statistics:
750638 cells with minimal intensity 0
1 cells with maximal intensity 74.3142
Normalizing raw data...
normalizing data using method <quantile>...
finished filling <53> arrays. ..
finished filling <53> trees. cqu>...
Converting raw data to expression levels...
summarizing with <medianpolish>...
calculating expression for <54675> of <54684> units...Finished.
expression statistics:
minimal expression level is <2.65147>
maximal expression level is <15470.9>
preprocessing finished.
Opening file <E:/CRAN/Workspaces/Schemes/Scheme_HGU133p2_na27.root> in
<READ> mode...
Opening file
<E:/CRAN/Workspaces/Exon/hutissues/u133p2/MixAllU133P2RMA.root> in
<READ> mode...
Exporting data from tree <*> to file
<E:/CRAN/Workspaces/Exon/hutissues/u133p2/MixAllU133P2RMA.txt>...
Reading entries from <HG-U133_Plus_2.ann> ...Finished
<54675> of <54675> records exported.
> Sys.time()
[1] "2008-12-07 14:35:09 CET"
>
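The Sys.time() stamps printed in the sessions above can be turned into
elapsed minutes with base R. This is only a small sketch; the helper
name elapsed_min() is made up for illustration, and the time stamps are
copied from the RGui and Rterm output shown in this mail.

```r
# Hypothetical helper: minutes between two time stamps given as text.
elapsed_min <- function(start, end, tz = "CET") {
  as.numeric(difftime(as.POSIXct(end, tz = tz),
                      as.POSIXct(start, tz = tz),
                      units = "mins"))
}

# Import of 53 CEL-files in RGui.
import_gui <- elapsed_min("2008-12-07 14:47:20", "2008-12-07 14:53:45")
# RMA in RGui (verbose=FALSE) and in Rterm (verbose=TRUE).
rma_gui    <- elapsed_min("2008-12-07 14:59:12", "2008-12-07 15:55:25")
rma_term   <- elapsed_min("2008-12-07 13:32:35", "2008-12-07 14:35:09")

print(round(c(import = import_gui, rma_gui = rma_gui, rma_term = rma_term), 1))
```

This gives roughly 6.4, 56.2 and 62.6 minutes. For new runs you could
instead wrap the call itself in system.time().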
Once again, in Rterm you see the progress status and get some
statistical information. I consider it helpful to see the progress
information, especially when computation takes a long time.
I hope this demonstration shows you how to use xps successfully and
helps you solve your problems.
Best regards
Christian
cstrato wrote:
> Dear Chris
>
> This is strange, could you please give your sessionInfo(), which
> version of xps, which version of ROOT, which version of R, WinXP or
> Vista?
>
> Could you please give the complete code for creating the scheme.
> I am not sure if it is a good idea to save the "hgu133plus2.root" file
> in the package directory, I would propose to create a directory
> "schemes" somewhere else, e.g. "McMasters/schemes".
>
> Furthermore, could you please set "verbose=TRUE" in the methods and
> start R from the Command Console. Then you will see the progress
> messages. Could you please send me this output, so that I can check
> the result?
>
> Handling 40 CEL-files should not be a problem, one user of xps
> reported that he could successfully handle 500 CEL-files on his
> Windows machine.
>
> Best regards
> Christian
> _._._._._._._._._._._._._._._._._._
> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
> V.i.e.n.n.a A.u.s.t.r.i.a
> e.m.a.i.l: cstrato at aon.at
> _._._._._._._._._._._._._._._._._._
>
>
>
> Christopher N Barnes wrote:
>> All,
>>
>> I am new to xps and am having trouble reading in the cel files.
>>
>> I got the 3 correct files from affymetrix and created a scheme
>> removing the first 12 lines from the annotation file (fix 1)
>>
>>
>> I then read in my scheme:
>> hgu133plus2<-root.scheme(paste(.path.package("xps"),"schemes/hgu133plus2.root",
>> sep="/"))
>>
>> and then try to read in the CEL files.
>> celdir2<-"C:/McMasters/test"
>> data.test3<-import.data(hgu133plus2,"tmp2",celdir=celdir2,
>> verbose=FALSE)
>> It worked once and now causes R to crash. I am trying to read in
>> 40 CEL files with 50,000+ genes on a 4 GB machine.
>>
>> Does anyone have any suggestions for another method to read a large
>> number of CEL files? If I try using ReadAffy() to read them in, I don't
>> have the space to allocate.
>> Thanks for the Help,
>>
>> Chris Barnes
>> PhD student University of Louisville
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>