[BioC] Problem reading Cel files - Oligo Package
James W. MacDonald
jmacdon at uw.edu
Wed Aug 28 16:44:02 CEST 2013
Hi Atul,
On 8/27/2013 11:18 PM, Atul wrote:
> Hi All,
>
> I am trying to read four *.Cel files into oligo and getting this error:
>
> > celFiles <- list.celfiles()
> > celFiles
> [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
> > AF_data = read.celfiles(celFiles)
> All the CEL files must be of the same type.
> Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE
>
> Then I tried reading the files separately (one by one) and found that
> one sample (Iris.CEL) shows the annotation package as
> 'pd.huex.1.0.st.v1' while the rest (Liv1, Liv2, Liv3) are
> 'pd.huex.1.0.st.v2'. I checked on GEO and found that, although the
> samples are from different studies, they were all generated using the
> same chip (Human Exon 1.0 ST Array), and the one which is giving the
> error (Iris.cel) has 'HuEx-1_0-st-v2.r2.dt1.hg18.core.ps' mentioned
> under the data processing description, which means it is also
> version 2 of HuEx 1.0 ST.
>
> So I explicitly specified the annotation package 'pd.huex.1.0.st.v2'
> instead of the one recognized by oligo ('pd.huex.1.0.st.v1'), and the
> file is read without any problem:
>
> > celFiles <- list.celfiles()
> > celFiles
> [1] "Iris.CEL"
> > AF_data = read.celfiles(celFiles,pkgname='pd.huex.1.0.st.v2')
> Platform design info loaded.
> Reading in : Iris.CEL
>
> But if I add the other files and try the same thing, then the error is back:
> > celFiles <- list.celfiles()
> > celFiles
> [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
> > AF_data = read.celfiles(celFiles,pkgname='pd.huex.1.0.st.v2')
> All the CEL files must be of the same type.
> Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE
>
>
> Can anybody please tell me why the annotation package for Iris.cel,
> which is from HuEx 1.0 ST v2 (according to the NCBI GEO description),
> is recognized as 'pd.huex.1.0.st.v1'? If I explicitly specify the
> package name pd.huex.1.0.st.v2 and read Iris.cel alone, it works. But
> if I read it together with the other cel files that have the same
> annotation (pd.huex.1.0.st.v2), it gives the error.
The Iris.cel file is a HuEx-1_0-st-v1, according to the header in that file:
> sapply(fls, oligo:::getCelChipType, useAffyio=TRUE)
GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
"HuEx-1_0-st-v1"
GSM486433.CEL.gz
"HuEx-1_0-st-v2"
And the others you are trying to read are version 2. It doesn't really
matter what GEO says, as the information on GEO comes from the submitter,
and they evidently made a mistake.
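If you have more than a handful of files, it may help to sort them by what the headers say before calling read.celfiles(). What follows is only a rough sketch: group_by_chip_type is a name I made up, and the two chip types are typed in by hand from the output above so that the snippet runs without the CEL files. With the real files you would build the vector with the sapply() call shown above.

```r
## Sketch only: group CEL files by the chip type stored in their headers.
## `chip_types` is a named character vector (names = file names), such as
## the result of sapply(fls, oligo:::getCelChipType, useAffyio = TRUE).
## `group_by_chip_type` is a hypothetical helper, not part of oligo.
group_by_chip_type <- function(chip_types) {
    split(names(chip_types), unname(chip_types))
}

## The two header values reported above, typed in by hand for illustration.
chip_types <- c(
    "GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz" = "HuEx-1_0-st-v1",
    "GSM486433.CEL.gz" = "HuEx-1_0-st-v2"
)
groups <- group_by_chip_type(chip_types)

## More than one group means read.celfiles() will refuse the full set.
if (length(groups) > 1) {
    message("Mixed chip types: ", paste(names(groups), collapse = ", "))
}
```

Each element of groups is then a set of files that share one header chip type.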
I don't know what, if any, differences there are between the two
versions. In addition, there isn't anything I can see on the Affy
website that says what differences there may be. Certainly they have the
same number of probes and the probe IDs are all the same. So you can
combine:
> fls <- dir(pattern = "CEL.gz")
> dat1 <- read.celfiles(fls[1], pkgname="pd.huex.1.0.st.v2")
Loading required package: pd.huex.1.0.st.v2
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
> ## note that you would use all three of the other celfiles for this step
> dat2 <- read.celfiles(fls[2])
Platform design info loaded.
Reading in : GSM486433.CEL.gz
> dat <- combine(dat1, dat2)
Warning messages:
1: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
2: data frame column 'exprs' levels not all.equal
3: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
4: data frame column 'dates' levels not all.equal
> all.equal(featureNames(dat1), featureNames(dat2))
[1] TRUE
> dat
ExonFeatureSet (storageMode: lockedEnvironment)
assayData: 6553600 features, 2 samples
element names: exprs
protocolData
rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
GSM486433.CEL.gz
varLabels: exprs dates
varMetadata: labelDescription channel
phenoData
rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
GSM486433.CEL.gz
varLabels: index
varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.huex.1.0.st.v2
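If you do go this route, you might want the feature-name comparison to stop the script rather than just print TRUE or FALSE. A minimal sketch, assuming you pass in the two featureNames() vectors; check_same_features is a hypothetical helper of my own, not something in oligo or Biobase, and the toy vectors only stand in for the real ones.

```r
## Sketch only: refuse to go on unless two feature-name vectors match.
## `check_same_features` is a made-up name, not part of oligo or Biobase.
check_same_features <- function(a, b) {
    if (length(a) != length(b)) {
        stop("different number of features: ", length(a), " vs ", length(b))
    }
    mismatches <- sum(a != b)
    if (mismatches > 0) {
        stop("feature names differ at ", mismatches, " position(s)")
    }
    invisible(TRUE)
}

## Toy vectors standing in for featureNames(dat1) and featureNames(dat2).
check_same_features(c("1", "2", "3"), c("1", "2", "3"))
```

With the objects above the call would be check_same_features(featureNames(dat1), featureNames(dat2)), placed before combine().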
Note, however, that this isn't a recommendation on my part that you
should do this. I don't know what these data are, nor what you are
planning to do with them. In general, combining data from two or more
completely different experiments is a very tricky endeavor. Using
something like fRMA (if there are frozen estimates for this chip type)
might be a better way to go.
Best,
Jim
>
> NCBI GEO ID:
> Iris.cel - GSM1008547
> Liv1/2/3 - GSM486433/GSM486434/GSM486435
>
> Awaiting help.
>
> AK
>
>
> Session Info:
>
> > sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
> [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C
> LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets
> methods base
>
> other attached packages:
> [1] pd.huex.1.0.st.v2_3.8.0 RSQLite_0.11.4 DBI_0.2-7 oligo_1.24.2
> Biobase_2.20.1 oligoClasses_1.22.0
> [7] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] affxparser_1.32.3 affyio_1.28.0 BiocInstaller_1.10.1
> Biostrings_2.28.0 bit_1.1-10 codetools_0.2-8
> [7] ff_2.2-11 foreach_1.4.0 GenomicRanges_1.12.4
> IRanges_1.18.1 iterators_1.0.6 preprocessCore_1.22.0
> [13] splines_3.0.1 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099