[BioC] Problem reading Cel files - Oligo Package

James W. MacDonald jmacdon at uw.edu
Wed Aug 28 16:44:02 CEST 2013


Hi Atul,

On 8/27/2013 11:18 PM, Atul wrote:
> Hi All,
>
> I am trying to read four *.Cel files into oligo and getting this error:
>
> > celFiles <- list.celfiles()
> > celFiles
> [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
> > AF_data = read.celfiles(celFiles)
> All the CEL files must be of the same type.
> Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE
>
> Then I tried reading the files separately (one by one) and found that one
> sample (Iris.CEL) is assigned the annotation package 'pd.huex.1.0.st.v1',
> while the rest (Liv1, Liv2, Liv3) get 'pd.huex.1.0.st.v2'. I checked on GEO
> and found that, although the samples are from different studies, they
> were all generated using the same chip (Human Exon 1.0 ST Array). The one
> giving the error (Iris.CEL) has
> 'HuEx-1_0-st-v2.r2.dt1.hg18.core.ps' mentioned under its data processing
> description, which means it is also version 2 of HuEx 1.0 ST.
>
> So I explicitly specified the annotation package 'pd.huex.1.0.st.v2'
> instead of the one detected by oligo ('pd.huex.1.0.st.v1'), and the file
> is read without any problem:
>
> > celFiles <- list.celfiles()
> > celFiles
> [1] "Iris.CEL"
> > AF_data = read.celfiles(celFiles,pkgname='pd.huex.1.0.st.v2')
> Platform design info loaded.
> Reading in : Iris.CEL
>
> But if I add the other files and try the same thing, then the error is back:
> > celFiles <- list.celfiles()
> > celFiles
> [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
> > AF_data = read.celfiles(celFiles,pkgname='pd.huex.1.0.st.v2')
> All the CEL files must be of the same type.
> Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE
>
>
> Can anybody please tell me why the annotation package for Iris.CEL, which
> is from HuEx 1.0 ST v2 (according to the NCBI GEO description), is detected
> as 'pd.huex.1.0.st.v1'? If I explicitly give the package name
> pd.huex.1.0.st.v2 and read Iris.CEL alone, it works. But if I read it
> together with the other CEL files that use the same annotation
> (pd.huex.1.0.st.v2), it gives an error.

The Iris.cel file is a HuEx-1_0-st-v1, according to the header in that file:

 > sapply(fls, oligo:::getCelChipType, useAffyio=TRUE)
GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
                                      "HuEx-1_0-st-v1"
                                      GSM486433.CEL.gz
                                      "HuEx-1_0-st-v2"

And the others you are trying to read are version 2. It doesn't really 
matter what GEO says, as the information on GEO comes from the submitter, 
and they evidently made a mistake.

I don't know what differences, if any, there are between the two 
versions, and I can't see anything on the Affy website that describes 
them. They certainly have the same number of probes, and the probe IDs 
are all the same, so you can read the v1 file with the v2 annotation 
package and combine:

 > fls <- dir(pattern = "CEL.gz")
 > dat1 <- read.celfiles(fls[1], pkgname="pd.huex.1.0.st.v2")
Loading required package: pd.huex.1.0.st.v2
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
 > ## note that you would use all three of the other celfiles for this step
 > dat2 <- read.celfiles(fls[2])
Platform design info loaded.
Reading in : GSM486433.CEL.gz
 > dat <- combine(dat1, dat2)
Warning messages:
1: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
2: data frame column 'exprs' levels not all.equal
3: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
4: data frame column 'dates' levels not all.equal
 > all.equal(featureNames(dat1), featureNames(dat2))
[1] TRUE
 > dat
ExonFeatureSet (storageMode: lockedEnvironment)
assayData: 6553600 features, 2 samples
   element names: exprs
protocolData
   rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
     GSM486433.CEL.gz
   varLabels: exprs dates
   varMetadata: labelDescription channel
phenoData
   rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
     GSM486433.CEL.gz
   varLabels: index
   varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.huex.1.0.st.v2
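
If you have more than a couple of files, the same steps can be sketched 
a bit more generally. This is only a rough sketch, not something I have 
run on your data: it groups the file names by the chip type reported in 
the header, reads each group with the same pkgname, and combines the 
results. The 'chip' vector is hard-coded from the output above so that 
the grouping step runs on its own, and the reading step is guarded so it 
only runs if oligo is installed and the files are in the working 
directory. Sharing 'pd.huex.1.0.st.v2' across both chip types is an 
assumption carried over from the example above. Also note that 
oligo:::getCelChipType is an unexported function, so it may change 
between releases.

```r
## Chip types as reported by the CEL headers (copied from the output above).
## In a real session you would build this with
##   chip <- sapply(fls, oligo:::getCelChipType, useAffyio = TRUE)
chip <- c("GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz" = "HuEx-1_0-st-v1",
          "GSM486433.CEL.gz" = "HuEx-1_0-st-v2")

## Group the file names by header chip type
groups <- split(names(chip), chip)
print(groups)

## Assumption: both chip types can share the v2 annotation package,
## as in the example above
pkg <- "pd.huex.1.0.st.v2"

## Only attempt the read if oligo is installed and the files are present
if (requireNamespace("oligo", quietly = TRUE) && all(file.exists(names(chip)))) {
    library(oligo)
    ## One FeatureSet per chip type, all read with the same annotation package
    sets <- lapply(groups, function(f) read.celfiles(f, pkgname = pkg))
    ## Fold the per-type objects into a single object
    dat <- Reduce(combine, sets)
    ## The probe IDs should agree before and after combining
    stopifnot(isTRUE(all.equal(featureNames(sets[[1]]), featureNames(dat))))
}
```

Reading each group separately is what gets around the checkChipTypes 
error, since every call to read.celfiles then sees only one chip type.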

Note, however, that I am not recommending that you do this. I don't 
know what these data are, nor what you are planning to do with them. In 
general, combining data from two or more completely different 
experiments is a very tricky endeavor. Using something like fRMA (if 
there are frozen estimates for this chip type) might be a better way to go.

Best,

Jim


>
> NCBI GEO ID:
> Iris.cel - GSM1008547
> Liv1/2/3 - GSM486433/GSM486434/GSM486435
>
> Awaiting help.
>
> AK
>
>
> Session Info:
>
> > sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
>  [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] pd.huex.1.0.st.v2_3.8.0 RSQLite_0.11.4          DBI_0.2-7               oligo_1.24.2            Biobase_2.20.1          oligoClasses_1.22.0
> [7] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
>  [1] affxparser_1.32.3     affyio_1.28.0         BiocInstaller_1.10.1  Biostrings_2.28.0     bit_1.1-10            codetools_0.2-8
>  [7] ff_2.2-11             foreach_1.4.0         GenomicRanges_1.12.4  IRanges_1.18.1        iterators_1.0.6       preprocessCore_1.22.0
> [13] splines_3.0.1         stats4_3.0.1          tools_3.0.1           zlibbioc_1.6.0
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


