[BioC] Problem reading Cel files - Oligo Package

Atul atulkakrana at outlook.com
Thu Aug 29 05:21:31 CEST 2013


Hi James,

Many thanks for the suggestion. It worked perfectly.

Best

AK


On 08/28/2013 10:44 AM, James W. MacDonald wrote:
> Hi Atul,
>
> On 8/27/2013 11:18 PM, Atul wrote:
>> Hi All,
>>
>> I am trying to read four *.Cel files into oligo and getting this error:
>>
>> > celFiles <- list.celfiles()
>> > celFiles
>> [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
>> > AF_data = read.celfiles(celFiles)
>> All the CEL files must be of the same type.
>> Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE
>>
>> I then tried reading the files one by one and found that one sample 
>> (Iris.CEL) is assigned the annotation package 'pd.huex.1.0.st.v1', 
>> while the rest (Liv1, Liv2, Liv3) are assigned 'pd.huex.1.0.st.v2'. I 
>> checked on GEO and found that, although the samples come from 
>> different studies, they were all generated on the same chip, the 
>> Human Exon 1.0 ST Array. The sample giving the error (Iris.CEL) has 
>> 'HuEx-1_0-st-v2.r2.dt1.hg18.core.ps' listed under its data processing 
>> description, which means it is also version 2 of HuEx 1.0 ST.
>>
>> So I explicitly specified the annotation package 'pd.huex.1.0.st.v2' 
>> instead of the one detected by oligo ('pd.huex.1.0.st.v1'), and the 
>> file is read without any problem:
>>
>> > celFiles <- list.celfiles()
>> > celFiles
>> [1] "Iris.CEL"
>> > AF_data = read.celfiles(celFiles,pkgname='pd.huex.1.0.st.v2')
>> Platform design info loaded.
>> Reading in : Iris.CEL
>>
>> But if I add the other files and try the same thing, the error is back:
>> > celFiles <- list.celfiles()
>> > celFiles
>> [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
>> > AF_data = read.celfiles(celFiles,pkgname='pd.huex.1.0.st.v2')
>> All the CEL files must be of the same type.
>> Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE
>>
>>
>> Can anybody please tell me why the annotation package for Iris.CEL, 
>> which is from HuEx 1.0 ST v2 (according to the NCBI GEO description), 
>> is detected as 'pd.huex.1.0.st.v1'? If I explicitly give the package 
>> name pd.huex.1.0.st.v2 and read Iris.CEL alone, it works. But if I 
>> read it together with the other CEL files that have the same 
>> annotation (pd.huex.1.0.st.v2), it gives the error.
>
> The Iris.cel file is a HuEx-1_0-st-v1, according to the header in that 
> file:
>
> > sapply(fls, oligo:::getCelChipType, useAffyio=TRUE)
> GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
>                                      "HuEx-1_0-st-v1"
>                                      GSM486433.CEL.gz
>                                      "HuEx-1_0-st-v2"
>
> And the others you are trying to read are version 2. It doesn't really 
> matter what GEO says, as the information on GEO comes from the 
> submitter, and they evidently made a mistake.
>
> I don't know what, if any, differences there are between the two 
> versions. In addition, there isn't anything I can see on the Affy 
> website that says what differences there may be. Certainly they have 
> the same number of probes and the probe IDs are all the same. So you 
> can combine:
>
> > fls <- dir(pattern = "CEL.gz")
> > dat1 <- read.celfiles(fls[1], pkgname="pd.huex.1.0.st.v2")
> Loading required package: pd.huex.1.0.st.v2
> Loading required package: RSQLite
> Loading required package: DBI
> Platform design info loaded.
> Reading in : GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
> > ## note that you would use all three of the other celfiles for this step
> > dat2 <- read.celfiles(fls[2])
> Platform design info loaded.
> Reading in : GSM486433.CEL.gz
> > dat <- combine(dat1, dat2)
> Warning messages:
> 1: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
> 2: data frame column 'exprs' levels not all.equal
> 3: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
> 4: data frame column 'dates' levels not all.equal
> > all.equal(featureNames(dat1), featureNames(dat2))
> [1] TRUE
> > dat
> ExonFeatureSet (storageMode: lockedEnvironment)
> assayData: 6553600 features, 2 samples
>   element names: exprs
> protocolData
>   rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
>     GSM486433.CEL.gz
>   varLabels: exprs dates
>   varMetadata: labelDescription channel
> phenoData
>   rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
>     GSM486433.CEL.gz
>   varLabels: index
>   varMetadata: labelDescription channel
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation: pd.huex.1.0.st.v2
>
> You should note however that this isn't a recommendation on my part 
> that you should do this. I don't know what these data are, nor what 
> you are planning to do with them. In general combining data from two 
> or more completely different experiments is a very tricky endeavor. 
> Using something like fRMA (if there are frozen estimates for this chip 
> type) might be a better way to go.
>
> Best,
>
> Jim
>
>
>>
>> NCBI GEO ID:
>> Iris.cel - GSM1008547
>> Liv1/2/3 - GSM486433/GSM486434/GSM486435
>>
>> Awaiting help.
>>
>> AK
>>
>>
>> Session Info:
>>
>> > sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] pd.huex.1.0.st.v2_3.8.0 RSQLite_0.11.4          DBI_0.2-7
>> [4] oligo_1.24.2            Biobase_2.20.1          oligoClasses_1.22.0
>> [7] BiocGenerics_0.6.0
>>
>> loaded via a namespace (and not attached):
>>  [1] affxparser_1.32.3     affyio_1.28.0         BiocInstaller_1.10.1
>>  [4] Biostrings_2.28.0     bit_1.1-10            codetools_0.2-8
>>  [7] ff_2.2-11             foreach_1.4.0         GenomicRanges_1.12.4
>> [10] IRanges_1.18.1        iterators_1.0.6       preprocessCore_1.22.0
>> [13] splines_3.0.1         stats4_3.0.1          tools_3.0.1
>> [16] zlibbioc_1.6.0
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
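
The workaround discussed in this thread (check the chip type stored in each CEL header, read each group separately with one annotation package forced via pkgname, then combine) could be sketched roughly as below. This is only a sketch under assumptions: group_by_chip_type and read_mixed_celfiles are hypothetical helper names, not part of oligo; getCelChipType is an unexported oligo function used as in Jim's reply and may change between versions; and the chip types in the illustration are simply the ones reported in this thread. Only the grouping step runs without oligo installed.

```r
# Group CEL file names by the chip type recorded in each file's header.
# `chip_types` is a named character vector: names are file names, values
# are chip-type strings such as "HuEx-1_0-st-v1".
group_by_chip_type <- function(chip_types) {
  stopifnot(is.character(chip_types), !is.null(names(chip_types)))
  split(names(chip_types), unname(chip_types))
}

# Hypothetical helper (not part of oligo): read each chip-type group
# separately while forcing a single annotation package, then merge the
# resulting objects. Requires oligo and BiocGenerics when it is called.
read_mixed_celfiles <- function(files, pkgname) {
  # getCelChipType is unexported, hence the ':::' (as in Jim's reply)
  types <- sapply(files, oligo:::getCelChipType, useAffyio = TRUE)
  sets <- lapply(group_by_chip_type(types), oligo::read.celfiles,
                 pkgname = pkgname)
  Reduce(BiocGenerics::combine, sets)
}

# Illustration of the grouping step only, using the chip types reported
# in this thread for the four files.
types <- c(Iris.CEL = "HuEx-1_0-st-v1",
           Liv1.CEL = "HuEx-1_0-st-v2",
           Liv2.CEL = "HuEx-1_0-st-v2",
           Liv3.CEL = "HuEx-1_0-st-v2")
groups <- group_by_chip_type(types)
print(groups)
```

With oligo loaded, something like read_mixed_celfiles(list.celfiles(), pkgname = "pd.huex.1.0.st.v2") would then mirror the dat1/dat2/combine steps above. Jim's caveat still applies: being able to combine the files does not mean that pooling data from unrelated experiments is advisable.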


