[BioC] Problem reading Cel files - Oligo Package
Atul
atulkakrana at outlook.com
Thu Aug 29 05:21:31 CEST 2013
Hi James,
Many thanks for the suggestion. It worked perfectly.
Best
AK
On 08/28/2013 10:44 AM, James W. MacDonald wrote:
> Hi Atul,
>
> On 8/27/2013 11:18 PM, Atul wrote:
>> Hi All,
>>
>> I am trying to read four *.Cel files into oligo and getting this error:
>>
>> > celFiles <- list.celfiles()
>> > celFiles
>> [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
>> > AF_data = read.celfiles(celFiles)
>> All the CEL files must be of the same type.
>> Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE
>>
>> Then I tried reading the files one by one and found that one
>> sample (Iris.CEL) shows the annotation package 'pd.huex.1.0.st.v1',
>> while the rest (Liv1, Liv2, Liv3) show 'pd.huex.1.0.st.v2'. I checked
>> GEO and found that, although the samples come from different studies,
>> they were all generated on the same chip, the Human Exon 1.0 ST Array.
>> The one giving the error (Iris.CEL) has
>> 'HuEx-1_0-st-v2.r2.dt1.hg18.core.ps' listed under its data processing
>> description, which suggests it is also version 2 of HuEx 1.0 ST.
>>
>> So I explicitly specified the annotation package 'pd.huex.1.0.st.v2'
>> instead of the one recognized by oligo ('pd.huex.1.0.st.v1'), and the
>> file is read without any problem:
>>
>> > celFiles <- list.celfiles()
>> > celFiles
>> [1] "Iris.CEL"
>> > AF_data = read.celfiles(celFiles,pkgname='pd.huex.1.0.st.v2')
>> Platform design info loaded.
>> Reading in : Iris.CEL
>>
>> But if I add the other files and try the same thing, the error is back:
>> > celFiles <- list.celfiles()
>> > celFiles
>> [1] "Iris.CEL" "Liv1.CEL" "Liv2.CEL" "Liv3.CEL"
>> > AF_data = read.celfiles(celFiles,pkgname='pd.huex.1.0.st.v2')
>> All the CEL files must be of the same type.
>> Error: checkChipTypes(filenames, verbose, "affymetrix", TRUE) is not TRUE
>>
>>
>> Can anybody please tell me why the annotation package for Iris.CEL,
>> which is from HuEx 1.0 ST v2 (per the NCBI GEO description), is
>> recognized as 'pd.huex.1.0.st.v1'? If I explicitly specify the package
>> name pd.huex.1.0.st.v2 and read Iris.CEL alone, it works. But if I read
>> it together with the other CEL files, which use the same annotation
>> (pd.huex.1.0.st.v2), it gives the error.
>
> The Iris.cel file is a HuEx-1_0-st-v1, according to the header in that
> file:
>
> > sapply(fls, oligo:::getCelChipType, useAffyio=TRUE)
> GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
> "HuEx-1_0-st-v1"
> GSM486433.CEL.gz
> "HuEx-1_0-st-v2"
>
> And the others you are trying to read are version 2. It doesn't really
> matter what GEO says, as the information on GEO comes from the
> submitter, and they evidently made a mistake.
>
> I don't know what, if any, differences there are between the two
> versions. In addition, there isn't anything I can see on the Affy
> website that says what differences there may be. Certainly they have
> the same number of probes and the probe IDs are all the same. So you
> can combine:
>
> > fls <- dir(pattern = "CEL.gz")
> > dat1 <- read.celfiles(fls[1], pkgname="pd.huex.1.0.st.v2")
> Loading required package: pd.huex.1.0.st.v2
> Loading required package: RSQLite
> Loading required package: DBI
> Platform design info loaded.
> Reading in : GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
> > dat2 <- read.celfiles(fls[2]) ## note: you would pass all three of the other CEL files in this step
> Platform design info loaded.
> Reading in : GSM486433.CEL.gz
> > dat <- combine(dat1, dat2)
> Warning messages:
> 1: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
> 2: data frame column 'exprs' levels not all.equal
> 3: In alleq(levels(x[[nm]]), levels(y[[nm]])) : 1 string mismatch
> 4: data frame column 'dates' levels not all.equal
> > all.equal(featureNames(dat1), featureNames(dat2))
> [1] TRUE
> > dat
> ExonFeatureSet (storageMode: lockedEnvironment)
> assayData: 6553600 features, 2 samples
> element names: exprs
> protocolData
> rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
> GSM486433.CEL.gz
> varLabels: exprs dates
> varMetadata: labelDescription channel
> phenoData
> rowNames: GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz
> GSM486433.CEL.gz
> varLabels: index
> varMetadata: labelDescription channel
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation: pd.huex.1.0.st.v2
>
> You should note however that this isn't a recommendation on my part
> that you should do this. I don't know what these data are, nor what
> you are planning to do with them. In general combining data from two
> or more completely different experiments is a very tricky endeavor.
> Using something like fRMA (if there are frozen estimates for this chip
> type) might be a better way to go.
>
> Best,
>
> Jim
>
>
>>
>> NCBI GEO ID:
>> Iris.cel - GSM1008547
>> Liv1/2/3 - GSM486433/GSM486434/GSM486435
>>
>> Awaiting help.
>>
>> AK
>>
>>
>> Session Info:
>>
>> > sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> LC_MONETARY=en_US.UTF-8
>> [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C
>> LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets
>> methods base
>>
>> other attached packages:
>> [1] pd.huex.1.0.st.v2_3.8.0 RSQLite_0.11.4 DBI_0.2-7 oligo_1.24.2
>> Biobase_2.20.1 oligoClasses_1.22.0
>> [7] BiocGenerics_0.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] affxparser_1.32.3 affyio_1.28.0 BiocInstaller_1.10.1
>> Biostrings_2.28.0 bit_1.1-10 codetools_0.2-8
>> [7] ff_2.2-11 foreach_1.4.0 GenomicRanges_1.12.4
>> IRanges_1.18.1 iterators_1.0.6 preprocessCore_1.22.0
>> [13] splines_3.0.1 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
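The workaround in the quoted reply can be generalized roughly as follows. This is only a minimal sketch: `read_mixed_cels` is a hypothetical helper (not part of oligo), the chip types are hard-coded from the output shown in the thread, and it assumes, as the reply found for these two files, that both header versions share the same probe layout. The function body only touches oligo and Biobase when it is actually called, so the grouping step runs in plain R.

```r
# Chip types as reported by the CEL headers in the thread. In a live session
# these came from sapply(fls, oligo:::getCelChipType, useAffyio = TRUE);
# note that getCelChipType is an internal (unexported) oligo function.
chip_types <- c(
  "GSM1008547_02_V-2_Pool-Normal-Iris_11-18-09_S1.CEL.gz" = "HuEx-1_0-st-v1",
  "GSM486433.CEL.gz" = "HuEx-1_0-st-v2"
)

# Group file names by header chip type, so that each group passes the
# "same type" check inside read.celfiles().
files_by_type <- split(names(chip_types), unname(chip_types))

# Hypothetical helper: read each group against one annotation package,
# verify that the feature names agree, then merge the groups.
read_mixed_cels <- function(groups, pkgname = "pd.huex.1.0.st.v2") {
  sets <- lapply(groups, function(f) oligo::read.celfiles(f, pkgname = pkgname))
  ref <- Biobase::featureNames(sets[[1]])
  same <- vapply(
    sets,
    function(s) isTRUE(all.equal(ref, Biobase::featureNames(s))),
    logical(1)
  )
  stopifnot(all(same))  # refuse to combine if the probe sets differ
  Reduce(BiocGenerics::combine, unname(sets))
}

# Usage (requires oligo, pd.huex.1.0.st.v2 and the CEL files on disk):
# dat <- read_mixed_cels(files_by_type)
```

As in the reply, being able to combine the files does not make the combined data comparable; the caveat about merging samples from different experiments still applies.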