[BioC] What should be the output after processing a cel file?

Fri Jul 24 20:13:32 CEST 2009

Hi Peng,

Peng Yu wrote:
> On Fri, Jul 24, 2009 at 12:43 PM, James W. MacDonald <jmacdon at med.umich.edu> wrote:
>> Hi Peng,
>>
>> Peng Yu wrote:
>>> Hi,
>>>
>>> I ran the following commands in R.
>>>
>>> library(oligo)
>>> data<-oligo::read.celfiles("some.cel")
>>> eset<-rma(data)
>>> write.exprs(eset, file="some.txt", sep="\t")
>>>
>>> It generates the file "some.txt", but I am not sure what it means. The
>>> content of some.txt is the following.
>>>
>>>        wt1-mth_HZ_5238_MST1_19385.cel
>>> 10344615        7.83088386872146
>>> 10344617        3.13300493228193
>>> 10344619        3.00984893419684
>>> 10344621        4.55830890064195
>>> 10344623        7.79420011157519
>>> 10344625        8.93864799064523
>>> 10344626        10.2404135279143
>>> 10344627        8.36493644804453
>>> 10344628        10.8239110733786
>>>
>>>
>>> I am wondering if I processed the cel file correctly. What does the
>>> first column mean?
>> The first column is the Affymetrix probeset ID. You can use the matching
>> .db annotation package to map these IDs to more conventional IDs, such as
>> Entrez Gene or Ensembl. Had you noted what chip this is, I might have been
>> able to point you to the correct package. But you can peruse this webpage
>> to find it:
>>
>> http://www.bioconductor.org/packages/release/data/annotation/
> 
> Hi Jim,
> 
> It's the Mouse Gene 1.0 ST Array.
> 
> http://www.affymetrix.com/products_services/arrays/specific/mousegene_1_st.affx
> 
> The following is the output of my R script. Shall I use
> 'mogene10stprobeset.db' and 'mogene10sttranscriptcluster.db'?
> 
> Regards,
> Peng
> 
>> library(oligo)
> Loading required package: oligoClasses
> Loading required package: Biobase
> 
> Welcome to Bioconductor
> 
>   Vignettes contain introductory material. To view, type
>   'openVignette()'. To cite Bioconductor, see
>   'citation("Biobase")' and for packages 'citation(pkgname)'.
> 
> Loading required package: preprocessCore
> Welcome to oligo version 1.8.1
>> for (f in c("wt1-mth_HZ_5238_MST1_19385",
> +  "wt2-mth_HZ_5238_MST1_19386",
> +  "wt3-mth_HZ_5238_MST1_19387",
> +  "wt4-mth_HZ_5238_MST1_19388",
> +  "koA-mth_HZ_5238_MST1_19389",
> +  "koB-mth_HZ_5238_MST1_19390",
> +  "koC-mth_HZ_5238_MST1_19391",
> +  "koD-mth_HZ_5238_MST1_19392"
> + )) {
> + data<-oligo::read.celfiles(paste(f, ".cel", sep=''))
> + eset<-rma(data)
> + write.exprs(eset, file=paste(f, ".txt", sep=''), sep="\t")
> + }

OK. Seriously. Don't do this. If you got this idea somewhere, please let 
us know where so we can correct that information.

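The rma method is designed to work with a set of chips, not one by one.
It quantile-normalizes across arrays and estimates probe effects from
all of the arrays together, so a chip processed on its own is never
normalized against the others and the eight result files are not
comparable. You want to do something like this: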

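library(oligo)
dat <- read.celfiles(list.celfiles())  # read every CEL file in the working directory together
eset <- rma(dat)                       # background-correct, normalize, and summarize across all chips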
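
To see why the one-chip-at-a-time loop is a problem, here is a toy sketch
of quantile normalization, one of the across-array steps in RMA. It is
plain base R written for illustration, not oligo's implementation;
quantile_normalize() is a made-up helper name and the intensities are
simulated.

```r
# Toy quantile normalization (illustration only; not oligo's code).
# Rows are probes, columns are arrays. Assumes no missing values.
quantile_normalize <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Rank of each value within its own array
  ranks <- matrix(apply(x, 2, rank, ties.method = "first"), nrow = n)
  # Reference distribution: mean of the k-th smallest value across arrays
  target <- rowMeans(matrix(apply(x, 2, sort), nrow = n))
  # Replace each value by the reference value at its rank
  matrix(target[ranks], nrow = n, dimnames = dimnames(x))
}

set.seed(1)
# Simulated log2 intensities for two arrays, the second one brighter overall
two_chips <- cbind(chip1 = rnorm(6, mean = 7), chip2 = rnorm(6, mean = 9))
normalized <- quantile_normalize(two_chips)

# Check that both arrays now share one distribution
all.equal(sort(normalized[, "chip1"]), sort(normalized[, "chip2"]))

# Check that a single array comes back unchanged
one_chip <- two_chips[, "chip1", drop = FALSE]
all.equal(quantile_normalize(one_chip), one_chip)
```

With two arrays, both columns end up with the same distribution. With a
single array the function hands back its input unchanged, so the
normalization step has nothing to do when rma() only ever sees one CEL
file.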

Now use something like limma to find differentially expressed genes.
Then if you want to annotate them, you can use the mogene10stprobeset.db
package.
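
As a rough sketch of the limma step for this experiment (four wt and
four ko arrays, going by the file names in the quoted loop): the
expression matrix below is simulated so the code runs without the CEL
files, and in practice you would pass exprs(eset) from the rma() call on
all eight chips. The grouping by file-name prefix, the shortened sample
names, and the random data are assumptions for illustration. The
annotation call is left as a comment because the matching package
depends on the level rma() summarized to. In current oligo releases the
default for Gene ST arrays is target = "core", which should give
transcript-cluster IDs and would point to mogene10sttranscriptcluster.db
rather than the probeset package, so check that against your own output.

```r
library(limma)

# Shortened sample names (assumed): four wild-type and four knockout arrays
samples <- c("wt1", "wt2", "wt3", "wt4", "koA", "koB", "koC", "koD")
group <- factor(sub("[0-9A-D]$", "", samples), levels = c("wt", "ko"))

# Stand-in for exprs(eset): simulated log2 values, 100 features by 8 arrays
set.seed(1)
expr <- matrix(rnorm(100 * 8, mean = 8), nrow = 100,
               dimnames = list(paste0("id", 1:100), samples))

# Intercept is the wt mean; the "groupko" coefficient is ko minus wt
design <- model.matrix(~ group)
fit <- eBayes(lmFit(expr, design))
top <- topTable(fit, coef = "groupko", number = 10)

# Annotation with real IDs (not run here; package choice is an assumption):
# library(mogene10sttranscriptcluster.db)
# AnnotationDbi::select(mogene10sttranscriptcluster.db, keys = rownames(top),
#                       columns = c("SYMBOL", "ENTREZID"), keytype = "PROBEID")
```

With random data nothing should come out as significant; the point is
only the shape of the workflow, one design matrix and one model fit over
all eight arrays.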

You might seriously consider purchasing this:

http://www.bioconductor.org/pub/docs/mogr/

or finding a local statistician who is familiar with these tools to help 
you.

Best,

Jim

> Loading required package: pd.mogene.1.0.st.v1
> Loading required package: RSQLite
> Loading required package: DBI
> Platform design info loaded.
> Reading in : wt1-mth_HZ_5238_MST1_19385.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : wt2-mth_HZ_5238_MST1_19386.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : wt3-mth_HZ_5238_MST1_19387.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : wt4-mth_HZ_5238_MST1_19388.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : koA-mth_HZ_5238_MST1_19389.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : koB-mth_HZ_5238_MST1_19390.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : koC-mth_HZ_5238_MST1_19391.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : koD-mth_HZ_5238_MST1_19392.cel
> Background correcting
> Normalizing
> Calculating Expression
>> proc.time()
>    user  system elapsed
> 574.095  14.989 595.596
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826