[BioC] What should be the output after processing a cel file?
James W. MacDonald
jmacdon at med.umich.edu
Fri Jul 24 20:13:32 CEST 2009
Hi Peng,
Peng Yu wrote:
> On Fri, Jul 24, 2009 at 12:43 PM, James W.
> MacDonald<jmacdon at med.umich.edu> wrote:
>> Hi Peng,
>>
>> Peng Yu wrote:
>>> Hi,
>>>
>>> I ran the following commands in R.
>>>
>>> library(oligo)
>>> data<-oligo::read.celfiles("some.cel")
>>> eset<-rma(data)
>>> write.exprs(eset, file="some.txt", sep="\t")
>>>
>>> It generates the file "some.txt", but I am not sure what it means. The
>>> content of some.txt is the following.
>>>
>>> wt1-mth_HZ_5238_MST1_19385.cel
>>> 10344615 7.83088386872146
>>> 10344617 3.13300493228193
>>> 10344619 3.00984893419684
>>> 10344621 4.55830890064195
>>> 10344623 7.79420011157519
>>> 10344625 8.93864799064523
>>> 10344626 10.2404135279143
>>> 10344627 8.36493644804453
>>> 10344628 10.8239110733786
>>>
>>>
>>> I am wondering if I processed the cel file correctly. What does the
>>> first column mean?
>> The first column is the Affymetrix probeset ID. You can use the correct
>> annotation (.db) package to map these IDs to more conventional IDs, such as
>> Entrez Gene or Ensembl. Had you noted what chip this is, I might have been
>> able to point you to the correct package. But you can peruse this webpage
>> to find it:
>>
>> http://www.bioconductor.org/packages/release/data/annotation/
>
> Hi Jim,
>
> It's the Mouse Gene 1.0 ST Array.
>
> http://www.affymetrix.com/products_services/arrays/specific/mousegene_1_st.affx
>
> The following is the output of my R script. Shall I use
> 'mogene10stprobeset.db' and 'mogene10sttranscriptcluster.db'?
>
> Regards,
> Peng
>
>> library(oligo)
> Loading required package: oligoClasses
> Loading required package: Biobase
>
> Welcome to Bioconductor
>
> Vignettes contain introductory material. To view, type
> 'openVignette()'. To cite Bioconductor, see
> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>
> Loading required package: preprocessCore
> Welcome to oligo version 1.8.1
>> for (f in c("wt1-mth_HZ_5238_MST1_19385",
> + "wt2-mth_HZ_5238_MST1_19386",
> + "wt3-mth_HZ_5238_MST1_19387",
> + "wt4-mth_HZ_5238_MST1_19388",
> + "koA-mth_HZ_5238_MST1_19389",
> + "koB-mth_HZ_5238_MST1_19390",
> + "koC-mth_HZ_5238_MST1_19391",
> + "koD-mth_HZ_5238_MST1_19392"
> + )) {
> + data<-oligo::read.celfiles(paste(f, ".cel", sep=''))
> + eset<-rma(data)
> + write.exprs(eset, file=paste(f, ".txt", sep=''), sep="\t")
> + }

OK. Seriously. Don't do this. If you got this idea somewhere, please let
us know where so we can correct that information.

The rma method is designed to work with a set of chips, not one by one.
You want to do something like this:

    dat <- read.celfiles(list.celfiles())
    eset <- rma(dat)
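
To illustrate why the chips need to be processed as a set: RMA's quantile normalization step forces every array onto a common intensity distribution, which is only possible when the arrays are seen together. The sketch below is plain base R, not oligo's implementation; the function name quantile_normalize and the toy intensities are made up for illustration.

```r
# Quantile normalization sketch: every column (array) is mapped onto the
# same reference distribution, the row-wise mean of the sorted columns.
quantile_normalize <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Rank of each value within its own array
  ranks <- matrix(apply(x, 2, rank, ties.method = "first"), nrow = n)
  # Reference distribution: mean of the k-th smallest value across arrays
  sorted <- matrix(apply(x, 2, sort), nrow = n)
  target <- rowMeans(sorted)
  # Replace each value by the reference value that has the same rank
  normalized <- matrix(apply(ranks, 2, function(r) target[r]), nrow = n)
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Toy intensities (invented): array2 is array1 shifted up by 10 units.
toy <- cbind(array1 = c(2, 4, 6, 8), array2 = c(12, 14, 16, 18))

# Normalized as a set: both arrays end up on the same distribution.
together <- quantile_normalize(toy)

# Normalized one at a time: each array is its own reference, so nothing changes.
alone <- cbind(quantile_normalize(toy[, 1, drop = FALSE]),
               quantile_normalize(toy[, 2, drop = FALSE]))

print(together)
print(alone)
```

Normalizing a column by itself returns it unchanged, so the offset between array1 and array2 survives; normalizing the two together removes it. The same reasoning applies to the summarization step, which estimates array effects across the whole set.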

Now use something like limma to find differentially expressed genes.
Then, if you want to annotate them, you can use the mogene10stprobeset.db
package.
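
A hedged sketch of the limma and annotation steps, assuming the wt/ko prefixes in the file names mark wild-type and knockout samples (an assumption; the thread does not say so). The helper names make_design, find_de_genes and annotate_ids are hypothetical; lmFit, eBayes, topTable and AnnotationDbi::select are the real package functions. The design is built from the sample names so that it follows whatever order list.celfiles() returned.

```r
# Build a two-group design matrix from sample names that start with
# "wt" or "ko" (assumed here to mean wild type and knockout).
make_design <- function(sample_names) {
  group <- factor(substr(sample_names, 1, 2), levels = c("wt", "ko"))
  stopifnot(!anyNA(group))  # every sample must belong to one of the groups
  design <- model.matrix(~ group)
  rownames(design) <- sample_names
  design
}

# Fit a linear model per probeset and rank by evidence for ko vs. wt.
# Requires the limma package; eset is the output of rma() on all chips.
find_de_genes <- function(eset, design) {
  fit <- limma::lmFit(eset, design)
  fit <- limma::eBayes(fit)
  limma::topTable(fit, coef = "groupko", number = Inf)
}

# Map array IDs to gene symbols and Entrez Gene IDs. annotation_db is the
# loaded annotation package object that matches the IDs in the results.
annotate_ids <- function(ids, annotation_db) {
  AnnotationDbi::select(annotation_db, keys = as.character(ids),
                        columns = c("SYMBOL", "ENTREZID"),
                        keytype = "PROBEID")
}

# Design for the eight samples named in this thread.
samples <- c("wt1-mth_HZ_5238_MST1_19385", "wt2-mth_HZ_5238_MST1_19386",
             "wt3-mth_HZ_5238_MST1_19387", "wt4-mth_HZ_5238_MST1_19388",
             "koA-mth_HZ_5238_MST1_19389", "koB-mth_HZ_5238_MST1_19390",
             "koC-mth_HZ_5238_MST1_19391", "koD-mth_HZ_5238_MST1_19392")
design <- make_design(samples)
print(design)

# With the CEL files in the working directory, usage would look like:
# library(oligo); library(limma); library(mogene10stprobeset.db)
# eset   <- rma(read.celfiles(list.celfiles()))
# design <- make_design(sampleNames(eset))
# top    <- find_de_genes(eset, design)
# annotate_ids(rownames(top), mogene10stprobeset.db)
```

The groupko coefficient is the estimated log2 difference between knockout and wild type for each row of the expression set.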

You might seriously consider purchasing this:

http://www.bioconductor.org/pub/docs/mogr/

or finding a local statistician who is familiar with these tools to help
you.

Best,

Jim

> Loading required package: pd.mogene.1.0.st.v1
> Loading required package: RSQLite
> Loading required package: DBI
> Platform design info loaded.
> Reading in : wt1-mth_HZ_5238_MST1_19385.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : wt2-mth_HZ_5238_MST1_19386.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : wt3-mth_HZ_5238_MST1_19387.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : wt4-mth_HZ_5238_MST1_19388.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : koA-mth_HZ_5238_MST1_19389.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : koB-mth_HZ_5238_MST1_19390.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : koC-mth_HZ_5238_MST1_19391.cel
> Background correcting
> Normalizing
> Calculating Expression
> Platform design info loaded.
> Reading in : koD-mth_HZ_5238_MST1_19392.cel
> Background correcting
> Normalizing
> Calculating Expression
>> proc.time()
> user system elapsed
> 574.095 14.989 595.596
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826