[BioC] What should be the output after processing a cel file?
Peng Yu
pengyu.ut at gmail.com
Sat Jul 25 15:43:57 CEST 2009
On Fri, Jul 24, 2009 at 1:13 PM, James W. MacDonald
<jmacdon at med.umich.edu> wrote:
> Hi Peng,
>
> Peng Yu wrote:
>>
>> On Fri, Jul 24, 2009 at 12:43 PM, James W. MacDonald
>> <jmacdon at med.umich.edu> wrote:
>>>
>>> Hi Peng,
>>>
>>> Peng Yu wrote:
>>>>
>>>> Hi,
>>>>
>>>> I run the following command in R.
>>>>
>>>> library(oligo)
>>>> data<-oligo::read.celfiles("some.cel")
>>>> eset<-rma(data)
>>>> write.exprs(eset, file="some.txt", sep="\t")
>>>>
>>>> It generates the file "some.txt", but I am not sure what it means. The
>>>> content of some.txt is the following.
>>>>
>>>> wt1-mth_HZ_5238_MST1_19385.cel
>>>> 10344615 7.83088386872146
>>>> 10344617 3.13300493228193
>>>> 10344619 3.00984893419684
>>>> 10344621 4.55830890064195
>>>> 10344623 7.79420011157519
>>>> 10344625 8.93864799064523
>>>> 10344626 10.2404135279143
>>>> 10344627 8.36493644804453
>>>> 10344628 10.8239110733786
>>>>
>>>>
>>>> I am wondering if I processed the cel file correctly. What does the
>>>> first column mean?
>>>
>>> The first column is the affy probeset ID. You can use the matching
>>> annotation (.db) package to map these IDs to more conventional IDs, such
>>> as Entrez Gene or Ensembl. Had you noted what chip this is, I might have
>>> been able to point you to the correct package. But you can peruse
>>> this webpage to find it:
>>>
>>> http://www.bioconductor.org/packages/release/data/annotation/
>>
>> Hi Jim,
>>
>> It's the Mouse Gene 1.0 ST Array.
>>
>>
>> http://www.affymetrix.com/products_services/arrays/specific/mousegene_1_st.affx
>>
>> The following is the output of my R script. Shall I use
>> 'mogene10stprobeset.db' and 'mogene10sttranscriptcluster.db'?
>>
>> Regards,
>> Peng
>>
>>> library(oligo)
>>
>> Loading required package: oligoClasses
>> Loading required package: Biobase
>>
>> Welcome to Bioconductor
>>
>> Vignettes contain introductory material. To view, type
>> 'openVignette()'. To cite Bioconductor, see
>> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>>
>> Loading required package: preprocessCore
>> Welcome to oligo version 1.8.1
>>>
>>> for (f in c("wt1-mth_HZ_5238_MST1_19385",
>>
>> + "wt2-mth_HZ_5238_MST1_19386",
>> + "wt3-mth_HZ_5238_MST1_19387",
>> + "wt4-mth_HZ_5238_MST1_19388",
>> + "koA-mth_HZ_5238_MST1_19389",
>> + "koB-mth_HZ_5238_MST1_19390",
>> + "koC-mth_HZ_5238_MST1_19391",
>> + "koD-mth_HZ_5238_MST1_19392"
>> + )) {
>> + data<-oligo::read.celfiles(paste(f, ".cel", sep=''))
>> + eset<-rma(data)
>> + write.exprs(eset, file=paste(f, ".txt", sep=''), sep="\t")
>> + }
>
> OK. Seriously. Don't do this. If you got this idea somewhere, please let us
> know where so we can correct that information.
>
> The rma method is designed to work with a set of chips, not one by one. You
> want to do something like this:
>
> dat <- read.celfiles(list.celfiles())
> eset <- rma(dat)
>
> now use something like limma to find differentially expressed genes. Then if
> you want to annotate them, you can use the mogene10stprobeset.db package.
>
> You might seriously consider purchasing this:
>
> http://www.bioconductor.org/pub/docs/mogr/
>
> or finding a local statistician who is familiar with these tools to help
> you.
Hi,
Thank you for your help. I have pasted the R code and the first 10 lines
of the output at the end of this message. Do the results look correct?
Would you please let me know what command I should use for annotation?
I have the book on BioC, which has a lot of information, and I need some
time to absorb it all. For now, would you please let me know which parts
of the book I should focus on for my application?
Regards,
Peng
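
On the annotation question, a minimal sketch of one way to look up gene
symbols and Entrez Gene IDs for the IDs in the first column. It rests on
assumptions that are not confirmed in this thread: that the IDs are
transcript cluster IDs (which is what rma() in current oligo releases
returns for Gene ST arrays with its default target="core"), that
mogene10sttranscriptcluster.db is therefore the matching package, and that
an AnnotationDbi version providing select() is installed. If the data were
summarized at the probeset level, mogene10stprobeset.db would take its
place. The block is skipped when the package is not installed.

```r
# IDs copied from the first column of the output file.
# With a real ExpressionSet, featureNames(eset) would supply them instead.
ids <- c("10344615", "10344617", "10344619", "10344621", "10344623")

ann <- NULL
# Assumption: the transcript-cluster-level annotation package is the right one.
if (requireNamespace("mogene10sttranscriptcluster.db", quietly = TRUE)) {
  library(mogene10sttranscriptcluster.db)  # also attaches AnnotationDbi
  db <- mogene10sttranscriptcluster.db

  # Keep only IDs the package knows about, so select() does not stop with an error.
  known <- intersect(ids, keys(db, keytype = "PROBEID"))

  if (length(known) > 0) {
    # One row per ID/gene pairing; IDs without a gene get NA in the gene columns.
    ann <- AnnotationDbi::select(db, keys = known,
                                 columns = c("SYMBOL", "ENTREZID"),
                                 keytype = "PROBEID")
  }
}
print(ann)
```

Some IDs on this array are controls or unannotated transcripts, so NA
entries in the result would not necessarily indicate a processing error.
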
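library(oligo)
# Read all CEL files in the working directory as one set
data<-read.celfiles(list.celfiles())
# Run RMA on the whole set ('dat' was never defined in this script)
eset<-rma(data)
# Write the expression matrix (one row per ID, one column per chip)
write.exprs(eset, file="output.txt", sep="\t")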
          koA-mth_HZ_5238_MST1_19389.cel  koB-mth_HZ_5238_MST1_19390.cel  koC-mth_HZ_5238_MST1_19391.cel  koD-mth_HZ_5238_MST1_19392.cel  wt1-mth_HZ_5238_MST1_19385.cel  wt2-mth_HZ_5238_MST1_19386.cel  wt3-mth_HZ_5238_MST1_19387.cel  wt4-mth_HZ_5238_MST1_19388.cel
10344615  7.07210987006919  7.01089258722033  7.2642627000072   6.92980486555595  7.72857978063884  6.91124431275741  7.45776182961327  7.21025349865986
10344617  3.02519545040591  3.08697023169755  3.03203234085828  3.09846420636071  3.12487891156704  3.10727683101607  3.0544609560487   3.03353963677405
10344619  3.20294677833793  3.20612630466463  3.17655303153672  3.13210443165341  3.1378507207366   3.21452663497659  3.31345050224224  3.09287042099817
10344621  4.70984671316916  4.68863215464979  4.43705857330756  4.59970839525133  4.66911715996711  4.80422412543456  4.57334787499862  4.60736276830484
10344623  7.79927399492793  7.78057650451938  7.72710416870418  7.68525205462879  7.66271776323834  7.65761154201622  7.67860029345257  7.80684426781102
10344625  8.43869623252839  9.23986002214653  9.01482181726202  8.8450593076064   8.59194370149885  9.08344656110017  9.07468813004613  8.92291936928794
10344626  10.0590964382247  9.75778614016683  9.66874458340189  9.91560261746937  9.97497585580347  9.90593250683953  9.72513220186519  10.0570156812405
10344627  7.45353674141328  7.85528510695415  7.1239938834144   7.48673272391552  8.2401362665769   7.24092300626232  7.4348487408975   7.8999935331867
10344628  10.1181530678991  10.2050144957479  10.0821326432175  10.2014962484731  10.3549307008668  9.97359523972773  9.82152593658235  10.0714458425003
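
As a rough way to read this output: each row is one ID from the array and
each column is one chip, with RMA values on the log2 scale. The sketch
below uses plain R on three rows copied from the output (rounded to four
decimals) and compares the mean of the four ko chips with the mean of the
four wt chips. The short column names and the ko/wt grouping are inferred
from the file names and are assumptions. This is only a sanity check of
the layout, not a replacement for a proper analysis with limma as
suggested earlier in the thread.

```r
# Three rows from the output, rounded to four decimals.
# Column order follows the file order: koA..koD, then wt1..wt4.
expr <- matrix(
  c(7.0721, 7.0109, 7.2643, 6.9298, 7.7286, 6.9112, 7.4578, 7.2103,
    8.4387, 9.2399, 9.0148, 8.8451, 8.5919, 9.0834, 9.0747, 8.9229,
    7.4535, 7.8553, 7.1240, 7.4867, 8.2401, 7.2409, 7.4348, 7.9000),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("10344615", "10344625", "10344627"),
                  c("koA", "koB", "koC", "koD", "wt1", "wt2", "wt3", "wt4"))
)

# Assumption: the first two letters of each (shortened) name give the group.
group <- substr(colnames(expr), 1, 2)

ko_mean <- rowMeans(expr[, group == "ko"])
wt_mean <- rowMeans(expr[, group == "wt"])

# Values are log2, so a difference of means is a log2 fold change (ko vs wt).
log2_diff <- ko_mean - wt_mean

print(round(cbind(ko_mean, wt_mean, log2_diff), 3))
```

A log2 difference near 0 means little change between the groups; a
difference of 1 corresponds to a two-fold change.
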