[BioC] Problem with RMA using pdInfoBuilder, oligo and limma
Anne-Marie Madore
anne-marie.madore.1 at ulaval.ca
Thu Jan 29 03:30:55 CET 2009
Hi M Carvalho,
Thank you for your response. I don't know if I give you all information
that you need but I hope that it will help you helping me!
Best regards,
Anne-Marie
> eset <- rma(test)
Background correcting
Normalizing
> e <- exprs(eset)
> design <- model.matrix(~factor(eset$Key)) fit <- lmFit(eset, design)
> ebayes <- eBayes(fit)
>
> dim(exprs(test))
[1] 1102500 10
> dim(exprs(eset))
[1] 213068 10
> exprs(test)[1:10, 1:4]
AD_Ctrl_1.CEL AD_Ctrl_2.CEL AD_Ctrl_3.CEL AD_Ctrl_5.CEL
1 3311 3890 4353 3410
2 58 113 91 85
3 3148 4337 4621 3494
4 63 65 65 95
5 52 44 39 44
6 39 38 79 45
7 3390 4726 4884 3680
8 60 70 61 56
9 3393 4463 4928 3534
10 76 74 96 59
> exprs(eset)[1:10, 1:4]
AD_Ctrl_1.CEL AD_Ctrl_2.CEL AD_Ctrl_3.CEL AD_Ctrl_5.CEL
10700001 12.030577 12.536941 11.987476 12.038300
10700002 7.071260 7.983521 7.284585 7.142655
10700003 10.593783 11.282184 10.541307 10.634892
10700004 4.997432 5.234236 4.868317 4.805074
10700005 9.432420 10.005787 9.359248 9.435969
10700006 2.009166 2.006136 2.057571 2.036066
10700007 2.358522 2.382384 2.341911 2.454277
10700008 1.851950 1.758534 1.892009 1.723721
10700009 9.402636 10.504029 9.589125 9.475311
10700010 2.929790 3.161541 3.062537 3.026402
## I look at the complete table of p values and found that data for some
IDs are still missing. Those ones corresponding to the "total" gene
expression. Below is an example of those missing IDs. # 10701839,
10701844, 10701846 and 10701853 are ID corresponding to gene symbol and
annotation in the csv file. I confirmed that information after merging
csv info with p.values to then write a table. It is also true with
calculated mean per phenotype and differential expression.
> write.table(data.frame(ebayes$p.value), file="p.value.xls", sep="\t",
> col.names=TRUE) ebayes$p.value[1800:1813, ]
(Intercept) factor(eset$Key)Traited
10701838 7.231501e-13 0.1943253
10701840 4.658192e-15 0.8202559
10701841 9.046218e-10 0.4597619
10701842 2.489257e-04 0.1577936
10701843 1.590453e-13 0.1959979
10701845 1.246488e-11 0.7824178
10701847 2.243229e-15 0.2981466
10701848 9.811933e-17 0.4248431
10701849 4.373387e-17 0.7570393
10701850 3.048777e-18 0.6034904
10701851 5.217511e-18 0.2467594
10701852 1.656925e-17 0.7560708
10701854 5.912102e-07 0.2342691
10701855 1.013586e-12 0.5512864
>
>
> P <- ebayes$p.value[ ,2]
> Index1 <- 1:5
> Index2 <- 6:10
>
> d <- (rowMeans(e[,Index1]) - rowMeans(e[,Index2]))
>
> Mean1 <- rowMeans(e[,Index1])
> Mean2 <- rowMeans(e[,Index2])
>
> csv <- read.csv(file="D:/Anne-Marie/Doctorat/puces ADN
> macrophages/puces rat/Annie
> Dube/Analyse/RaGene-1_0-st-v1.na27.rn4.transcript.header.DOS.csv",head
> =TRUE,sep=",")
> csv[1800:1813,1:7]
transcript_cluster_id sample seqname strand start stop
total_probes
1800 10717201 10717201 chr1 - 20999942 21060567
31
1801 10717233 10717233 chr1 - 21327099 21330215
31
1802 10717240 10717240 chr1 - 21599943 21686654
29
1803 10717256 10717256 chr1 - 21721897 21762024
32
1804 10717268 10717268 chr1 - 21857801 21858925
15
1805 10717270 10717270 chr1 - 21877746 21878812
9
1806 10717272 10717272 chr1 - 21898531 21899607
25
1807 10717274 10717274 chr1 - 21912398 21913474
15
1808 10717276 10717276 chr1 - 21926752 21927828
25
1809 10717278 10717278 chr1 - 21934361 21935437
25
1810 10717280 10717280 chr1 - 21955553 21956629
28
1811 10717282 10717282 chr1 - 21967019 21968095
9
1812 10717284 10717284 chr1 - 21996992 21998005
25
1813 10717286 10717286 chr1 - 22008118 22009161
25
> sample <- row.names(ebayes)
> all <- merge(csv, P, by.x="sample", by.y=0, all=T)
> all2 <- merge(all, d, by.x="sample", by.y=0, all=T)
> write.table(data.frame(all2), file="totalgene.xls", sep="\t",
> col.names = NA)
>
> dim(all)
[1] 240410 19
>
> dim(all2)
[1] 240410 20
>
> class(all2)
[1] "data.frame"
>
> Econtrol <- data.frame(rowMeans(e[,Index1]))
> dim(Econtrol)
[1] 213068 1
> Etraited <- data.frame(rowMeans(e[,Index2]))
> dim(Etraited)
[1] 213068 1
> DiffExp <- data.frame(d)
> dim(DiffExp)
[1] 213068 1
>
> dim(csv)
[1] 29215 18
>
-----Message d'origine-----
De : Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
Envoyé : Wednesday, January 28, 2009 10:36 AM
À : Anne-Marie Madore
Cc : bioconductor at stat.math.ethz.ch
Objet : Re: [BioC] Problem with RMA using pdInfoBuilder, oligo and limma
Hi Anne,
thank you for your report.
Although the message "Calculating Expression" is not shown, I'm
positive it's being computed. What I mean is that you *are* getting
the 3 steps.
Can you, please, send us specific information about the objects?
For example:
dim(exprs(test))
dim(exprs(eset))
exprs(test)[1:10, 1:4]
exprs(eset)[1:10, 1:4]
Waitiing to hear from you,
best,
b
On Jan 28, 2009, at 12:58 PM, Anne-Marie Madore wrote:
> Hi,
>
>
> I am a Ph.D. student from Quebec, Canada. I'm a beginner with R and
> Bioconductor. Until now the only experience I have is in analyzing
> microarray data using affy and limma packages. Now I'm trying to
> analyze
> Rat Gene 10 st arrays and I would like to run RMA analysis and Smyth
> moderated t test on those arrays. Since no cdf official package is
> available for those arrays, after reading many of the questions and
> responses on this mailing list, I decided to use pdInfoBuilder, oligo
> and limma packages to run analysis.
>
> I built my package according to info available at this web site:
>
http://article.gmane.org/gmane.science.biology.informatics.conductor/189
> 63/match=oligo+s
> Below is the output of the installation.
>
> The problem is, after running RMA and moderated T-test, I get
> expression
> and differential expression measured for all probe separately but not
> the calculated expression for all probes representing a gene. When I
> run
> RMA, I got only two steps, Background correcting and Normalizing but
> not
> Calculating expression. Do you know how I can get differential
> expression calculated for each gene? I don't know if the problem is in
> the package I built or if I can use some code to answer this question.
> Below I list warning messages I have when I check my package after
> installation with R CMD check and codes used to analyze expression
> arrays.
>
> Many thanks for your help,
>
> Anne-Marie Madore
> Universite Laval/UQAC
>
>
> ## installing the package in cmd command shell
>
> c:\Program Files\R\R-2.8.1\bin>R CMD INSTALL pd.ragene.1.0.st.v1
> installing to 'c:/PROGRA~1/R/R-28~1.1/library'
>
>
> ---------- Making package pd.ragene.1.0.st.v1 ------------
> adding build stamp to DESCRIPTION
> installing NAMESPACE file and metadata
> installing R files
> installing inst files
> preparing package pd.ragene.1.0.st.v1 for lazy loading
> Loading required package: RSQLite
> Loading required package: DBI
> Loading required package: oligoClasses
> Loading required package: Biobase
> Loading required package: tools
>
> Welcome to Bioconductor
>
> Vignettes contain introductory material. To view, type
> 'openVignette()'. To cite Bioconductor, see
> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>
> no man files in this package
> installing indices
> installing help
> adding MD5 sums
>
> * DONE (pd.ragene.1.0.st.v1)
>
>
>
> ## If I run a check (R CMD check pd.ragene.st.v1) I get three warning
> messages and one note:
>
> 1. * checking R files for non-ASCII characters ... WARNING
> Found the following files with non-ASCII characters: all.R Portable
> packages must use only ASCII characters in their R code, except
> perhaps
> in comments.
>
> 2. * checking whether the name space can be loaded with stated
> dependencies ... WARNING
> Error in initDbConnection() : could not find function "dbConnect"
> Error:
> .onLoad failed in 'loadNamespace' for 'pd.ragene.1.0.st.v1' Execution
> halted A namespace must be able to be loaded with just the base
> namespace loaded:
> otherwise if the namespace gets loaded by a saved object, the session
> will be unable to start.
> Probably some imports need to be declared in the NAMESPACE file.
>
> 3. * checking R code for possible problems ... NOTE
> closeDb: no visible binding for global variable 'dbCon'
>
> 4. * checking for missing documentation entries ... WARNING
> Undocumented code objects:
> pd.ragene.1.0.st.v1
> All user-level objects in a package should have documentation entries.
> See the chapter 'Writing R documentation files' in manual 'Writing R
> Extensions'.
>
>
> ## analyzing the package
>
>> library(Biobase)
> Loading required package: tools
>
> Welcome to Bioconductor
>
> Vignettes contain introductory material. To view, type
> 'openVignette()'. To cite Bioconductor, see
> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>
>> library(pdInfoBuilder)
> Loading required package: RSQLite
> Loading required package: DBI
> Loading required package: affxparser
> Loading required package: oligo
> Loading required package: splines
> Loading required package: preprocessCore
> Loading required package: AnnotationDbi
> Loading required package: oligoClasses
> oligo Package - Series 1.5.x
>> setwd("D:/Anne-Marie/Doctorat/puces ADN macrophages/puces rat/Annie
> Dube/Analyse")
>> library("pd.ragene.1.0.st.v1")
>> library(oligo)
>> library(limma)
>> library(genefilter)
> Loading required package: survival
>> library(geneplotter)
> Loading required package: lattice
> Loading required package: annotate
> Loading required package: xtable
> KernSmooth 2.22 installed
> Copyright M. P. Wand 1997
>> cel.files <- list.celfiles(".", full.names = TRUE)
>> basename(cel.files)
> [1] "AD_Ctrl_1.CEL" "AD_Ctrl_2.CEL" "AD_Ctrl_3.CEL"
> "AD_Ctrl_5.CEL"
> [5] "AD_Ctrl_6.CEL" "AD_Traite_10.CEL" "AD_Traite_11.CEL"
> "AD_Traite_7.CEL"
> [9] "AD_Traite_8.CEL" "AD_Traite_9.CEL"
>> test <- read.celfiles(cel.files)
> Platform design info loaded.
>> phenoData(test) <- read.AnnotatedDataFrame("phenoData.txt", header =
> TRUE, row.name=1)
>> class(test)
> [1] "GeneFeatureSet"
> attr(,"package")
> [1] "oligoClasses"
>> class(phenoData)
> [1] "standardGeneric"
> attr(,"package")
> [1] "methods"
>> eset <- rma(test)
> Background correcting
> Normalizing
>> e <- exprs(eset)
>> design <- model.matrix(~factor(eset$Key))
>> fit <- lmFit(eset, design)
>> ebayes <- eBayes(fit)
>>
>>
> ## also a try with slightly the same method as used to make the link
> between phenol data and .CEL with affy package
>
>> pd <- read.AnnotatedDataFrame("pheno.txt", header = TRUE, row.names =
> 1)
>> pData(pd)
> Phenotype
> AD_Ctrl_1.CEL Control
> AD_Ctrl_2.CEL Control
> AD_Ctrl_3.CEL Control
> AD_Ctrl_5.CEL Control
> AD_Ctrl_6.CEL Control
> AD_Traite_7.CEL Traited
> AD_Traite_8.CEL Traited
> AD_Traite_9.CEL Traited
> AD_Traite_10.CEL Traited
> AD_Traite_11.CEL Traited
>> a <- read.celfiles(filenames = rownames(pData(pd)), phenoData = pd,
> verbose = TRUE)
> Platform design info loaded.
>> eset <- rma(a)
> Background correcting
> Normalizing
>> exprs.eset <- exprs(eset)
>> population.groups <- factor (c(rep("Control",5), rep ("Traited",5)))
>> design <- model.matrix (~population.groups)
>> fit <- lmFit (eset, design)
>> fit.eBayes <- eBayes (fit)
>>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] splines tools stats graphics grDevices utils
> datasets
> methods base
>
> other attached packages:
> [1] geneplotter_1.20.0 annotate_1.20.1 xtable_1.5-4
>
> [4] lattice_0.17-17 genefilter_1.22.0
> survival_2.34-1
>
> [7] limma_2.16.3 pd.ragene.1.0.st.v1_0.0.1
> pdInfoBuilder_1.6.0
> [10] oligo_1.6.0 oligoClasses_1.4.0
> AnnotationDbi_1.4.2
> [13] preprocessCore_1.4.0 affxparser_1.14.2 RSQLite_0.7-1
>
> [16] DBI_0.2-4 Biobase_2.2.1
>
> loaded via a namespace (and not attached):
> [1] grid_2.8.1 KernSmooth_2.22-22 RColorBrewer_1.0-2
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list