[BioC] Problem with RMA using pdInfoBuilder, oligo and limma

Thu Jan 29 03:30:55 CET 2009

Hi M Carvalho,

Thank you for your response. I don't know if I give you all information
that you need but I hope that it will help you helping me!

Best regards,

Anne-Marie

> eset <- rma(test)
Background correcting
Normalizing
> e <- exprs(eset)
> design <- model.matrix(~factor(eset$Key)) fit <- lmFit(eset, design) 
> ebayes <- eBayes(fit)
> 
> dim(exprs(test))
[1] 1102500      10
> dim(exprs(eset))
[1] 213068     10
> exprs(test)[1:10, 1:4]
   AD_Ctrl_1.CEL AD_Ctrl_2.CEL AD_Ctrl_3.CEL AD_Ctrl_5.CEL
1           3311          3890          4353          3410
2             58           113            91            85
3           3148          4337          4621          3494
4             63            65            65            95
5             52            44            39            44
6             39            38            79            45
7           3390          4726          4884          3680
8             60            70            61            56
9           3393          4463          4928          3534
10            76            74            96            59
> exprs(eset)[1:10, 1:4]
         AD_Ctrl_1.CEL AD_Ctrl_2.CEL AD_Ctrl_3.CEL AD_Ctrl_5.CEL
10700001     12.030577     12.536941     11.987476     12.038300
10700002      7.071260      7.983521      7.284585      7.142655
10700003     10.593783     11.282184     10.541307     10.634892
10700004      4.997432      5.234236      4.868317      4.805074
10700005      9.432420     10.005787      9.359248      9.435969
10700006      2.009166      2.006136      2.057571      2.036066
10700007      2.358522      2.382384      2.341911      2.454277
10700008      1.851950      1.758534      1.892009      1.723721
10700009      9.402636     10.504029      9.589125      9.475311
10700010      2.929790      3.161541      3.062537      3.026402

## I look at the complete table of p values and found that data for some
IDs are still missing. Those ones corresponding to the "total" gene
expression. Below is an example of those missing IDs. # 10701839,
10701844, 10701846 and 10701853 are ID corresponding to gene symbol and
annotation in the csv file. I confirmed that information after merging
csv info with p.values to then write a table. It is also true with
calculated mean per phenotype and differential expression.

> write.table(data.frame(ebayes$p.value), file="p.value.xls", sep="\t", 
> col.names=TRUE) ebayes$p.value[1800:1813, ]
          (Intercept) factor(eset$Key)Traited
10701838 7.231501e-13               0.1943253
10701840 4.658192e-15               0.8202559
10701841 9.046218e-10               0.4597619
10701842 2.489257e-04               0.1577936
10701843 1.590453e-13               0.1959979
10701845 1.246488e-11               0.7824178
10701847 2.243229e-15               0.2981466
10701848 9.811933e-17               0.4248431
10701849 4.373387e-17               0.7570393
10701850 3.048777e-18               0.6034904
10701851 5.217511e-18               0.2467594
10701852 1.656925e-17               0.7560708
10701854 5.912102e-07               0.2342691
10701855 1.013586e-12               0.5512864
> 
> 
> P <- ebayes$p.value[ ,2]
> Index1 <- 1:5
> Index2 <- 6:10
> 
> d <- (rowMeans(e[,Index1]) - rowMeans(e[,Index2]))
> 
> Mean1 <-  rowMeans(e[,Index1])
> Mean2 <-  rowMeans(e[,Index2])
> 
> csv <- read.csv(file="D:/Anne-Marie/Doctorat/puces ADN 
> macrophages/puces rat/Annie 
> Dube/Analyse/RaGene-1_0-st-v1.na27.rn4.transcript.header.DOS.csv",head
> =TRUE,sep=",")
> csv[1800:1813,1:7]
     transcript_cluster_id   sample seqname strand    start     stop
total_probes
1800              10717201 10717201    chr1      - 20999942 21060567
31
1801              10717233 10717233    chr1      - 21327099 21330215
31
1802              10717240 10717240    chr1      - 21599943 21686654
29
1803              10717256 10717256    chr1      - 21721897 21762024
32
1804              10717268 10717268    chr1      - 21857801 21858925
15
1805              10717270 10717270    chr1      - 21877746 21878812
9
1806              10717272 10717272    chr1      - 21898531 21899607
25
1807              10717274 10717274    chr1      - 21912398 21913474
15
1808              10717276 10717276    chr1      - 21926752 21927828
25
1809              10717278 10717278    chr1      - 21934361 21935437
25
1810              10717280 10717280    chr1      - 21955553 21956629
28
1811              10717282 10717282    chr1      - 21967019 21968095
9
1812              10717284 10717284    chr1      - 21996992 21998005
25	
1813              10717286 10717286    chr1      - 22008118 22009161
25
> sample <- row.names(ebayes)
> all <- merge(csv, P, by.x="sample", by.y=0, all=T)
> all2 <- merge(all, d, by.x="sample", by.y=0, all=T) 
> write.table(data.frame(all2), file="totalgene.xls", sep="\t", 
> col.names = NA)
> 
> dim(all)
[1] 240410     19
>
> dim(all2)
[1] 240410     20
> 
> class(all2)
[1] "data.frame"
> 
> Econtrol <- data.frame(rowMeans(e[,Index1]))
> dim(Econtrol)
[1] 213068      1
> Etraited <- data.frame(rowMeans(e[,Index2]))
> dim(Etraited)
[1] 213068      1
> DiffExp <- data.frame(d)
> dim(DiffExp)
[1] 213068      1
> 
> dim(csv)
[1] 29215    18
>

-----Message d'origine-----
De : Benilton Carvalho [mailto:bcarvalh at jhsph.edu] 
Envoyé : Wednesday, January 28, 2009 10:36 AM
À : Anne-Marie Madore
Cc : bioconductor at stat.math.ethz.ch
Objet : Re: [BioC] Problem with RMA using pdInfoBuilder, oligo and limma

Hi Anne,

thank you for your report.

Although the message "Calculating Expression" is not shown, I'm  
positive it's being computed. What I mean is that you *are* getting  
the 3 steps.

Can you, please, send us specific information about the objects?

For example:

dim(exprs(test))
dim(exprs(eset))
exprs(test)[1:10, 1:4]
exprs(eset)[1:10, 1:4]

Waitiing to hear from you,

best,

b

On Jan 28, 2009, at 12:58 PM, Anne-Marie Madore wrote:

> Hi,
>
>
> I am a Ph.D. student from Quebec, Canada. I'm a beginner with R and
> Bioconductor. Until now the only experience I have is in analyzing
> microarray data using affy and limma packages. Now I'm trying to  
> analyze
> Rat Gene 10 st arrays and I would like to run RMA analysis and Smyth
> moderated t test on those arrays. Since no cdf official package is
> available for those arrays, after reading many of the questions and
> responses on this mailing list, I decided to use pdInfoBuilder, oligo
> and limma packages to run analysis.
>
> I built my package according to info available at this web site:
>
http://article.gmane.org/gmane.science.biology.informatics.conductor/189
> 63/match=oligo+s
> Below is the output of the installation.
>
> The problem is, after running RMA and moderated T-test, I get  
> expression
> and differential expression measured for all probe separately but not
> the calculated expression for all probes representing a gene. When I  
> run
> RMA, I got only two steps, Background correcting and Normalizing but  
> not
> Calculating expression. Do you know how I can get differential
> expression calculated for each gene? I don't know if the problem is in
> the package I built or if I can use some code to answer this question.
> Below I list warning messages I have when I check my package after
> installation with R CMD check and codes used to analyze expression
> arrays.
>
> Many thanks for your help,
>
> Anne-Marie Madore
> Universite Laval/UQAC
>
>
> ## installing the package in cmd command shell
>
> c:\Program Files\R\R-2.8.1\bin>R CMD INSTALL pd.ragene.1.0.st.v1
> installing to 'c:/PROGRA~1/R/R-28~1.1/library'
>
>
> ---------- Making package pd.ragene.1.0.st.v1 ------------
>  adding build stamp to DESCRIPTION
>  installing NAMESPACE file and metadata
>  installing R files
>  installing inst files
>  preparing package pd.ragene.1.0.st.v1 for lazy loading
> Loading required package: RSQLite
> Loading required package: DBI
> Loading required package: oligoClasses
> Loading required package: Biobase
> Loading required package: tools
>
> Welcome to Bioconductor
>
>  Vignettes contain introductory material. To view, type
>  'openVignette()'. To cite Bioconductor, see
>  'citation("Biobase")' and for packages 'citation(pkgname)'.
>
>  no man files in this package
>  installing indices
>  installing help
>  adding MD5 sums
>
> * DONE (pd.ragene.1.0.st.v1)
>
>
>
> ## If I run a check (R CMD check pd.ragene.st.v1) I get three warning
> messages and one note:
>
> 1.       * checking R files for non-ASCII characters ... WARNING
> Found the following files with non-ASCII characters: all.R Portable
> packages must use only ASCII characters in their R code, except  
> perhaps
> in comments.
>
> 2.       * checking whether the name space can be loaded with stated
> dependencies ... WARNING
> Error in initDbConnection() : could not find function "dbConnect"  
> Error:
> .onLoad failed in 'loadNamespace' for 'pd.ragene.1.0.st.v1' Execution
> halted A namespace must be able to be loaded with just the base
> namespace loaded:
> otherwise if the namespace gets loaded by a saved object, the session
> will be unable to start.
> Probably some imports need to be declared in the NAMESPACE file.
>
> 3.       * checking R code for possible problems ... NOTE
> closeDb: no visible binding for global variable 'dbCon'
>
> 4.       * checking for missing documentation entries ... WARNING
> Undocumented code objects:
> pd.ragene.1.0.st.v1
> All user-level objects in a package should have documentation entries.
> See the chapter 'Writing R documentation files' in manual 'Writing R
> Extensions'.
>
>
> ## analyzing the package
>
>> library(Biobase)
> Loading required package: tools
>
> Welcome to Bioconductor
>
>  Vignettes contain introductory material. To view, type
>  'openVignette()'. To cite Bioconductor, see
>  'citation("Biobase")' and for packages 'citation(pkgname)'.
>
>> library(pdInfoBuilder)
> Loading required package: RSQLite
> Loading required package: DBI
> Loading required package: affxparser
> Loading required package: oligo
> Loading required package: splines
> Loading required package: preprocessCore
> Loading required package: AnnotationDbi
> Loading required package: oligoClasses
> oligo Package - Series 1.5.x
>> setwd("D:/Anne-Marie/Doctorat/puces ADN macrophages/puces rat/Annie
> Dube/Analyse")
>> library("pd.ragene.1.0.st.v1")
>> library(oligo)
>> library(limma)
>> library(genefilter)
> Loading required package: survival
>> library(geneplotter)
> Loading required package: lattice
> Loading required package: annotate
> Loading required package: xtable
> KernSmooth 2.22 installed
> Copyright M. P. Wand 1997
>> cel.files <- list.celfiles(".", full.names = TRUE)
>> basename(cel.files)
> [1] "AD_Ctrl_1.CEL"    "AD_Ctrl_2.CEL"    "AD_Ctrl_3.CEL"
> "AD_Ctrl_5.CEL"
> [5] "AD_Ctrl_6.CEL"    "AD_Traite_10.CEL" "AD_Traite_11.CEL"
> "AD_Traite_7.CEL"
> [9] "AD_Traite_8.CEL"  "AD_Traite_9.CEL"
>> test <- read.celfiles(cel.files)
> Platform design info loaded.
>> phenoData(test) <- read.AnnotatedDataFrame("phenoData.txt", header =
> TRUE, row.name=1)
>> class(test)
> [1] "GeneFeatureSet"
> attr(,"package")
> [1] "oligoClasses"
>> class(phenoData)
> [1] "standardGeneric"
> attr(,"package")
> [1] "methods"
>> eset <- rma(test)
> Background correcting
> Normalizing
>> e <- exprs(eset)
>> design <- model.matrix(~factor(eset$Key))
>> fit <- lmFit(eset, design)
>> ebayes <- eBayes(fit)
>>
>>
> ## also a try with slightly the same method as used to make the link
> between phenol data and .CEL with affy package
>
>> pd <- read.AnnotatedDataFrame("pheno.txt", header = TRUE, row.names =
> 1)
>> pData(pd)
>                 Phenotype
> AD_Ctrl_1.CEL      Control
> AD_Ctrl_2.CEL      Control
> AD_Ctrl_3.CEL      Control
> AD_Ctrl_5.CEL      Control
> AD_Ctrl_6.CEL      Control
> AD_Traite_7.CEL    Traited
> AD_Traite_8.CEL    Traited
> AD_Traite_9.CEL    Traited
> AD_Traite_10.CEL   Traited
> AD_Traite_11.CEL   Traited
>> a <- read.celfiles(filenames = rownames(pData(pd)), phenoData = pd,
> verbose = TRUE)
> Platform design info loaded.
>> eset <- rma(a)
> Background correcting
> Normalizing
>> exprs.eset <- exprs(eset)
>> population.groups <- factor (c(rep("Control",5), rep ("Traited",5)))
>> design <- model.matrix (~population.groups)
>> fit <- lmFit (eset, design)
>> fit.eBayes <- eBayes (fit)
>>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils      
> datasets
> methods   base
>
> other attached packages:
> [1] geneplotter_1.20.0        annotate_1.20.1           xtable_1.5-4
>
> [4] lattice_0.17-17           genefilter_1.22.0          
> survival_2.34-1
>
> [7] limma_2.16.3              pd.ragene.1.0.st.v1_0.0.1
> pdInfoBuilder_1.6.0
> [10] oligo_1.6.0               oligoClasses_1.4.0
> AnnotationDbi_1.4.2
> [13] preprocessCore_1.4.0      affxparser_1.14.2         RSQLite_0.7-1
>
> [16] DBI_0.2-4                 Biobase_2.2.1
>
> loaded via a namespace (and not attached):
> [1] grid_2.8.1         KernSmooth_2.22-22 RColorBrewer_1.0-2
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor