[BioC] inSilicoMerging and Limma
P.D. Moerland
p.d.moerland at amc.uva.nl
Fri Jun 7 21:10:26 CEST 2013
Dear Ed,
This behaviour is indeed expected when you merge datasets as you did. The two datasets are completely confounded with your contrast of interest: Hsu corresponds to the EXP condition and Kai to the CTL condition. Using ComBat to merge the two datasets, you actually took care that the mean expression of each gene is equal between the two studies. This explains the log fold changes that are nearly zero in your top table. Because of the confounding, there is no sensible way of merging these datasets.
best,
Perry
---
Perry Moerland, PhD
Room J1B-215
Bioinformatics Laboratory, Department of Clinical Epidemiology, Biostatistics and Bioinformatics
Academic Medical Centre, University of Amsterdam
Postbus 22660, 1100 DD Amsterdam, The Netherlands
tel: +31 20 5666945
p.d.moerland at amc.uva.nl, http://www.bioinformaticslaboratory.nl/
-----Original Message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Ed Siefker
Sent: Friday, June 07, 2013 8:52 PM
To: bioconductor
Subject: Re: [BioC] inSilicoMerging and Limma
I think the problem is with lmFit(). The merged eset definitely contain different data for each samples, since the MDS plot shows data points scattered around the origin. But after feeding this eset to lmFit(), here's what I get:
*****************************************************
> fit
An object of class "MArrayLM"
$coefficients
CTL EXPvsCTL
1007_s_at 7.045851 1.487227e-15
1053_at 5.498793 7.281216e-16
117_at 5.047563 -8.262373e-16
121_at 4.350444 4.131187e-16
1255_g_at 2.257968 -2.780886e-16
22272 more rows ...
$rank
[1] 2
$assign
NULL
$qr
$qr
CTL EXPvsCTL
[1,] -6.0000000 -4.33333333
[2,] 0.1666667 -2.68741925
[3,] 0.1666667 0.08859624
[4,] 0.1666667 0.08859624
[5,] 0.1666667 0.08859624
31 more rows ...
$qraux
[1] 1.166667 1.088596
$pivot
[1] 1 2
$tol
[1] 1e-07
$rank
[1] 2
$df.residual
[1] 34 34 34 34 34
22272 more elements ...
$sigma
1007_s_at 1053_at 117_at 121_at 1255_g_at
0.954188915 0.174833541 1.082247491 0.346694394 0.001783684
22272 more elements ...
$cov.coefficients
CTL EXPvsCTL
CTL 0.1 -0.1000000
EXPvsCTL -0.1 0.1384615
$stdev.unscaled
CTL EXPvsCTL
1007_s_at 0.3162278 0.3721042
1053_at 0.3162278 0.3721042
117_at 0.3162278 0.3721042
121_at 0.3162278 0.3721042
1255_g_at 0.3162278 0.3721042
22272 more rows ...
$pivot
[1] 1 2
$Amean
1007_s_at 1053_at 117_at 121_at 1255_g_at
7.045851 5.498793 5.047563 4.350444 2.257968
22272 more elements ...
$method
[1] "ls"
$design
CTL EXPvsCTL
[1,] 1 1
[2,] 1 1
[3,] 1 1
[4,] 1 1
[5,] 1 1
31 more rows ...
*****************************************************
For some reason fit$stdev.unscaled gives the same values for every probe.
On Fri, Jun 7, 2013 at 12:52 PM, Ed Siefker <ebs15242 at gmail.com> wrote:
> I am trying to compare find differentially expressed genes between
> appendix and colon tumor samples, which have been arrayed on different
> platforms. Namely hgu133a2 and hgu133plus2.
> hgu133a2 is a subset of hgu133plus2, and Bioconductor provides a
> package, inSilicoMerging, that's supposed to do this, so I thought it
> would be straight forward.
>
> First read in my CEL files and normalize them:
>
> > targets_hsu <- readTargets("Hsu-targets.txt") targets_kai <-
> > readTargets("kaiser-targets.txt") ab_hsu <-
> > ReadAffy(filenames=targets_hsu$FileName)
> > ab_kai <- ReadAffy(filenames=targets_kai$FileName)
> > eset_hsu <- gcrma(ab_hsu)
> > eset_kai <- gcrma(ab_kai)
>
> So far so good. Now I merge the esets with inSilicoMerging:
>
> library(inSilicoMerging)
> > eset <- merge(list(eset_hsu,eset_kai),method="COMBAT")
> INSILICOMERGING: Run COMBAT...
> INSILICOMERGING: => Found 2 batches
> INSILICOMERGING: => Found 0 covariate(s)
> > dim(eset_hsu)
> Features Samples
> 22277 26
> > dim(eset_kai)
> Features Samples
> 54675 10
> > dim(eset)
> Features Samples
> 22277 36
>
> This looks like it worked. I used plotMDS(), and the data are
> nicely intermixed as one would hope. Now I need to do DE
> analysis with Limma. Hsu (the first 26 samples) are experimental
> and Kai (the last 10 samples) are control. So I create a design
> matrix like this:
>
>
> > design <- cbind(CTL=1, EXPvsCTL=c(rep(1,26),rep(0,10))) fit <-
> > lmFit(eset, design) fit <- eBayes(fit) tt<-topTable(fit,
> > coef="EXPvsCTL",number=100000)
> > head(tt,n=3)
> logFC AveExpr t P.Value adj.P.Val B
> 1007_s_at 1.487227e-15 7.045851 4.223110e-15 1 1 -6.235399
> 1053_at 7.281216e-16 5.498793 1.127582e-14 1 1 -6.235399
> 117_at -8.262373e-16 5.047563 -2.068570e-15 1 1 -6.235399
> > tail(tt,n=3)
> logFC AveExpr t P.Value adj.P.Val
> AFFX-TrpnX-3_at -3.953675e-16 2.257998 -1.507504e-13 1 1
> AFFX-TrpnX-5_at 1.024821e-16 2.257637 4.062598e-14 1 1
> AFFX-TrpnX-M_at 1.024821e-16 2.257637 4.062598e-14 1 1
> B
> AFFX-TrpnX-3_at -6.235399
> AFFX-TrpnX-5_at -6.235399
> AFFX-TrpnX-M_at -6.235399
>
>
> As you can see, the p values and B statistics are the same for every
> probe. Clearly something is wrong here. Did I do something wrong?
> Is this sort of thing expected when you merge datasets like this? Any
> nudges in the right direction would be appreciated.
> -Ed
>
[[alternative HTML version deleted]]
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
________________________________
AMC Disclaimer : http://www.amc.nl/disclaimer
More information about the Bioconductor
mailing list