[BioC] inSilicoMerging and Limma

Fri Jun 7 21:10:26 CEST 2013

Dear Ed,

This behaviour is indeed expected when you merge datasets as you did. The two datasets are completely confounded with your contrast of interest: Hsu corresponds to the EXP condition and Kai to the CTL condition. Using ComBat to merge the two datasets, you actually took care that the mean expression of each gene is equal between the two studies. This explains the log fold changes that are nearly zero in your top table. Because of the confounding, there is no sensible way of merging these datasets.

best,
Perry

---
Perry Moerland, PhD
Room J1B-215
Bioinformatics Laboratory, Department of Clinical Epidemiology, Biostatistics and Bioinformatics
Academic Medical Centre, University of Amsterdam
Postbus 22660, 1100 DD Amsterdam, The Netherlands
tel: +31 20 5666945
p.d.moerland at amc.uva.nl, http://www.bioinformaticslaboratory.nl/

-----Original Message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Ed Siefker
Sent: Friday, June 07, 2013 8:52 PM
To: bioconductor
Subject: Re: [BioC] inSilicoMerging and Limma

I think the problem is with lmFit().  The merged eset definitely contain different data for each samples, since the MDS plot shows data points scattered around the origin.  But after feeding this eset to lmFit(), here's what I get:

*****************************************************
> fit
An object of class "MArrayLM"
$coefficients
               CTL      EXPvsCTL
1007_s_at 7.045851  1.487227e-15
1053_at   5.498793  7.281216e-16
117_at    5.047563 -8.262373e-16
121_at    4.350444  4.131187e-16
1255_g_at 2.257968 -2.780886e-16
22272 more rows ...

$rank
[1] 2

$assign
NULL

$qr
$qr
            CTL    EXPvsCTL
[1,] -6.0000000 -4.33333333
[2,]  0.1666667 -2.68741925
[3,]  0.1666667  0.08859624
[4,]  0.1666667  0.08859624
[5,]  0.1666667  0.08859624
31 more rows ...

$qraux
[1] 1.166667 1.088596

$pivot
[1] 1 2

$tol
[1] 1e-07

$rank
[1] 2

$df.residual
[1] 34 34 34 34 34
22272 more elements ...

$sigma
  1007_s_at     1053_at      117_at      121_at   1255_g_at
0.954188915 0.174833541 1.082247491 0.346694394 0.001783684
22272 more elements ...

$cov.coefficients
          CTL   EXPvsCTL
CTL       0.1 -0.1000000
EXPvsCTL -0.1  0.1384615

$stdev.unscaled
                CTL  EXPvsCTL
1007_s_at 0.3162278 0.3721042
1053_at   0.3162278 0.3721042
117_at    0.3162278 0.3721042
121_at    0.3162278 0.3721042
1255_g_at 0.3162278 0.3721042
22272 more rows ...

$pivot
[1] 1 2

$Amean
1007_s_at   1053_at    117_at    121_at 1255_g_at
 7.045851  5.498793  5.047563  4.350444  2.257968
22272 more elements ...

$method
[1] "ls"

$design
     CTL EXPvsCTL
[1,]   1        1
[2,]   1        1
[3,]   1        1
[4,]   1        1
[5,]   1        1
31 more rows ...

*****************************************************

For some reason fit$stdev.unscaled gives the same values for every probe.

On Fri, Jun 7, 2013 at 12:52 PM, Ed Siefker <ebs15242 at gmail.com> wrote:

> I am trying to compare find differentially expressed genes between
> appendix and colon tumor samples, which have been arrayed on different
> platforms.  Namely hgu133a2 and hgu133plus2.
> hgu133a2 is a subset of hgu133plus2, and Bioconductor provides a
> package, inSilicoMerging, that's supposed to do this, so I thought it
> would be straight forward.
>
> First read in my CEL files and normalize them:
>
> > targets_hsu <- readTargets("Hsu-targets.txt") targets_kai <-
> > readTargets("kaiser-targets.txt") ab_hsu <-
> > ReadAffy(filenames=targets_hsu$FileName)
> > ab_kai <- ReadAffy(filenames=targets_kai$FileName)
> > eset_hsu <- gcrma(ab_hsu)
> > eset_kai <- gcrma(ab_kai)
>
> So far so good.  Now I merge the esets with inSilicoMerging:
>
> library(inSilicoMerging)
> > eset <- merge(list(eset_hsu,eset_kai),method="COMBAT")
>   INSILICOMERGING: Run COMBAT...
>   INSILICOMERGING:   => Found 2 batches
>   INSILICOMERGING:   => Found 0 covariate(s)
> > dim(eset_hsu)
> Features  Samples
>    22277       26
> > dim(eset_kai)
> Features  Samples
>    54675       10
> > dim(eset)
> Features  Samples
>    22277       36
>
> This looks like it worked.  I used plotMDS(), and the data are
> nicely intermixed as one would hope.   Now I need to do DE
> analysis with Limma. Hsu (the first 26 samples) are experimental
> and Kai (the last 10 samples) are control.   So I create a design
> matrix like this:
>
>
> > design <- cbind(CTL=1, EXPvsCTL=c(rep(1,26),rep(0,10))) fit <-
> > lmFit(eset, design) fit <- eBayes(fit) tt<-topTable(fit,
> > coef="EXPvsCTL",number=100000)
> > head(tt,n=3)
>                   logFC  AveExpr             t P.Value adj.P.Val         B
> 1007_s_at  1.487227e-15 7.045851  4.223110e-15       1         1 -6.235399
> 1053_at    7.281216e-16 5.498793  1.127582e-14       1         1 -6.235399
> 117_at    -8.262373e-16 5.047563 -2.068570e-15       1         1 -6.235399
> > tail(tt,n=3)
>                         logFC  AveExpr             t P.Value adj.P.Val
> AFFX-TrpnX-3_at -3.953675e-16 2.257998 -1.507504e-13       1         1
> AFFX-TrpnX-5_at  1.024821e-16 2.257637  4.062598e-14       1         1
> AFFX-TrpnX-M_at  1.024821e-16 2.257637  4.062598e-14       1         1
>                         B
> AFFX-TrpnX-3_at -6.235399
> AFFX-TrpnX-5_at -6.235399
> AFFX-TrpnX-M_at -6.235399
>
>
> As you can see, the p values and B statistics are the same for every
> probe.  Clearly something is wrong here.  Did I do something wrong?
> Is this sort of thing expected when you merge datasets like this?  Any
> nudges in the right direction would be appreciated.
> -Ed
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
________________________________

AMC Disclaimer : http://www.amc.nl/disclaimer