[BioC] Limma, blocking, robust methods, and paired samples question
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Nov 28 04:21:20 CET 2013
Dear Gustavo,
Your email raises a lot of different issues, so I will try to answer a few
things in turn.
First, you appear to have paired samples, so you should analyse the data
as a paired design. There is no further information to be extracted by
using random effects for a paired design.
You have tried to combine a random effect with method="robust" but limma
does not support this. If you use method="robust", then the correlation
will be ignored and treated as zero.
You say that the consensus correlation is 1, but your output actually
shows it to be 0.72, and as mentioned above it has been treated as zero in
the analysis.
I suggest that you stick to the usual paired analysis. As an alternative
to lmFit(...,method="robust"), you could also try
eBayes(robust=TRUE,trend=TRUE). I also suggest that you make plots of
your data to see what is going on rather than just trying out different
analyses to see what they do. For example
plotSA(fit)
is likely to be useful.
Best wishes
Gordon
> Date: Tue, 26 Nov 2013 18:47:53 +0100
> From: Gustavo Fernandez Bayon <gbayon at gmail.com>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] Limma, blocking, robust methods, and paired samples
> question
>
> Hello everybody.
>
> I am experiencing some problems with a methylation microarray experiment I
> am trying to analyze and, although I am able to get results to some extent,
> I am suspicious about them and would like to know if I am doing things
> correctly.
>
> We have 12 samples. Each sample is built on DNA coming from a single cell.
> The samples can be of two types: FLO- or FLO+. We have two samples, one
> FLO- and one FLO+, for every type of primary tumor we are working on. So
> the experiment could be described as follows:
>
> Tumor_Name Sample_Group
> <character> <character>
> 9296931014_R01C01 185_8-6-11 FLO-
> 9296931014_R02C01 185_8-6-11 FLO+
> 9296931014_R03C01 185_SCD1S FLO-
> 9296931014_R04C01 185_SCD1S FLO+
> 9296931014_R05C01 185_SCD2S FLO-
> ... ... ...
> 9296931014_R02C02 185_4/12 FLO+
> 9296931014_R03C02 A6L_8-1-10 FLO-
> 9296931014_R04C02 A6L_8-1-10 FLO+
> 9296931014_R05C02 A6L_23-2-10 FLO-
> 9296931014_R06C02 A6L_23-2-10 FLO+
>
> We want to find the differentially methylated probes between FLO- and FLO+.
> If we try to fit a simple model (~ Sample_Group), we are not able to get
> any useful result. I thought that this could be caused by the among-tumor
> differences, so I decided to model the experiment as a paired samples test,
> and did the following:
>
> design <- model.matrix(~ Sample_Group + Tumor_Name, data=pdata)
> fit <- lmFit(mvalues, design)
> efit <- eBayes(fit)
> topTable(efit, coef=2)
>
> Unfortunately, no probe had an adjusted p-value under our significance
> level. So I kept searching for methods to overcome this.
>
> Lately, we have been playing around with robust methods, because we think
> they could suit well to microarray problems, where outliers (specially in
> those probes with SNPs nearby) could account for much variability. So I
> decided to do the following:
>
> design <- model.matrix(~ Sample_Group + Tumor_Name, data=pdata)
> fit <- lmFit(mvalues, design, method='robust')
> efit <- eBayes(fit)
> topTable(efit, coef=2)
>
> logFC AveExpr t P.Value
> adj.P.Val B
> cg03143457 1.1528608 -2.521458 20.10818 3.293139e-08 0.01415556
> 7.525073
> ch.4.162336241R 1.1927542 -3.956192 16.62275 1.491918e-07 0.03076082
> 6.622935
> cg23320987 -0.7800961 3.997052 -14.76658 3.791254e-07 0.03076082
> 6.201241
> ...
>
> In this case, I am able to get more than 9000 significant probes. This
> could suffice, but I started reading discussions in this list about limma
> being able to model random effects, and decided to see what could happen if
> I modeled the origin tumor as a random effect:
>
> design <- model.matrix(~ Sample_Group, data=pdata)
> cor <- duplicateCorrelation(mvalues, design, block=pdata$Tumor_Name)
>
>> cor$consensus.correlation
> [1] 0.7208955
>
> fit <- lmFit(mvalues, design, block=pdata$Tumor_Name,
> correlation=cor$consensus.correlation, method='robust')
>
> There were 50 or more warnings (use warnings() to see the first 50)
>> warnings()
> Warning messages:
> 1: In rlm.default(x = X, y = y, weights = w, ...) :
> 'rlm' failed to converge in 20 steps
> 2: In rlm.default(x = X, y = y, weights = w, ...) :
> ...
>
> efit <- eBayes(fit)
> topTable(efit, coef=2)
>
>> topTable(efit, coef=2)
> logFC AveExpr t P.Value
> adj.P.Val B
> cg13065299 0.6832614 4.468798 6.702811 0.000023155532 0.9999998
> -3.510208
> cg04211927 0.7980711 3.845420 6.906811 0.000017357590 0.9999998
> -3.525569
> ch.22.436090R 0.5894315 -3.633235 6.305634 0.000041228561 0.9999998
> -3.526679
> ...
>
> I was expecting a different number of probes, accounting for the increased
> power (as I have read) of a mixed model in this case, but I actually got 0
> significant probes. Another strange thing is that consensus correlation
> equals 1.
>
> Is it possible to treat this experiment as a paired-sample test? I am
> always doubtful when thinking about experimental designs.
>
> Have I done the call to duplicateCorrelation() in a correct way? I am not
> sure if I have to include the blocking variable in the design matrix or not.
>
> Is it possible to fit a mixed model using the robust method? Maybe I am
> trying to do too many things at once.
>
> Could I trust the ~9000 probes I got from the simple, paired design?
>
> Thank you very much. As always, any hint or help would be much appreciated.
>
> Regards,
> Gustavo
>
> PS: My sessionInfo:
>
> [[alternative HTML version deleted]]
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list