[BioC] doing paired t-test amongst several groups

Mon Feb 12 18:18:23 CET 2007

Hi Milena,

Milena Gongora wrote:
> Hello Everyone,
> 
> I am wondering if anyone has scaled a paired t-test to do multiple 
> pairwise comparisons and can enlighten me on how to interpret the 
> outcome. I have read the limma guide back and forth but seem to be 
> missing a few things.
> 
> Essentially I am doing a paired t-test, but have 3 treatments and wish 
> to make pairwise comparisons of all combinations.
> 
> I have single channel data (Illumina) that I imported using 
> BeadExplorer, this creates an exprSet. Following that I RMA-bg-corrected 
> and then normalized using Quantile normalization from the BeadExplorer 
> package, which essentially invokes limma Quantile normalization. As a 
> result of this I had an exprSet of normalized values which I then log2 
> transformed.
> 
> So my experimental design is as follows: 5 patients were biopsied 
> (OB1 to OB5) and each biopsy was split into 3 cell cultures that 
> each underwent a different treatment (surfaces A, B, C). Therefore I 
> have 3 treatments, each with 5 replicates, but the replicates share the 
> same origin, so by my logic I should analyse them as paired samples.
> 
> My challenge was to scale the paired t-test to 3 sets of comparisons.
> 
> So first I read a targets file that specifies all the pairs and treatments
> 
>  > targets <- readTargets("samples.txt")
>  > targets
>        FileName Patient Surface
> 1  1519138023_A     OB1     A
> 2  1488802050_A     OB1     B
> 3  1488802050_D     OB1     C
> 4  1519138023_B     OB2     A
> 5  1488802050_B     OB2     B
> 6  1488802050_E     OB2     C
> 7  1519138023_C     OB3     A
> 8  1488802050_C     OB3     B
> 9  1488802050_F     OB3     C
> 10 1519138023_D     OB4     A
> 11 1519138023_E     OB4     B
> 12 1519138023_F     OB4     C
> 13 1519138034_A     OB5     A
> 14 1519138034_B     OB5     B
> 15 1519138034_C     OB5     C
> 
> Then make the design matrix
>  > Patients <- factor(targets$Patient)
>  > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") )
>  > paired_design <- model.matrix(~Patients+Surfaces)
> 
> And then fit a linear model and do eBayes
>  > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm, paired_design)
>  > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm)
> 
>  > topTable(fit2_paired_RMAbg_Qnorm, number=2)
>                  ID X.Intercept.   PatientsOB2 PatientsOB3 PatientsOB4
> 13720 GI_34304116-S     15.29244  1.431159e-15   0.1152188  0.14177094
> 11757 GI_31543813-S     15.14338 -1.090994e-01   0.1038085  0.08840763
> 
>       PatientsOB5 SurfacesSLA SurfacesSLAa  AveExpr        F
> 13720 -0.03689951 0.006326441   0.01046853 15.34205 30967.96
> 11757  0.01106040 0.080210742  -0.06714165 15.16657 29657.53
> 
>            P.Value    adj.P.Val
> 13720 1.549603e-24 1.816823e-20
> 11757 2.007728e-24 1.816823e-20
>  
> 
> My Questions are:
> I am a bit confused by the fact that in the resulting table (shown by 
> topTable) I am getting a column for the intercept of surface A with all 
> patients as well as other surfaces. What do the values under patients 
> mean? Does the fact that they are being considered reduce the power of 
> the comparison to the other surfaces?

For starters, you _have_ fit a paired design, and it is simple to get 
your results out. Unfortunately it is difficult to explain this via 
email (and if you were taking a linear modeling class there would 
probably be several lectures devoted to design matrices, so it isn't a 
trivial thing to learn).

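In short, the model you are fitting uses patient OB1/Surface A as a 
baseline, to which all other samples are compared (looking at the design 
matrix may help). The Patients coefficients estimate how each of patients 
OB2 to OB5 differs from OB1; they are what make the analysis paired, so 
leave them in the model and simply ignore them in the output. The 
SurfacesB coefficient compares the B and A surfaces (B-A), and the 
SurfacesC coefficient compares the C and A surfaces (C-A). If you want 
the remaining comparison (B-C), you need to set up a contrasts matrix 
like this: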

 > matrix(c(rep(0,5), 1, -1), dimnames=list(colnames(paired_design), "B-C"))
             B-C
(Intercept)   0
PatientsOB2   0
PatientsOB3   0
PatientsOB4   0
PatientsOB5   0
SurfacesB     1
SurfacesC    -1

This works because it computes (B-A) - (C-A) = B-C.
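If you want to convince yourself of that arithmetic without limma, here is a minimal base-R sketch. It rebuilds the 15-array layout from the targets file, applies the B-C contrast above to ordinary least-squares coefficients for one gene, and checks that the result equals the average within-patient B minus C difference. The expression values are simulated and purely hypothetical, as are the names y, beta and b_minus_c.

```r
# Rebuild the 15-array layout from the targets file (OB1..OB5, surfaces A, B, C)
Patients <- factor(rep(paste0("OB", 1:5), each = 3))
Surfaces <- factor(rep(c("A", "B", "C"), times = 5), levels = c("A", "B", "C"))
paired_design <- model.matrix(~Patients + Surfaces)
colnames(paired_design)  # intercept, PatientsOB2..OB5, SurfacesB, SurfacesC

# The B-C contrast matrix from the reply
cont <- matrix(c(rep(0, 5), 1, -1),
               dimnames = list(colnames(paired_design), "B-C"))

# Hypothetical log2 values for a single gene (simulated, not real data)
set.seed(1)
y <- rnorm(15, mean = 8) + rep(c(0, 1, 0.5), times = 5)

# Least-squares coefficients, then (B-A) - (C-A)
beta <- qr.coef(qr(paired_design), y)
b_minus_c <- drop(crossprod(cont, beta))

# In this balanced paired design, B-C equals the mean within-patient difference
within_patient <- mean(y[Surfaces == "B"] - y[Surfaces == "C"])
all.equal(unname(b_minus_c), within_patient)
```

Because every patient contributes exactly one array per surface, the patient terms cancel out of the B-C estimate, which is the same thing a paired t-test does.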

So topTable(fit2_paired_RMAbg_Qnorm, coef=6) will give you the genes 
that differ between A and B, coef=7 will give you the genes that differ 
between A and C, and fitting the contrast (contrasts.fit() on your 
lmFit() result, followed by eBayes()) will give you the genes that 
differ between B and C.
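For reference, a sketch of the whole workflow in limma, assuming a reasonably recent limma version. A simulated matrix stands in for data_log2_RMAbg_Qnorm; expr, fit3, tt_BA, tt_CA and tt_BC are hypothetical names. Note that the topTable() call in the original post has no coef argument, so it reports an F-test over every coefficient, intercept included (hence the X.Intercept. column and F values near 30,000); that ranks genes largely by overall expression level, not by surface effect.

```r
library(limma)

# Same 15-array layout as the targets file
Patients <- factor(rep(paste0("OB", 1:5), each = 3))
Surfaces <- factor(rep(c("A", "B", "C"), times = 5), levels = c("A", "B", "C"))
paired_design <- model.matrix(~Patients + Surfaces)

# Simulated log2 expression matrix (hypothetical stand-in for the real data)
set.seed(42)
expr <- matrix(rnorm(1000 * 15, mean = 8), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), NULL))
expr[1:20, Surfaces == "B"] <- expr[1:20, Surfaces == "B"] + 2  # genes 1-20 shifted up in B

fit <- lmFit(expr, paired_design)

# B-A and C-A are coefficients 6 and 7 of the original fit
fit2 <- eBayes(fit)
tt_BA <- topTable(fit2, coef = 6, number = 10)
tt_CA <- topTable(fit2, coef = 7, number = 10)

# B-C: apply the contrast to the lmFit() output, then moderate with eBayes()
cont <- matrix(c(rep(0, 5), 1, -1),
               dimnames = list(colnames(paired_design), "B-C"))
fit3 <- eBayes(contrasts.fit(fit, cont))
tt_BC <- topTable(fit3, coef = 1, number = 10)
```

The contrast is applied to the fit from lmFit() rather than to the eBayes() result so that the moderated statistics are computed for the new B-C coefficient.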

Best,

Jim

> 
> As I am not interested in the differential expression amongst patients, 
> how do I avoid these being considered?
> 
> How can I know about the differences amongst surfaces B and C?
> 
> Do I need to or can I make a contrast matrix to specify which are the 
> comparisons I want to get information for? (only surfaces, and not 
> amongst patients)
> 
> If I can make a contrast matrix, can you give me an example of how to do 
> it with 3 treatments?
> 
> Many Thanks!
> 
> Milena

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
