[BioC] doing paired t-test amongst several groups
James W. MacDonald
jmacdon at med.umich.edu
Mon Feb 12 18:18:23 CET 2007
Hi Milena,
Milena Gongora wrote:
> Hello Everyone,
>
> I am wondering if anyone has scaled a paired t-test to do multiple
> pairwise comparisons and can enlighten me in how to interpret the
> outcome. I read the limma guide back and forth but seem to be missing on
> understanding a few things.
>
> Essentially I am doing a paired t-test, but have 3 treatments and wish
> to make pairwise comparisons of all combinations.
>
> I have single channel data (Illumina) that I imported using
> BeadExplorer, this creates an exprSet. Following that I RMA-bg-corrected
> and then normalized using Quantile normalization from the BeadExplorer
> package, which essentially invokes limma Quantile normalization. As a
> result of this I had an exprSet of normalized values which I then log2
> transformed.
>
> So my experimental design is as follows, 5 patients that were biopsied
> (OB1 to OB5) and their biopsy split into 3 cultures of cells that
> underwent each a different treatment (surfaces A, B, C). Therefore I
> have 3 treatments, each with 5 replicates but they are of the same
> origin, which to my logic seems like I should analyse as paired samples.
>
> My challenge was to scale the paired t-test to 3 sets of comparisons.
>
> So first I read a targets file that specifies all the pairs and treatments
>
> > targets <- readTargets("samples.txt")
> > targets
> FileName Patient Surface
> 1 1519138023_A OB1 A
> 2 1488802050_A OB1 B
> 3 1488802050_D OB1 C
> 4 1519138023_B OB2 A
> 5 1488802050_B OB2 B
> 6 1488802050_E OB2 C
> 7 1519138023_C OB3 A
> 8 1488802050_C OB3 B
> 9 1488802050_F OB3 C
> 10 1519138023_D OB4 A
> 11 1519138023_E OB4 B
> 12 1519138023_F OB4 C
> 13 1519138034_A OB5 A
> 14 1519138034_B OB5 B
> 15 1519138034_C OB5 C
>
> Then make the design matrix
> > Patients <- factor(targets$Patient)
> > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") )
> > paired_design <- model.matrix(~Patients+Surfaces)
>
> And then fit a linear model and do eBayes
> > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm, paired_design)
> > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm)
>
> > topTable(fit2_paired_RMAbg_Qnorm, number=2)
> ID X.Intercept. PatientsOB2 PatientsOB3 PatientsOB4
> 13720 GI_34304116-S 15.29244 1.431159e-15 0.1152188 0.14177094
> 11757 GI_31543813-S 15.14338 -1.090994e-01 0.1038085 0.08840763
>
> PatientsOB5 SurfacesSLA SurfacesSLAa AveExpr F
> 13720 -0.03689951 0.006326441 0.01046853 15.34205 30967.96
> 11757 0.01106040 0.080210742 -0.06714165 15.16657 29657.53
>
> P.Value adj.P.Val
> 13720 1.549603e-24 1.816823e-20
> 11757 2.007728e-24 1.816823e-20
>
>
> My Questions are:
> I am a bit confused by the fact that in the resulting table (shown by
> topTable) I am getting a column for the intercept of surface A with all
> patients as well as other surfaces. What do the values under patients
> mean? Does the fact that they are being considered reduces the power of
> the comparison to the other surfaces?
For starters, you _have_ fit a paired design, and it is simple to get
your results out. Unfortunately it is difficult to explain this via
email (and if you were taking a linear modeling class there would
probably be several lectures devoted to design matrices, so it isn't a
trivial thing to learn).
In short, the model you are fitting uses patient OB1/Surface A as a
baseline, to which all other samples are compared (looking at the design
matrix may help). The SurfacesB coefficient compares the B and A
surfaces (B-A), and the SurfacesC coefficient compares the C and A
surfaces (C-A). If you want the other comparison, you need to set up a
contrasts matrix like this:
> matrix(c(rep(0,5), 1, -1), dimnames=list(colnames(paired_design), "B-C"))
B-C
(Intercept) 0
PatientsOB2 0
PatientsOB3 0
PatientsOB4 0
PatientsOB5 0
SurfacesB 1
SurfacesC -1
Because this will compute (B-A) - (C-A) = B-C.
So topTable(fit2_paired_RMAbg_Qnorm, coef=6) will give you the genes
different between A and B, coef=7 will give you the genes different
between A and C, and fitting the contrast will give you the genes
different between B and C.
Best,
Jim
>
> As I am not interested in the differential expression amongst patients,
> how do I avoid these being considered?
>
> How can I know about the differences amongst surfaces B and C?
>
> Do I need to or can I make a contrast matrix to specify which are the
> comparisons I want to get information for? (only surfaces, and not
> amongst patients)
>
> If I can make a contrast matrix, can you give me an example of how to do
> it with 3 treatments?
>
> Many Thanks!
>
> Milena
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.
More information about the Bioconductor
mailing list