[BioC] doing paired t-test amongst several groups
Milena Gongora
m.gongora at imb.uq.edu.au
Fri Feb 16 14:17:43 CET 2007
Hello James,
Thanks a lot for your insight! I made that contrast matrix as you
suggested and got the results fine.
Can I add another question?
You mention that in the result of topTable(fit2_paired_RMAbg_Qnorm), the
baseline is OB1/SurfaceA and everything is compared to that...
So are the results under coefficients SurfaceB and SurfaceC taking into
consideration all patients? or just OB1? (I imagine is all patients, but
just checking how it works)
Thanks again!
Milena
James W. MacDonald wrote:
> Hi Milena,
>
> Milena Gongora wrote:
>
>> Hello Everyone,
>>
>> I am wondering if anyone has scaled a paired t-test to do multiple
>> pairwise comparisons and can enlighten me in how to interpret the
>> outcome. I read the limma guide back and forth but seem to be missing on
>> understanding a few things.
>>
>> Essentially I am doing a paired t-test, but have 3 treatments and wish
>> to make pairwise comparisons of all combinations.
>>
>> I have single channel data (Illumina) that I imported using
>> BeadExplorer, this creates an exprSet. Following that I RMA-bg-corrected
>> and then normalized using Quantile normalization from the BeadExplorer
>> package, which essentially invokes limma Quantile normalization. As a
>> result of this I had an exprSet of normalized values which I then log2
>> transformed.
>>
>> So my experimental design is as follows, 5 patients that were biopsied
>> (OB1 to OB5) and their biopsy split into 3 cultures of cells that
>> underwent each a different treatment (surfaces A, B, C). Therefore I
>> have 3 treatments, each with 5 replicates but they are of the same
>> origin, which to my logic seems like I should analyse as paired samples.
>>
>> My challenge was to scale the paired t-test to 3 sets of comparisons.
>>
>> So first I read a targets file that specifies all the pairs and treatments
>>
>> > targets <- readTargets("samples.txt")
>> > targets
>> FileName Patient Surface
>> 1 1519138023_A OB1 A
>> 2 1488802050_A OB1 B
>> 3 1488802050_D OB1 C
>> 4 1519138023_B OB2 A
>> 5 1488802050_B OB2 B
>> 6 1488802050_E OB2 C
>> 7 1519138023_C OB3 A
>> 8 1488802050_C OB3 B
>> 9 1488802050_F OB3 C
>> 10 1519138023_D OB4 A
>> 11 1519138023_E OB4 B
>> 12 1519138023_F OB4 C
>> 13 1519138034_A OB5 A
>> 14 1519138034_B OB5 B
>> 15 1519138034_C OB5 C
>>
>> Then make the design matrix
>> > Patients <- factor(targets$Patient)
>> > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") )
>> > paired_design <- model.matrix(~Patients+Surfaces)
>>
>> And then fit a linear model and do eBayes
>> > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm, paired_design)
>> > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm)
>>
>> > topTable(fit2_paired_RMAbg_Qnorm, number=2)
>> ID X.Intercept. PatientsOB2 PatientsOB3 PatientsOB4
>> 13720 GI_34304116-S 15.29244 1.431159e-15 0.1152188 0.14177094
>> 11757 GI_31543813-S 15.14338 -1.090994e-01 0.1038085 0.08840763
>>
>> PatientsOB5 SurfacesSLA SurfacesSLAa AveExpr F
>> 13720 -0.03689951 0.006326441 0.01046853 15.34205 30967.96
>> 11757 0.01106040 0.080210742 -0.06714165 15.16657 29657.53
>>
>> P.Value adj.P.Val
>> 13720 1.549603e-24 1.816823e-20
>> 11757 2.007728e-24 1.816823e-20
>>
>>
>> My Questions are:
>> I am a bit confused by the fact that in the resulting table (shown by
>> topTable) I am getting a column for the intercept of surface A with all
>> patients as well as other surfaces. What do the values under patients
>> mean? Does the fact that they are being considered reduces the power of
>> the comparison to the other surfaces?
>>
>
> For starters, you _have_ fit a paired design, and it is simple to get
> your results out. Unfortunately it is difficult to explain this via
> email (and if you were taking a linear modeling class there would
> probably be several lectures devoted to design matrices, so it isn't a
> trivial thing to learn).
>
> In short, the model you are fitting uses patient OB1/Surface A as a
> baseline, to which all other samples are compared (looking at the design
> matrix may help). The SurfacesB coefficient compares the B and A
> surfaces (B-A), and the SurfacesC coefficient compares the C and A
> surfaces (C-A). If you want the other comparison, you need to set up a
> contrasts matrix like this:
>
> > matrix(c(rep(0,5), 1, -1), dimnames=list(colnames(paired_design), "B-C"))
> B-C
> (Intercept) 0
> PatientsOB2 0
> PatientsOB3 0
> PatientsOB4 0
> PatientsOB5 0
> SurfacesB 1
> SurfacesC -1
>
> Because this will compute (B-A) - (C-A) = B-C.
>
> So topTable(fit2_paired_RMAbg_Qnorm, coef=6) will give you the genes
> different between A and B, coef=7 will give you the genes different
> between A and C, and fitting the contrast will give you the genes
> different between B and C.
>
> Best,
>
> Jim
>
>
>
>
>
>> As I am not interested in the differential expression amongst patients,
>> how do I avoid these being considered?
>>
>> How can I know about the differences amongst surfaces B and C?
>>
>> Do I need to or can I make a contrast matrix to specify which are the
>> comparisons I want to get information for? (only surfaces, and not
>> amongst patients)
>>
>> If I can make a contrast matrix, can you give me an example of how to do
>> it with 3 treatments?
>>
>> Many Thanks!
>>
>> Milena
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>
--
Milena Gongora
Bioinformatician
SRC Computational Chemistry and Biology Unit
Institute for Molecular Biosciences
The University of Queensland
Phone: +61 7 3346 2609
Fax: +61 7 3346 2101
email: m.gongora at imb.uq.edu.au
More information about the Bioconductor
mailing list