[BioC] doing paired t-test amongst several groups

Fri Feb 16 14:17:43 CET 2007

Hello James,

Thanks a lot for your insight! I made that contrast matrix as you 
suggested and got the results fine.
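
For anyone following along, here is a minimal sketch of what that B-C contrast computes. It uses base R only and one simulated gene, not my real data; the factor layout is rebuilt by hand to mirror the targets file, and the names y, coefs, est_BC and direct_BC are made up for illustration. As far as I understand, lmFit does the equivalent least-squares fit for each row of the expression matrix.

```r
# Hypothetical layout mirroring the targets table: 5 patients x 3 surfaces.
Patients <- factor(rep(paste0("OB", 1:5), each = 3))
Surfaces <- factor(rep(c("A", "B", "C"), times = 5),
                   levels = c("A", "B", "C"))
paired_design <- model.matrix(~Patients + Surfaces)

# Contrast from Jim's reply: +1 on SurfacesB, -1 on SurfacesC.
contrast_BC <- matrix(c(rep(0, 5), 1, -1),
                      dimnames = list(colnames(paired_design), "B-C"))

# Simulated log2 expression for a single made-up gene (assumption).
set.seed(1)
y <- rnorm(15, mean = 10)

# Ordinary least-squares fit of the paired design for this one gene.
coefs <- lm.fit(paired_design, y)$coefficients

# B-C estimate obtained by applying the contrast to the coefficients.
est_BC <- sum(contrast_BC[, "B-C"] * coefs)

# Because the design is balanced, this should match the plain
# difference between the surface B and surface C means.
direct_BC <- mean(y[Surfaces == "B"]) - mean(y[Surfaces == "C"])
all.equal(est_BC, direct_BC)
```

The contrast only touches the two surface coefficients, so the patient terms drop out of the B-C estimate.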

Can I add another question?

You mention that in the result of topTable(fit2_paired_RMAbg_Qnorm), the 
baseline is OB1/SurfaceA and everything is compared to that...

So do the SurfacesB and SurfacesC coefficients take all patients into 
consideration, or just OB1? (I imagine it is all patients, but I am 
just checking how it works.)
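
One way this could be checked is sketched below with base R's lm() on simulated numbers rather than the real arrays. The values in y and the names coef_B, diff_BA, mean_all and ob1_only are assumptions for illustration only. In a balanced design like this one, the SurfacesB coefficient should equal the average within-patient B minus A difference over all five patients, not the OB1 difference alone.

```r
# Hypothetical layout mirroring the targets table: 5 patients x 3 surfaces.
Patients <- factor(rep(paste0("OB", 1:5), each = 3))
Surfaces <- factor(rep(c("A", "B", "C"), times = 5),
                   levels = c("A", "B", "C"))

# Simulated log2 expression for a single made-up gene (assumption).
set.seed(2)
y <- rnorm(15, mean = 10)

# Same additive paired model as in the thread, fitted to one gene.
fit <- lm(y ~ Patients + Surfaces)
coef_B <- unname(coef(fit)["SurfacesB"])

# Within-patient B minus A differences, one per patient (OB1 to OB5).
diff_BA <- y[Surfaces == "B"] - y[Surfaces == "A"]

mean_all <- mean(diff_BA)   # average over all five patients
ob1_only <- diff_BA[1]      # difference within OB1 only

# The coefficient should agree with the all-patient average ...
all.equal(coef_B, mean_all)
# ... and, for random data, not with the OB1-only difference.
isTRUE(all.equal(coef_B, ob1_only))
```

So although OB1/Surface A is the reference level for the intercept, the surface coefficients pool information across every patient.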

Thanks again!

Milena

James W. MacDonald wrote:
> Hi Milena,
>
> Milena Gongora wrote:
>   
>> Hello Everyone,
>>
>> I am wondering if anyone has scaled a paired t-test up to multiple 
>> pairwise comparisons and can enlighten me on how to interpret the 
>> outcome. I have read the limma guide back and forth but still seem to 
>> be missing a few things.
>>
>> Essentially I am doing a paired t-test, but have 3 treatments and wish 
>> to make pairwise comparisons of all combinations.
>>
>> I have single-channel data (Illumina) that I imported using 
>> BeadExplorer, which creates an exprSet. I then RMA background-corrected 
>> the data and applied quantile normalization from the BeadExplorer 
>> package, which essentially invokes limma's quantile normalization. The 
>> result was an exprSet of normalized values, which I then log2 
>> transformed.
>>
>> My experimental design is as follows: 5 patients (OB1 to OB5) were 
>> biopsied, and each biopsy was split into 3 cell cultures, each of which 
>> received a different treatment (surfaces A, B, C). I therefore have 3 
>> treatments with 5 replicates each, but the replicates share the same 
>> origin, so it seems logical to analyse them as paired samples.
>>
>> My challenge was to scale the paired t-test to 3 sets of comparisons.
>>
>> So first I read a targets file that specifies all the pairs and treatments
>>
>>  > targets <- readTargets("samples.txt")
>>  > targets
>>        FileName Patient Surface
>> 1  1519138023_A     OB1     A
>> 2  1488802050_A     OB1     B
>> 3  1488802050_D     OB1     C
>> 4  1519138023_B     OB2     A
>> 5  1488802050_B     OB2     B
>> 6  1488802050_E     OB2     C
>> 7  1519138023_C     OB3     A
>> 8  1488802050_C     OB3     B
>> 9  1488802050_F     OB3     C
>> 10 1519138023_D     OB4     A
>> 11 1519138023_E     OB4     B
>> 12 1519138023_F     OB4     C
>> 13 1519138034_A     OB5     A
>> 14 1519138034_B     OB5     B
>> 15 1519138034_C     OB5     C
>>
>> Then make the design matrix
>>  > Patients <- factor(targets$Patient)
>>  > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") )
>>  > paired_design <- model.matrix(~Patients+Surfaces)
>>
>> And then fit a linear model and do eBayes
>>  > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm, paired_design)
>>  > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm)
>>
>>  > topTable(fit2_paired_RMAbg_Qnorm, number=2)
>>                  ID X.Intercept.   PatientsOB2 PatientsOB3 PatientsOB4
>> 13720 GI_34304116-S     15.29244  1.431159e-15   0.1152188  0.14177094
>> 11757 GI_31543813-S     15.14338 -1.090994e-01   0.1038085  0.08840763
>>
>>       PatientsOB5 SurfacesSLA SurfacesSLAa  AveExpr        F
>> 13720 -0.03689951 0.006326441   0.01046853 15.34205 30967.96
>> 11757  0.01106040 0.080210742  -0.06714165 15.16657 29657.53
>>
>>            P.Value    adj.P.Val
>> 13720 1.549603e-24 1.816823e-20
>> 11757 2.007728e-24 1.816823e-20
>>  
>>
>> My Questions are:
>> I am a bit confused that the table returned by topTable has a column 
>> for the intercept and a column for each patient, as well as columns 
>> for the other surfaces. What do the values under the patient columns 
>> mean? Does including them reduce the power of the comparison to the 
>> other surfaces?
>>     
>
> For starters, you _have_ fit a paired design, and it is simple to get 
> your results out. Unfortunately it is difficult to explain this via 
> email (and if you were taking a linear modeling class there would 
> probably be several lectures devoted to design matrices, so it isn't a 
> trivial thing to learn).
>
> In short, the model you are fitting uses patient OB1/Surface A as a 
> baseline, to which all other samples are compared (looking at the design 
> matrix may help). The SurfacesB coefficient compares the B and A 
> surfaces (B-A), and the SurfacesC coefficient compares the C and A 
> surfaces (C-A). If you want the other comparison, you need to set up a 
> contrasts matrix like this:
>
>  > matrix(c(rep(0,5), 1, -1), dimnames=list(colnames(paired_design), "B-C"))
>              B-C
> (Intercept)   0
> PatientsOB2   0
> PatientsOB3   0
> PatientsOB4   0
> PatientsOB5   0
> SurfacesB     1
> SurfacesC    -1
>
> This works because the contrast computes (B-A) - (C-A) = B-C.
>
> So topTable(fit2_paired_RMAbg_Qnorm, coef=6) will give you the genes 
> that differ between A and B, coef=7 will give you the genes that differ 
> between A and C, and fitting the contrast will give you the genes that 
> differ between B and C.
>
> Best,
>
> Jim
>
>
>
>
>   
>> As I am not interested in the differential expression amongst patients, 
>> how do I avoid these being considered?
>>
>> How can I know about the differences amongst surfaces B and C?
>>
>> Do I need to or can I make a contrast matrix to specify which are the 
>> comparisons I want to get information for? (only surfaces, and not 
>> amongst patients)
>>
>> If I can make a contrast matrix, can you give me an example of how to do 
>> it with 3 treatments?
>>
>> Many Thanks!
>>
>> Milena

-- 
Milena Gongora
Bioinformatician
SRC Computational Chemistry and Biology Unit
Institute for Molecular Biosciences
The University of Queensland

Phone: +61 7 3346 2609
Fax: +61 7 3346 2101
email: m.gongora at imb.uq.edu.au