[BioC] doing paired t-test amongst several groups
Milena Gongora
m.gongora at imb.uq.edu.au
Sat Feb 17 14:14:54 CET 2007
Hi James,
Great, I am convinced. Thanks for the explanation, although my brain
hurts following it....
Cheers,
Milena
James W. MacDonald wrote:
> Hi Milena,
>
> Milena Gongora wrote:
>> Hello James,
>>
>> Thanks a lot for your insight! I made that contrast matrix as you
>> suggested and got the results fine.
>>
>> Can I add another question?
>>
>> You mention that in the result of topTable(fit2_paired_RMAbg_Qnorm),
>> the baseline is OB1/SurfaceA and everything is compared to that...
>>
>> So are the results under coefficients SurfaceB and SurfaceC taking
>> into consideration all patients? or just OB1? (I imagine is all
>> patients, but just checking how it works)
>
> Yep. All patients. You can convince yourself of this by noting that
> the design matrix has all of these rows:
>
> Int OB2 OB3 OB4 OB5 B C
> 2 1 0 0 0 0 1 0
> 5 1 1 0 0 0 1 0
> 8 1 0 1 0 0 1 0
> 11 1 0 0 1 0 1 0
> 14 1 0 0 0 1 1 0
>
> A simple way to think about this is to go back to first principles.
> The design matrix is used to do the linear algebra required to
> estimate these coefficients. Linear algebra is just a compact way to
> solve for multiple unknowns (remember in algebra how you can solve for
> x with one equation, but to solve for x and y you need two, and to
> solve for x, y, and z you need three, and on and on?).
>
> So the above design matrix specifies the following equations:
>
> <val1> = (OB1 + A) + (B - A)
> <val2> = (OB1 + A) + (OB2 - OB1) + (B - A)
> <val3> = (OB1 + A) + (OB3 - OB1) + (B - A)
> .
> .
> .
>
> Which we solve simultaneously to get the coefficients, based on all
> the patients.
>
> Does that help?
>
> Best,
>
> Jim
>
>
>
>>
>> Thanks again!
>>
>> Milena
>>
>> James W. MacDonald wrote:
>>
>>> Hi Milena,
>>>
>>> Milena Gongora wrote:
>>>
>>>
>>>> Hello Everyone,
>>>>
>>>> I am wondering if anyone has scaled a paired t-test to do multiple
>>>> pairwise comparisons and can enlighten me in how to interpret the
>>>> outcome. I read the limma guide back and forth but seem to be
>>>> missing on understanding a few things.
>>>>
>>>> Essentially I am doing a paired t-test, but have 3 treatments and
>>>> wish to make pairwise comparisons of all combinations.
>>>>
>>>> I have single channel data (Illumina) that I imported using
>>>> BeadExplorer, this creates an exprSet. Following that I
>>>> RMA-bg-corrected and then normalized using Quantile normalization
>>>> from the BeadExplorer package, which essentially invokes limma
>>>> Quantile normalization. As a result of this I had an exprSet of
>>>> normalized values which I then log2 transformed.
>>>>
>>>> So my experimental design is as follows, 5 patients that were
>>>> biopsied (OB1 to OB5) and their biopsy split into 3 cultures of
>>>> cells that underwent each a different treatment (surfaces A, B, C).
>>>> Therefore I have 3 treatments, each with 5 replicates but they are
>>>> of the same origin, which to my logic seems like I should analyse
>>>> as paired samples.
>>>>
>>>> My challenge was to scale the paired t-test to 3 sets of comparisons.
>>>>
>>>> So first I read a targets file that specifies all the pairs and
>>>> treatments
>>>>
>>>> > targets <- readTargets("samples.txt")
>>>> > targets
>>>> FileName Patient Surface
>>>> 1 1519138023_A OB1 A
>>>> 2 1488802050_A OB1 B
>>>> 3 1488802050_D OB1 C
>>>> 4 1519138023_B OB2 A
>>>> 5 1488802050_B OB2 B
>>>> 6 1488802050_E OB2 C
>>>> 7 1519138023_C OB3 A
>>>> 8 1488802050_C OB3 B
>>>> 9 1488802050_F OB3 C
>>>> 10 1519138023_D OB4 A
>>>> 11 1519138023_E OB4 B
>>>> 12 1519138023_F OB4 C
>>>> 13 1519138034_A OB5 A
>>>> 14 1519138034_B OB5 B
>>>> 15 1519138034_C OB5 C
>>>>
>>>> Then make the design matrix
>>>> > Patients <- factor(targets$Patient)
>>>> > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") )
>>>> > paired_design <- model.matrix(~Patients+Surfaces)
>>>>
>>>> And then fit a linear model and do eBayes
>>>> > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm,
>>>> paired_design)
>>>> > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm)
>>>>
>>>> > topTable(fit2_paired_RMAbg_Qnorm, number=2)
>>>> ID X.Intercept. PatientsOB2 PatientsOB3 PatientsOB4
>>>> 13720 GI_34304116-S 15.29244 1.431159e-15 0.1152188 0.14177094
>>>> 11757 GI_31543813-S 15.14338 -1.090994e-01 0.1038085 0.08840763
>>>>
>>>> PatientsOB5 SurfacesSLA SurfacesSLAa AveExpr F
>>>> 13720 -0.03689951 0.006326441 0.01046853 15.34205 30967.96
>>>> 11757 0.01106040 0.080210742 -0.06714165 15.16657 29657.53
>>>>
>>>> P.Value adj.P.Val
>>>> 13720 1.549603e-24 1.816823e-20
>>>> 11757 2.007728e-24 1.816823e-20
>>>>
>>>>
>>>> My Questions are:
>>>> I am a bit confused by the fact that in the resulting table (shown
>>>> by topTable) I am getting a column for the intercept of surface A
>>>> with all patients as well as other surfaces. What do the values
>>>> under patients mean? Does the fact that they are being considered
>>>> reduces the power of the comparison to the other surfaces?
>>>>
>>>
>>>
>>> For starters, you _have_ fit a paired design, and it is simple to
>>> get your results out. Unfortunately it is difficult to explain this
>>> via email (and if you were taking a linear modeling class there
>>> would probably be several lectures devoted to design matrices, so it
>>> isn't a trivial thing to learn).
>>>
>>> In short, the model you are fitting uses patient OB1/Surface A as a
>>> baseline, to which all other samples are compared (looking at the
>>> design matrix may help). The SurfacesB coefficient compares the B
>>> and A surfaces (B-A), and the SurfacesC coefficient compares the C
>>> and A surfaces (C-A). If you want the other comparison, you need to
>>> set up a contrasts matrix like this:
>>>
>>> > matrix(c(rep(0,5), 1, -1), dimnames=list(colnames(paired_design),
>>> "B-C"))
>>> B-C
>>> (Intercept) 0
>>> PatientsOB2 0
>>> PatientsOB3 0
>>> PatientsOB4 0
>>> PatientsOB5 0
>>> SurfacesB 1
>>> SurfacesC -1
>>>
>>> Because this will compute (B-A) - (C-A) = B-C.
>>>
>>> So topTable(fit2_paired_RMAbg_Qnorm, coef=6) will give you the genes
>>> different between A and B, coef=7 will give you the genes different
>>> between A and C, and fitting the contrast will give you the genes
>>> different between B and C.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>
>>>
>>>
>>>> As I am not interested in the differential expression amongst
>>>> patients, how do I avoid these being considered?
>>>>
>>>> How can I know about the differences amongst surfaces B and C?
>>>>
>>>> Do I need to or can I make a contrast matrix to specify which are
>>>> the comparisons I want to get information for? (only surfaces, and
>>>> not amongst patients)
>>>>
>>>> If I can make a contrast matrix, can you give me an example of how
>>>> to do it with 3 treatments?
>>>>
>>>> Many Thanks!
>>>>
>>>> Milena
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
--
Milena Gongora
Bioinformatician
SRC Computational Chemistry and Biology Unit
Institute for Molecular Biosciences
The University of Queensland
Phone: +61 7 3346 2609
Fax: +61 7 3346 2101
email: m.gongora at imb.uq.edu.au
More information about the Bioconductor
mailing list