# [BioC] Regression analysis problem

Sean Davis sdavis2 at mail.nih.gov
Thu Oct 6 13:25:14 CEST 2011

```On Thu, Oct 6, 2011 at 7:15 AM, khadeeja ismail <hajjja at yahoo.com> wrote:
> Hi everyone,
>
> I am wondering if anyone can help me understand the following output from limma.
>
> The problem goes like this:
> Say I have methylation values for four genes from three twin pairs. I created some dummy data so that values of gene1 is clearly different within twin pairs. My interest here is to find out the genes that are significantly different within the pairs (i.e. between T1a and T1b,  T2a and T2b and T3a and T3b)
>
>
>                 T1a T1b T2a T2b T3a T3b
>
> "gene1"    16    4    18    4    19    5
> "gene2"    3      1     4     5     2     6
> "gene3"    2      1     4     3     4     5
> "gene4"    3      1     2     3     3    4
>
>
> I created a design matrix:
>
>    > Pair<-c("TWIN1","TWIN1","TWIN2","TWIN2","TWIN3","TWIN3")
>     >Classes<-c(2,1,2,1,2,1)
>     >BMI<-c(45,20,34,12,25,19)
>     >Pair<-factor(Pair)
>     >Classes<-factor(Classes)
>     >design <- model.matrix(~Pair+Classes)
>
>> design
>   (Intercept) PairTWIN2 PairTWIN3 Classes2
> 1           1         0         0        1
> 2           1         0         0        0
> 3           1         1         0        1
> 4           1         1         0        0
> 5           1         0         1        1
> 6           1         0         1        0
>
>
> **********
>
> ...and fitted a linear model to the data to get the following results:
> ************************************************************
> ID X.Intercept. PairTWIN2 PairTWIN3      Classes2   AveExpr          F
> 1 gene1     3.333333       1.0       2.0  1.333333e+01 11.000000 106.298724
> 2 gene2     2.500000       2.5       2.0 -1.000000e+00  3.500000   8.745648
> 3 gene3     1.333333       2.0       3.0  3.333333e-01  3.166667   7.430245
> 4 gene4     2.000000       0.5       1.5  9.064933e-16  2.666667   4.799441
> 1 9.993042e-91 3.997217e-90
> 2 4.683758e-07 9.367515e-07
> 3 5.578117e-06 7.437489e-06
> 4 7.186523e-04 7.186523e-04
> *************************************************************

You didn't show us the rest of the code.  In particular, I suspect
that you are using something like:

topTable(X)

You might try:

topTable(X,coef=4)

Sean

>
> As you can see, all the genes have adjusted p values less than 0.05.
>
> But if I just compute the differnces between the pairs
>
>              T1 T2 T3
> gene1   12  14  14
> gene2    2  -1  -4
> gene3    1   1  -1
> gene4    2  -1  -1
>
>
> ....and used the design matrix (1,1,1),
> gene 1 comes up as significantly different within the pairs.
>
>  ID      logFC          t      P.Value    adj.P.Val         B
> 1 gene1 13.3333333 10.6660452 5.234557e-06 2.093823e-05 46.016217
> 2 gene2 -1.0000000 -0.7999534 4.468388e-01 8.936777e-01 -5.513313
> 3 gene3  0.3333333  0.2666511 7.964821e-01 1.000000e+00 -5.772418
> 4 gene4  0.0000000  0.0000000 1.000000e+00 1.000000e+00 -5.804806
>
>
> So, my questions are:
> Why am I not getting the same (expected) result from my first approach? Or am I misinterpreting  the output?
>
> What does the p value in the 1st approach signify?
>
> And most importantly, am I applying the right model?
>
> Any help on this would be most appreciated.
>
> Thanks in advance,
>
>        [[alternative HTML version deleted]]
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
```