[BioC] nested factors in edgeR [was: something wrong with my design matrix in edgeR]

Gordon K Smyth smyth at wehi.EDU.AU
Fri Apr 27 04:05:58 CEST 2012


Dear Li,

Thanks for showing the targets file -- that makes your experiment very 
clear.

Your design matrix can be seen to be incorrect because it has five columns 
for an experiment that has only four distinct groups.  The problem is that 
the populations are subgroups, nested within the latitudes north and 
south.  So your two factors are nested rather than crossed.

I assume you want to test for differences between the two north 
populations (FBK and NWL), and then for differences between the two south 
populations (SOU and CAR), and then later for N vs S.  There are many ways 
to do this, but the following is perhaps the easiest:

   pop <- factor(targets$pop, levels=c("FBK","NWL","SOU","CAR"))
   design <- model.matrix(~pop)

Then later

   fit <- glmFit(d, design)

so the coefficients 2-3 compare NWL, SOU and CAR respectively back to FBK.

To test NWL vs FBK (within the north group):

   lrt <- glmFit(d, fit, coef=2)
   topTags(lrt)

To test CAR vs SOU (within the south group):

   lrt <- glmFit(d, fit, contrast=c(0,0,-1,1))
   topTags(lrt)

To test S vs N overall:

   lrt <- glmFit(d, fit, contrast=c(0,-1,1,1)/2)
   topTags(lrt)

Best wishes
Gordon


--------------- original message --------------
[BioC] something wrong with my design matrix in edgeR
Wang, Li li.wang at ttu.edu
Thu Apr 26 18:56:46 CEST 2012

Dear List members

My targets include 31 samples representing  two groups (south and north) 
and also four populations (FBK, NWL, SOU, CAR).

> targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE)
> targets
        files group pop
1  FBK01.txt     N FBK
2  FBK04.txt     N FBK
3  FBK05.txt     N FBK
4  FBK08.txt     N FBK
5  FBK09.txt     N FBK
6  FBK10.txt     N FBK
7  FBK13.txt     N FBK
8  NWL02.txt     N NWL
9  NWL04.txt     N NWL
10 NWL06.txt     N NWL
11 NWL08.txt     N NWL
12 NWL09.txt     N NWL
13 NWL10.txt     N NWL
14 NWL11.txt     N NWL
15 NWL14.txt     N NWL
16 SOU01.txt     S SOU
17 SOU02.txt     S SOU
18 SOU03.txt     S SOU
19 SOU04.txt     S SOU
20 SOU07.txt     S SOU
21 SOU09.txt     S SOU
22 SOU11.txt     S SOU
23 SOU15.txt     S SOU
24 CAR01.txt     S CAR
25 CAR02.txt     S CAR
26 CAR03.txt     S CAR
27 CAR04.txt     S CAR
28 CAR05.txt     S CAR
29 CAR11.txt     S CAR
30 CAR12.txt     S CAR
31 CAR14.txt     S CAR

We are interested with the differential expression between north and south 
group by adjusting for differences between populations.

> latitude <- factor(targets$group)
> latitude <- relevel(latitude, ref="N")
> pop <- factor(targets$pop)
> design <- model.matrix(~pop+latitude)
> rownames(design) <- rownames(d$samples)
> design
    (Intercept) popFBK popNWL popSOU latitudeS
1            1      1      0      0         0
2            1      1      0      0         0
3            1      1      0      0         0
4            1      1      0      0         0
5            1      1      0      0         0
6            1      1      0      0         0
7            1      1      0      0         0
8            1      0      1      0         0
9            1      0      1      0         0
10           1      0      1      0         0
11           1      0      1      0         0
12           1      0      1      0         0
13           1      0      1      0         0
14           1      0      1      0         0
15           1      0      1      0         0
16           1      0      0      1         1
17           1      0      0      1         1
18           1      0      0      1         1
19           1      0      0      1         1
20           1      0      0      1         1
21           1      0      0      1         1
22           1      0      0      1         1
23           1      0      0      1         1
24           1      0      0      0         1
25           1      0      0      0         1
26           1      0      0      0         1
27           1      0      0      0         1
28           1      0      0      0         1
29           1      0      0      0         1
30           1      0      0      0         1
31           1      0      0      0         1
attr(,"assign")
[1] 0 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$pop
[1] "contr.treatment"

attr(,"contrasts")$latitude
[1] "contr.treatment"

I am wondering maybe there is something wrong with my design matrix. So, 
when I tried to estimate the dispersions, I got the following errors.

> d <- estimateGLMCommonDisp(d, design)
Error in solve.default(R, t(beta)) :
   system is computationally singular: reciprocal condition number = 
5.16368e-17

The following is session information. Any comments are very appreciated.

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] qvalue_1.30.0 edgeR_2.6.0   limma_3.12.0

loaded via a namespace (and not attached):
[1] tcltk_2.15.0 tools_2.15.0

Thanks in advance!

Best wishes
Li

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list