[BioC] edgeR design matrix and contrast questions
Findley Finseth
findleyransler at gmail.com
Tue Mar 5 21:21:18 CET 2013
Hello all --
I am using edgeR to analyze RNASeq data. Thank you so much for this software and the clear and straightforward user guide. I have a question regarding contrasts using the glm approach for a paired experimental design.
I have three tissues (PG, Liv, and Test) from six subjects (1-6). One of my goals is to get a set of genes that are "enriched" for each tissue. To this end, I would like to compare each tissue to the average of the other two (e.g., PG - (liver + testes)/2), while accounting for the fact that my experiment is paired by subject. I understand how to build an appropriate design matrix and call contrasts to either compare each tissue to the average of the other two (e.g., edgeR section 3.4.3-4) or perform pairwise comparisons between tissues while including subject (e.g., edgeR section 3.4.1) . However, I start struggling when I try to combine these goals and compare one tissue to the average of the other two while pairing by subject. I have also considered the multi-level approach (e.g., limma section 8.7) but am not doing both independent and paired comparisons, so did not think that was an appropriate avenue.
I am pasting my session output with preliminary code below. Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question.
On a slightly different note, I am getting the following error when using topTags (also on session output below):
Error in abs(object$table$logFC) :
Non-numeric argument to mathematical function
I noticed there was a previous post in December about similarly passing a single contrast to a glmLRT and retrieving the same error, which was corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming the error has more to do with how I have set up/called my contrasts, but thought I would mention it.
Thank you in advance,
Findley Finseth
PhD Candidate
Dept Ecology and Evolutionary Biology
Cornell University
Ithaca, NY 14850
> library(edgeR)
Loading required package: limma
>
> # making a matrix of factors called "targets", Tissue = tissue, Ind2 = subject
> targets <-readTargets()
> targets
X Tissue Ind2
1 JQ_112_Liv_TG Liv 1
2 JQ_122-2_PG_A PG 2
3 JQ_179_Liv_AC Liv 3
4 JQ_191_Test_C Test 4
5 JQ_255_PG_AGT PG 5
6 JQ_107_Liv_CC Liv 6
7 JQ_107_PG_AGT PG 6
8 JQ_122-2_Test Test 2
9 JQ_191_Liv_AT Liv 4
10 JQ_107_Test_G Test 6
11 JQ_112_PG_CGA PG 1
12 JQ_112_Test_A Test 1
13 JQ_122-2_Liv_ Liv 2
14 JQ_191_PG_AGT PG 4
15 JQ_179_PG_AGT PG 3
16 JQ_179_Test_A Test 3
17 JQ_255_Liv_GA Liv 5
18 JQ_255_Test_G Test 5
>
> # importing raw data
> rawdata <- read.delim("controlonly_practiceEdgeR.txt")
> head(rawdata)
Gene sample2 sample6 sample7 sample9 sample10 sample12 sample13 sample17
1 comp326924_c0_seq1 23 7 3 6 8 16 5 6
2 comp434050_c0_seq1 2 0 0 26 0 0 0 34
3 comp28996_c0_seq1 344 161 191 284 144 354 114 338
4 comp1083897_c0_seq1 1 4 1 0 1 2 0 0
5 comp544783_c0_seq1 3 0 0 4 0 0 0 11
6 comp654539_c0_seq1 0 0 0 9 0 1 0 11
sample20 sample22 sample24 sample25 sample29 sample30 sample36 sample37 sample39
1 6 20 4 8 10 2 1 4 32
2 0 49 0 10 0 0 0 19 1
3 154 815 196 163 252 51 74 168 352
4 0 1 4 0 0 2 0 0 4
5 1 19 0 2 2 0 0 3 0
6 1 4 0 5 0 0 0 3 0
sample40
1 2
2 18
3 182
4 0
5 2
6 3
>
> # setting groups equal to tissue differences
> group <- targets$Tissue
> group <- as.factor(group)
> group
[1] Liv PG Liv Test PG Liv PG Test Liv Test PG Test Liv PG PG Test Liv
[18] Test
Levels: Liv PG Test
>
>
> # making my DGE list
> y <- DGEList(counts=rawdata[,2:19],genes=rawdata[,1],group=group)
Calculating library sizes from column totals.
>
> # filtering out lowly expressed genes; since the smallest group size is six, we keeping genes with at least one count per million in at least six samples
> keep <- rowSums(cpm(y)>1) >= 6
> y <- y[keep,]
> dim(y) # retained 30,140 genes
[1] 30140 18
>
>
> #recomputing the library sizes:
> y$samples$lib.size <- colSums(y$counts)
>
> #calculating normalization factors
> y <- calcNormFactors(y)
> y$samples # looks good
group lib.size norm.factors
sample2 Liv 13003017 1.2319091
sample6 PG 12388919 0.7524641
sample7 Liv 8045685 0.6490458
sample9 Test 9164420 1.5566202
sample10 PG 15292051 0.5119163
sample12 Liv 13042773 0.8596694
sample13 PG 11862830 0.6419963
sample17 Test 9472129 1.6400614
sample20 Liv 6514347 0.8206096
sample22 Test 25926605 1.4096284
sample24 PG 12334446 0.7888070
sample25 Test 6283137 1.6230035
sample29 Liv 19970487 0.8060990
sample30 PG 6159314 0.7825644
sample36 PG 3897369 0.9081739
sample37 Test 5778563 1.6550333
sample39 Liv 15619497 0.9618765
sample40 Test 5076345 1.7061590
>
>
>
> # setting subject equal to individuals to reflect fact that samples are paired
> subject <- factor(targets$Ind2)
>
> ############### building design matrix; could use advice here regarding appropriateness##########
> design4 <- model.matrix(~subject + group)
> design4
(Intercept) subject2 subject3 subject4 subject5 subject6 groupPG groupTest
1 1 0 0 0 0 0 0 0
2 1 1 0 0 0 0 1 0
3 1 0 1 0 0 0 0 0
4 1 0 0 1 0 0 0 1
5 1 0 0 0 1 0 1 0
6 1 0 0 0 0 1 0 0
7 1 0 0 0 0 1 1 0
8 1 1 0 0 0 0 0 1
9 1 0 0 1 0 0 0 0
10 1 0 0 0 0 1 0 1
11 1 0 0 0 0 0 1 0
12 1 0 0 0 0 0 0 1
13 1 1 0 0 0 0 0 0
14 1 0 0 1 0 0 1 0
15 1 0 1 0 0 0 1 0
16 1 0 1 0 0 0 0 1
17 1 0 0 0 1 0 0 0
18 1 0 0 0 1 0 0 1
attr(,"assign")
[1] 0 1 1 1 1 1 2 2
attr(,"contrasts")
attr(,"contrasts")$subject
[1] "contr.treatment"
attr(,"contrasts")$group
[1] "contr.treatment"
>
> # estimating dispersions
> y4 <- estimateGLMCommonDisp(y,design4)
> y4 <- estimateGLMTrendedDisp(y4,design4)
Loading required package: splines
> y4 <- estimateGLMTagwiseDisp(y4,design4)
>
> # building contrasts
>
> ##############This is where things start to get a bit sticky for me; I am making some preliminary contrasts below, but am unsure of whether I've treated the intercept appropriately. I think it refers to groupLiv, though I have some uncertainty about this, given that subject is the first term in the model##########
>
> #########ultimately, I would like to compare (PG - (Liv+Test)/2) and so on for each tissue type###########
>
> my.contrast <- makeContrasts(
+ Liv=(Intercept)-(groupPG+groupTest)/2,
+ PG=groupPG-((Intercept)+groupTest)/2,
+ Test=groupTest-(groupPG+(Intercept))/2,
+ levels=design4)
Warning message:
In makeContrasts(Liv = (Intercept) - (groupPG + groupTest)/2, PG = groupPG - :
Renaming (Intercept) to Intercept
>
>
> # fitting a glm
> fit <- glmFit(y4,design4)
>
>
> # likelihood ratio tests
>
> lrt.Liv <- glmLRT(fit, my.contrast[,"Liv"])
> lrt.PG <- glmLRT(fit, my.contrast[,"PG"])
> lrt.Testes <- glmLRT(fit, my.contrast[,"Test"])
>
> #finding top tags;
> topTags(lrt.Liv)
Error in abs(object$table$logFC) :
Non-numeric argument to mathematical function
> topTags(lrt.PG)
Error in abs(object$table$logFC) :
Non-numeric argument to mathematical function
> topTags(lrt.Testes)
Error in abs(object$table$logFC) :
Non-numeric argument to mathematical function
>
>
> #### Note: I am also getting an error after using topTags about passing a non-numeric argument to a mathermatical function. I noticed there was a previous post about similarly passing a single contrast to a glmLRT and retrieving the same error, which was corrected in edgeR_3.0.7. As I am using 3.0.8, I am assuming the error has more to do with how I have set up/called my contrasts, but thought I would mention it.
>
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.0.8 limma_3.14.4
loaded via a namespace (and not attached):
[1] tools_2.15.2
More information about the Bioconductor
mailing list