[BioC] EdgeR multi-factor design questions
James W. MacDonald
jmacdon at uw.edu
Wed Mar 27 15:39:56 CET 2013
Hi Morgan,
On 3/26/2013 4:40 PM, Morgan Mouchka wrote:
> Hi Jim,
>
> On Mar 25, 2013, at 4:01 PM, James W. MacDonald wrote:
>
>> Hi Morgan,
>>
>> On 3/25/2013 3:15 PM, Morgan Mouchka wrote:
>>> Hello all,
>>>
>>> I’m currently using EdgeR to analyze RNASeq data and have very much appreciated the software and its incredibly helpful user manual.
>>>
>>>
>>> More specifically, I have two questions. The first is whether I have chosen the correct method for analyzing my multi-factorial experiment. I have 2 factors, state (apo, sym) and treatment (control, treatment). I’m interested if there is a differential response due to the treatment that varies by state. In other words, do apos respond to the treatment in the same way that syms respond?
>>>
>>>
>>> To determine which genes were differentially expressed between the control and treatment for each data set (i.e. apo and sym), I used a GLM approach with contrasts to do pairwise comparisons between groups. I have pasted my session output and preliminary code below. Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question.
>>>
>>> Second, for genes that are differentially expressed in both the apo and sym data sets, is it possible to test if the genes are significantly differentially expressed from each other? For instance, if a gene is 9-fold up regulated in apos, but only 3-fold up regulated in syms, are these significantly different such that I could say that this gene is significantly more up regulated in the apo state? It seems that this could be some sort of interaction term, but I’m uncertain as to the best way to set this up.
>> This is definitely an interaction term. How you set it up is dependent on what parameterization makes the most sense to you. Following your method below, you want
>>
>> glmLRT(fit, contrast = c(1,-1,-1,1))
> I'm a bit confused, but wouldn't this model be combining the controls (apo and sym) and comparing it to the treatments (apo and sym) in an additive fashion? When I run this model, I get different results than when I run your second model below. Are they suppose to provide the same output or did I misunderstand your post? (Certainly likely, I'm very new to GLM and edgeR).
This contrast is the same as (Sym trt - Sym ctrl) - (Apo trt - Apo
ctrl), which tests to see if the difference between treatment and
control varies according to state. This is the same as the coefficient
treat2:state2 in the alternative parameterization below.
But two things here. First, I had a brain cramp in my earlier email to
you. The correct interpretations of the coefficients are
Apo ctrl
Apo trt - Apo ctrl
Sym ctrl - Apo ctrl
(Sym trt - Sym ctrl) - (Apo trt - Apo ctrl)
So coef = 3 doesn't give you Sym trt - Sym ctrl. It is more difficult to
get that contrast from this parameterization, so sticking with what you
already had is the easiest way to go.
Second, I don't see in your code below where you re-fit the model using
the alternative parameterization, so I tried this with some data I have
in hand. Note that you have to go all the way back to the beginning,
recreating the dgeList object with the different design matrix. So,
using your parameterization I get
> topTags(glmLRT(fit, contrast = c(1,-1,-1,1)))
symbol egids logFC logCPM LR PValue FDR
319317 Snhg11 319317 -8.012675 8.499244 333.5330 1.634236e-74 2.088554e-70
21938 Tnfrsf1b 21938 6.895792 4.632327 268.2860 2.682053e-60 1.713832e-56
16365 Irg1 16365 7.709183 4.697518 248.7724 4.809099e-56 2.048676e-52
18133 Nov 18133 6.798803 6.007596 237.2266 1.582884e-53 5.057313e-50
13058 Cybb 13058 7.395509 5.371628 233.3577 1.104324e-52 2.822652e-49
20302 Ccl3 20302 7.776920 4.644823 220.7970 6.060828e-50 1.290956e-46
22671 Rnf112 22671 -6.945110 6.711583 217.2831 3.540102e-49 6.463215e-46
13733 Emr1 13733 6.855894 5.131320 210.6611 9.854565e-48 1.574267e-44
20266 Scn1b 20266 -5.007392 7.965721 206.7374 7.074398e-47 1.004565e-43
110834 Chrna3 110834 6.344460 4.341593 206.1794 9.363937e-47 1.196711e-43
and then re-parameterizing
> treat <- factor(rep(c(1,2,1,2), c(5,3,1,3)))
> type <- factor(rep(c(1,2,1,2), c(3,2,4,3)))
> design2 <- model.matrix(~treat*type)
> dgelst2 <- estimateGLMCommonDisp(DGEList(counts=cnts, genes=tx.filt),
design2)
Calculating library sizes from column totals.
> dgelst2 <- estimateGLMTrendedDisp(dgelst2, design2)
> dgelst2 <- estimateGLMTagwiseDisp(dgelst2, design2)
> fit2 <- glmFit(dgelst2, design2)
> topTags(glmLRT(fit2, coef=4))
Coefficient: treat2:type2
symbol egids logFC logCPM LR PValue FDR
319317 Snhg11 319317 -7.869265 8.428033 388.4536 1.796893e-86 2.296429e-82
13058 Cybb 13058 7.824446 5.392286 319.0685 2.310988e-71 1.476721e-67
16365 Irg1 16365 7.773614 4.714316 249.1532 3.972389e-56 1.692238e-52
13733 Emr1 13733 7.444357 5.144043 246.3510 1.621694e-55 4.197947e-52
21938 Tnfrsf1b 21938 6.915307 4.651340 246.3258 1.642389e-55 4.197947e-52
20266 Scn1b 20266 -4.879057 7.909679 238.6288 7.828831e-54 1.667541e-50
18133 Nov 18133 6.871500 6.026197 237.6275 1.294259e-53 2.362947e-50
20302 Ccl3 20302 7.784824 4.663569 209.2621 1.990055e-47 3.179113e-44
100039795 Ildr2 100039795 4.813985 6.396443 199.6276 2.518196e-45
3.575838e-42
110834 Chrna3 110834 6.382514 4.361077 196.0642 1.509239e-44 1.928808e-41
So you can see that the likelihood ratios are the same - the only
difference here is likely due to the fact that we are estimating the
dispersions on different coefficients (e.g., the other coefficients in
the model), so the denominator of the statistic will change, resulting
in slightly different p-values and ordering. But all in all, very
similar results.
Best,
Jim
>> or you could do
>>
>> state<- factor(rep(1:2, each = 8))
>> treat<- factor(rep(1:2, each=4, times=2))
>>
>> design<- model.matrix(~treat*state)
>>
>> and then
>>
>> glmLRT(fit, coef=4)
>>
>> will test the interaction, and coefficients 2 and 3 test the main effects for treatment and state, as you have done below.
> When I use coefficients 2 and 3 in the above model, I get different results from when I compare A and B (apo control to apo treatment) and C and D (sym control to sym treatment) in my original model. Should they be the same? I have posted my most recent session below using topTags to demonstrate the differing results. I ran my model first and then ran your recommended models, followed by your model with coefficients 2 and 3. Hopefully this all makes sense?
> Thanks so much,
> Morgan
>
>> library(edgeR)
>> #reading the counts from a file
>> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig")
>> head (x)
> AC3 AC4 AC5 AC7 AT1 AT2
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 72 39 55 31 36 55
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 3 2 3 4 0 2
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260 576 957 529 663 961
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 17 7 6 6 7 8
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 7774 3857 5588 3835 3633 4470
> AT4 AT5 SC1 SC2 SC3 SC5
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 22 63 20 55 32 49
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 0 5 4 4 1 2
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 516 903 407 1072 345 788
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 4 5 3 0 2 5
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3668 5489 1092 2288 1381 2194
> ST1 ST2 ST3 ST5
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 39 65 40 82
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 7 0 1 3
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 958 1103 900 1488
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 7 11 6 8
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3629 4837 3383 4221
>
> #MY MODEL
>
>> #assign samples to groups, A=AC, B=AT, C=SC, D=ST
>> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D"))
>> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size
>> y<- DGEList (counts = x, group = group)
> Calculating library sizes from column totals.
>> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case)
>> keep<- rowSums (cpm(y)>1)>= 4
>> y2<- y[keep,]
>> dim(y2)
> [1] 17739 16
>> y2$samples$lib.size<- colSums(y2$counts)
>> # Normalize for sample-specific effects
>> y3<- calcNormFactors (y2)
>> y3$samples
> group lib.size norm.factors
> AC3 A 11095093 1.0237872
> AC4 A 5604841 0.9912734
> AC5 A 7517691 0.9824405
> AC7 A 5918039 0.9919439
> AT1 B 6004830 0.9609672
> AT2 B 7692419 0.9543765
> AT4 B 5197815 0.9575609
> AT5 B 7759281 0.9643070
> SC1 C 3542875 1.0693587
> SC2 C 6720709 1.0488297
> SC3 C 3881774 1.0022528
> SC5 C 5109584 1.0429925
> ST1 D 10429666 0.9815467
> ST2 D 9662842 1.0103866
> ST3 D 8267838 1.0198095
> ST5 D 10698815 1.0069064
>> design<- model.matrix(~0+group, data=y3$samples)
>> colnames(design)<- c("A", "B", "C", "D")
>> design
> A B C D
> AC3 1 0 0 0
> AC4 1 0 0 0
> AC5 1 0 0 0
> AC7 1 0 0 0
> AT1 0 1 0 0
> AT2 0 1 0 0
> AT4 0 1 0 0
> AT5 0 1 0 0
> SC1 0 0 1 0
> SC2 0 0 1 0
> SC3 0 0 1 0
> SC5 0 0 1 0
> ST1 0 0 0 1
> ST2 0 0 0 1
> ST3 0 0 0 1
> ST5 0 0 0 1
> attr(,"assign")
> [1] 1 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$group
> [1] "contr.treatment"
>
>> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE)
> Disp = 0.02061 , BCV = 0.1436
>> #estimate gene-wise dispersion
>> y5<- estimateGLMTrendedDisp(y4, design)
> Loading required package: splines
>> y6<- estimateGLMTagwiseDisp(y5,design)
>> fit<- glmFit(y6, design)
>> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design)
>> #pairwise comparison of B and A
>> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"])
>> #table of top DE tags
>> topTags(lrt.BvsA)
> Coefficient: -1*A 1*B
> logFC logCPM LR
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 4.506598 8.226827 461.5764
> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 4.620617 7.708649 442.6438
> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 4.281604 4.374277 339.8684
> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 2.661941 7.444760 280.1675
> Locus_18344_Transcript_1/1_Confidence_0.000_Length_3840 2.227577 6.807869 256.4828
> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 2.451201 9.549261 252.1390
> Locus_16729_Transcript_1/1_Confidence_1.000_Length_3099 2.119708 7.827086 246.6841
> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 2.080116 6.409451 246.1767
> Locus_21357_Transcript_1/1_Confidence_1.000_Length_4963 2.124257 8.493526 244.0451
> Locus_21472_Transcript_1/1_Confidence_0.000_Length_3818 3.282939 6.540550 239.7896
> PValue FDR
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.181936e-102 3.870537e-98
> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 2.877748e-98 2.552418e-94
> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 6.816117e-76 4.030370e-72
> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 6.903808e-63 3.061666e-59
> Locus_18344_Transcript_1/1_Confidence_0.000_Length_3840 1.002773e-57 3.557638e-54
> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 8.874275e-57 2.623679e-53
> Locus_16729_Transcript_1/1_Confidence_1.000_Length_3099 1.371971e-55 3.476771e-52
> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 1.770015e-55 3.924786e-52
> Locus_21357_Transcript_1/1_Confidence_1.000_Length_4963 5.160772e-55 1.017188e-51
> Locus_21472_Transcript_1/1_Confidence_0.000_Length_3818 4.371068e-54 7.753838e-51
>> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"])
>> topTags(lrt.DvsC)
> Coefficient: -1*C 1*D
> logFC logCPM LR
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.477714 8.226827 157.20112
> Locus_15861_Transcript_1/1_Confidence_1.000_Length_3425 -1.986510 5.205879 131.37771
> Locus_60846_Transcript_1/1_Confidence_0.000_Length_4676 -1.451476 6.950002 103.49253
> Locus_96758_Transcript_1/1_Confidence_0.000_Length_8096 -1.574898 6.554798 88.60841
> Locus_118088_Transcript_1/1_Confidence_0.000_Length_1895 -2.204284 3.908595 87.97649
> Locus_81760_Transcript_1/1_Confidence_0.000_Length_2455 2.060784 3.603245 80.94718
> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 2.280971 4.374277 80.10507
> Locus_49952_Transcript_1/1_Confidence_0.000_Length_1588 -2.016184 5.268778 80.01864
> Locus_16439_Transcript_2/3_Confidence_1.000_Length_2889 -2.035882 3.040181 76.14863
> Locus_103898_Transcript_1/1_Confidence_0.000_Length_2826 -1.538410 5.854172 72.23494
> PValue FDR
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 4.625963e-36 8.205996e-32
> Locus_15861_Transcript_1/1_Confidence_1.000_Length_3425 2.047045e-30 1.815627e-26
> Locus_60846_Transcript_1/1_Confidence_0.000_Length_4676 2.613781e-24 1.545529e-20
> Locus_96758_Transcript_1/1_Confidence_0.000_Length_8096 4.812378e-21 2.134169e-17
> Locus_118088_Transcript_1/1_Confidence_0.000_Length_1895 6.623716e-21 2.349962e-17
> Locus_81760_Transcript_1/1_Confidence_0.000_Length_2455 2.318323e-19 6.854121e-16
> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705 3.550203e-19 8.224107e-16
> Locus_49952_Transcript_1/1_Confidence_0.000_Length_1588 3.708938e-19 8.224107e-16
> Locus_16439_Transcript_2/3_Confidence_1.000_Length_2889 2.630983e-18 5.185667e-15
> Locus_103898_Transcript_1/1_Confidence_0.000_Length_2826 1.910427e-17 3.388906e-14
>
> #JIM'S 1ST MODEL
>> #interactions effect to test if Sym Treat diff from Apo Treat
>> lrt.interaction<- glmLRT(fit, contrast = c(1,-1,-1,1))
>> topTags(lrt.interaction)
> Coefficient: 1*A -1*B -1*C 1*D
> logFC logCPM LR
> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 -4.134843 7.708649 220.54182
> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 -1.899300 7.444760 76.74320
> Locus_122099_Transcript_1/1_Confidence_0.000_Length_1201 -1.867935 6.863793 71.58929
> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 -1.547852 6.409451 71.51138
> Locus_87983_Transcript_1/1_Confidence_0.000_Length_3460 -1.489774 6.525675 64.83901
> Locus_51175_Transcript_1/1_Confidence_0.000_Length_2213 -1.546709 7.548864 61.65883
> Locus_101327_Transcript_1/1_Confidence_0.000_Length_7106 -1.360292 8.389845 59.29519
> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 -1.606681 9.549261 59.26803
> Locus_58146_Transcript_1/1_Confidence_0.000_Length_2295 -1.274685 7.759520 58.79090
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 -2.028884 8.226827 57.76618
> PValue FDR
> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 6.889670e-50 1.222159e-45
> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 1.946973e-18 1.726868e-14
> Locus_122099_Transcript_1/1_Confidence_0.000_Length_1201 2.649905e-17 1.222497e-13
> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724 2.756632e-17 1.222497e-13
> Locus_87983_Transcript_1/1_Confidence_0.000_Length_3460 8.127420e-16 2.883446e-12
> Locus_51175_Transcript_1/1_Confidence_0.000_Length_2213 4.084321e-15 1.207530e-11
> Locus_101327_Transcript_1/1_Confidence_0.000_Length_7106 1.357077e-14 3.050973e-11
> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793 1.375939e-14 3.050973e-11
> Locus_58146_Transcript_1/1_Confidence_0.000_Length_2295 1.753507e-14 3.456161e-11
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866 2.952016e-14 5.236581e-11
>
> #ALTERNATIVE
>
>> state<- factor(rep(1:2, each = 8))
>> treat<- factor(rep(1:2, each=4, times=2))
>> design<- model.matrix(~treat*state)
>> design
> (Intercept) treat2 state2 treat2:state2
> 1 1 0 0 0
> 2 1 0 0 0
> 3 1 0 0 0
> 4 1 0 0 0
> 5 1 1 0 0
> 6 1 1 0 0
> 7 1 1 0 0
> 8 1 1 0 0
> 9 1 0 1 0
> 10 1 0 1 0
> 11 1 0 1 0
> 12 1 0 1 0
> 13 1 1 1 1
> 14 1 1 1 1
> 15 1 1 1 1
> 16 1 1 1 1
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$treat
> [1] "contr.treatment"
>
> attr(,"contrasts")$state
> [1] "contr.treatment"
>
>> lrt.interaction2<- glmLRT(fit, coef=4)
>> topTags(lrt.interaction2)
> Coefficient: D
> logFC logCPM LR
> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 -23.55939 -0.05067457 1620.221
> Locus_52010_Transcript_1/1_Confidence_0.000_Length_731 -22.84920 -0.18609176 1955.056
> Locus_113534_Transcript_1/1_Confidence_0.000_Length_1391 -22.37722 -0.88447008 1923.027
> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 -22.19649 -0.63260884 3087.708
> Locus_124155_Transcript_1/1_Confidence_1.000_Length_648 -22.02352 -0.80276576 2491.409
> Locus_1722_Transcript_1/1_Confidence_0.000_Length_731 -21.87997 -0.75239480 3251.146
> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 -21.87383 -0.23507267 1632.186
> Locus_87931_Transcript_1/1_Confidence_0.000_Length_1357 -21.62135 -0.13684358 1905.960
> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 -21.40488 -0.62850673 2271.751
> Locus_47566_Transcript_1/1_Confidence_0.000_Length_992 -21.40385 -0.62656773 2492.533
> PValue FDR
> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 0 0
> Locus_52010_Transcript_1/1_Confidence_0.000_Length_731 0 0
> Locus_113534_Transcript_1/1_Confidence_0.000_Length_1391 0 0
> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 0 0
> Locus_124155_Transcript_1/1_Confidence_1.000_Length_648 0 0
> Locus_1722_Transcript_1/1_Confidence_0.000_Length_731 0 0
> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 0 0
> Locus_87931_Transcript_1/1_Confidence_0.000_Length_1357 0 0
> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 0 0
> Locus_47566_Transcript_1/1_Confidence_0.000_Length_992 0 0
>
> #JIM'S 1ST MODEL: SAME AS C AND D COMPARISON (SYM CONTROL TO SYM TREATMENT)?
>> lrt.3<- glmLRT(fit, coef=3)
>> topTags(lrt.3)
> Coefficient: C
> logFC logCPM LR
> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 -23.20368 -0.63260884 2954.522
> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 -22.24145 -0.05067457 1529.706
> Locus_16904_Transcript_1/1_Confidence_0.000_Length_7117 -22.23039 -0.57979449 2739.489
> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 -22.22741 -0.62850673 2193.659
> Locus_34242_Transcript_1/1_Confidence_0.000_Length_290 -21.91166 -0.09022093 2767.658
> Locus_11024_Transcript_1/1_Confidence_0.000_Length_3304 -21.91140 -0.57058617 1714.034
> Locus_50361_Transcript_1/1_Confidence_0.000_Length_899 -21.90846 -0.07570223 3706.103
> Locus_77224_Transcript_1/1_Confidence_0.000_Length_5098 -21.90643 -0.74190749 2140.909
> Locus_8741_Transcript_2/2_Confidence_1.000_Length_380 -21.65265 -0.07156613 1494.994
> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 -21.45320 -0.23507267 1558.686
> PValue FDR
> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338 0 0
> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492 0 0
> Locus_16904_Transcript_1/1_Confidence_0.000_Length_7117 0 0
> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964 0 0
> Locus_34242_Transcript_1/1_Confidence_0.000_Length_290 0 0
> Locus_11024_Transcript_1/1_Confidence_0.000_Length_3304 0 0
> Locus_50361_Transcript_1/1_Confidence_0.000_Length_899 0 0
> Locus_77224_Transcript_1/1_Confidence_0.000_Length_5098 0 0
> Locus_8741_Transcript_2/2_Confidence_1.000_Length_380 0 0
> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 0 0
>
> #JIM'S 1ST MODEL: SAME AS A AND B COMPARISON (APO CONTROL AND APO TREATMENT)?
>> lrt.2<- glmLRT(fit, coef=2)
>> topTags(lrt.2)
> Coefficient: B
> logFC logCPM LR
> Locus_125980_Transcript_1/1_Confidence_1.000_Length_5017 -27.77711 6.430503 3886.142
> Locus_62669_Transcript_1/1_Confidence_0.000_Length_2422 -27.77711 3.945365 2525.716
> Locus_95409_Transcript_1/1_Confidence_0.000_Length_2333 -27.77711 3.041338 2328.365
> Locus_32782_Transcript_1/1_Confidence_0.000_Length_2222 -27.77711 6.168745 2973.233
> Locus_66293_Transcript_1/1_Confidence_0.000_Length_3091 -27.77711 4.657887 2296.824
> Locus_117458_Transcript_1/1_Confidence_1.000_Length_1842 -27.77711 3.300734 2269.282
> Locus_12642_Transcript_1/1_Confidence_1.000_Length_1211 -27.77711 2.771320 2360.788
> Locus_32505_Transcript_1/1_Confidence_1.000_Length_3406 -27.77711 4.930170 3599.471
> Locus_113391_Transcript_1/1_Confidence_0.000_Length_1360 -27.77711 2.112580 1802.150
> Locus_41152_Transcript_1/1_Confidence_0.000_Length_1925 -27.77711 6.014430 2939.592
> PValue FDR
> Locus_125980_Transcript_1/1_Confidence_1.000_Length_5017 0 0
> Locus_62669_Transcript_1/1_Confidence_0.000_Length_2422 0 0
> Locus_95409_Transcript_1/1_Confidence_0.000_Length_2333 0 0
> Locus_32782_Transcript_1/1_Confidence_0.000_Length_2222 0 0
> Locus_66293_Transcript_1/1_Confidence_0.000_Length_3091 0 0
> Locus_117458_Transcript_1/1_Confidence_1.000_Length_1842 0 0
> Locus_12642_Transcript_1/1_Confidence_1.000_Length_1211 0 0
> Locus_32505_Transcript_1/1_Confidence_1.000_Length_3406 0 0
> Locus_113391_Transcript_1/1_Confidence_0.000_Length_1360 0 0
> Locus_41152_Transcript_1/1_Confidence_0.000_Length_1925 0 0
>> Best,
>>
>> Jim
>>
>>
>>> I’m using the 3.0.8 version of EdgeR and the 2.15.2 version of R. I have 4 biological reps per treatment/state with the nomenclature of AC for apo control, AT for apo treatment, SC for sym control, and ST for sym treatment.
>>>
>>> Thanks so much,
>>>
>>> Morgan
>>>
>>> Morgan Mouchka
>>> PhD Candidate
>>> E231 Corson Hall
>>> Dept. Ecology and Evolutionary Biology
>>> Cornell University
>>> mep74 at cornell.edu
>>> http://www.eeb.cornell.edu/mouchka
>>>
>>>> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig")
>>>> head (x)
>>> AC3 AC4 AC5 AC7 AT1 AT2
>>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 72 39 55 31 36 55
>>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 3 2 3 4 0 2
>>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260 576 957 529 663 961
>>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 17 7 6 6 7 8
>>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0
>>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 7774 3857 5588 3835 3633 4470
>>> AT4 AT5 SC1 SC2 SC3 SC5
>>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 22 63 20 55 32 49
>>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 0 5 4 4 1 2
>>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 516 903 407 1072 345 788
>>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 4 5 3 0 2 5
>>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0 0 0
>>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3668 5489 1092 2288 1381 2194
>>> ST1 ST2 ST3 ST5
>>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792 39 65 40 82
>>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409 7 0 1 3
>>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 958 1103 900 1488
>>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436 7 11 6 8
>>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090 0 0 0 0
>>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499 3629 4837 3383 4221
>>>> #assign samples to groups, A=AC, B=AT, C=SC, D=ST
>>>> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D"))
>>>> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size
>>>> y<- DGEList (counts = x, group = group)
>>> Calculating library sizes from column totals.
>>>> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case)
>>>> keep<- rowSums (cpm(y)>1)>= 4
>>>> y2<- y[keep,]
>>>> dim(y2)
>>> [1] 17739 16
>>>> y2$samples$lib.size<- colSums(y2$counts)
>>>> # Normalize for sample-specific effects
>>>> y3<- calcNormFactors (y2)
>>>> y3$samples
>>> group lib.size norm.factors
>>> AC3 A 11095093 1.0237872
>>> AC4 A 5604841 0.9912734
>>> AC5 A 7517691 0.9824405
>>> AC7 A 5918039 0.9919439
>>> AT1 B 6004830 0.9609672
>>> AT2 B 7692419 0.9543765
>>> AT4 B 5197815 0.9575609
>>> AT5 B 7759281 0.9643070
>>> SC1 C 3542875 1.0693587
>>> SC2 C 6720709 1.0488297
>>> SC3 C 3881774 1.0022528
>>> SC5 C 5109584 1.0429925
>>> ST1 D 10429666 0.9815467
>>> ST2 D 9662842 1.0103866
>>> ST3 D 8267838 1.0198095
>>> ST5 D 10698815 1.0069064
>>>> #plot biological coefficient of variation (BVC)between samples
>>>> plotMDS(y3)
>>>> #check levels
>>>> levels (y3$samples$group)
>>> [1] "A" "B" "C" "D"
>>>> #design matrix to describe treatment conditions
>>>> design<- model.matrix(~0+group, data=y3$samples)
>>>> colnames(design)<- c("A", "B", "C", "D")
>>>> design
>>> A B C D
>>> AC3 1 0 0 0
>>> AC4 1 0 0 0
>>> AC5 1 0 0 0
>>> AC7 1 0 0 0
>>> AT1 0 1 0 0
>>> AT2 0 1 0 0
>>> AT4 0 1 0 0
>>> AT5 0 1 0 0
>>> SC1 0 0 1 0
>>> SC2 0 0 1 0
>>> SC3 0 0 1 0
>>> SC5 0 0 1 0
>>> ST1 0 0 0 1
>>> ST2 0 0 0 1
>>> ST3 0 0 0 1
>>> ST5 0 0 0 1
>>> attr(,"assign")
>>> [1] 1 1 1 1
>>> attr(,"contrasts")
>>> attr(,"contrasts")$group
>>> [1] "contr.treatment"
>>>
>>>> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE)
>>> Disp = 0.02061 , BCV = 0.1436
>>>> #estimate gene-wise dispersion
>>>> y5<- estimateGLMTrendedDisp(y4, design)
>>>> y6<- estimateGLMTagwiseDisp(y5,design)
>>>> plotBCV(y6)
>>>> fit<- glmFit(y6, design)
>>>> # make contrasts
>>>> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design)
>>>> #pairwise comparison of B and A (apo treatmetn and apo control)
>>>> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"])
>>>> # summary of model fitting funtions
>>>> summary(de<-decideTestsDGE(lrt.BvsA))
>>> [,1]
>>> -1 562
>>> 0 16091
>>> 1 1086
>>>> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"])
>>>> summary(de<-decideTestsDGE(lrt.DvsC))
>>> [,1]
>>> -1 496
>>> 0 16710
>>> 1 533
>>> [[alternative HTML version deleted]]
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
> Morgan Mouchka
> PhD Candidate
> E231 Corson Hall
> Dept. Ecology and Evolutionary Biology
> Cornell University
> mep74 at cornell.edu
> http://www.eeb.cornell.edu/mouchka
>
>
>
> [[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list