[Bioc-sig-seq] RNA-Seq - analysis of time series data?
Jakob Hedegaard
Jakob.Hedegaard at agrsci.dk
Tue Nov 30 21:46:18 CET 2010
Hi Mark,
Thanks for your suggestions. I will try the different approached and see what I find.
As it is a time-series it would be convenient to be able to extract the list of "genes" being affected at one or more time points/contrastes.
Could the decideTestsDGE function be used for this purpose.
Or is there a function to conduct e.g. a F-test?
Best regards,
Jakob
-----Oprindelig meddelelse-----
Fra: Mark Robinson [mailto:mrobinson at wehi.EDU.AU]
Sendt: 25. november 2010 10:41
Til: Jakob Hedegaard; bioc-sig-sequencing at r-project.org
Emne: Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?
Hi Jakob.
Just another thought on this. One alternative for your analysis may be to treat it as a 1-way setup [using d from d <- DGEList(counts=rd,group=group)] and use the non-GLM methods. This means that you would be limited to pairwise comparisons, but that may suffice.
Cheers,
Mark
On 2010-11-23, at 9:43 PM, Mark Robinson wrote:
> Hi Jakob.
>
> The error message is due to the fact that your design matrix is not full rank (i.e. sum of columns 4-7 = sum of columns 2-3). This is of course my fault, since I suggested it.
>
> That setup (previous post) would've worked for an experiment with observations for every combination of factors, which you do not have (controls are only measured at T=0, AP2/AP6 are not measured at T=0). Non-full-rank design matrices can be difficult to interpret, so an alternative option is this:
>
> S <- factor(rep( c("Acon","AP6","AP2"),
> c(6,5+6+6+5,5+6+5+4)))
> T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"),
> c(6,5,6,6,5,5,6,5,4)))
>
> f <- factor( paste(S,T,sep=".") )
> design <- model.matrix( ~ f )
> [...]
>
> As mentioned below, you may need to adjust the reference (here again, control at T=0) for testing a particular contrast of interest.
>
> Let us know how you go.
>
> Cheers,
> Mark
>
> On 2010-11-23, at 12:25 AM, Jakob Hedegaard wrote:
>
>> Hi Mark,
>>
>> Thanks.
>>
>> I have implemented your suggestion for the design matrix when reading the data (see below), but encounter the following error when running estimateCRDisp:
>>
>>> d <- estimateCRDisp(d,design)
>> Error in chol.default(crossprod(design, .vecmat(mu/(1 + dispersion * mu), :
>> the leading minor of order 7 is not positive definite
>>
>> Applying the simple design:
>> design.time <- model.matrix(~ T, data=d$sample)
>> does not result in the error....
>>
>> Appreciate your help!
>>
>> Best regards,
>> Jakob
>>
>>
>>
>>
>> library(edgeR)
>> r <- read.delim(file.path(datadir, "RNAseq_AP-2-6-INF_liver_transcript_pivot.txt"))
>> dim(r)
>> #[1] 78728 205
>> rdata <- r[2:205]
>> #samp <- substr(colnames(r)[2:205],6,10)
>> #usamp <- unique(samp)
>> samp <- substr(colnames(r)[2:205],16,28)
>> usamp <- unique(samp)
>> rd <- matrix(NA,nrow=nrow(r),ncol=length(usamp))
>> for (i in 1:length(usamp)){
>> rd[,i] <- rowSums(rdata[,grepl(usamp[i],samp)])
>> }
>> colnames(rd) <- usamp
>> rownames(rd) <- r[,1]
>> # export data
>> #write.table(rd,file="RNAseq_AP-2-6-INF_liver_vs_AugSsc9_pivot.txt",sep="\t",quote=F,dec=".",row.names=T,col.names=NA)
>>
>> group <- substr(unique(substr(colnames(r)[2:205],16,28)),7,13)
>> #[1] "control" "control" "control" "control" "control" "control" "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T12"
>> #[13] "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T48"
>> #[25] "AP6.T48" "AP6.T48" "AP6.T48" "AP6.T48" "AP2.T06" "AP2.T06" "AP2.T06" "AP2.T06" "AP2.T06" "AP2.T12" "AP2.T12" "AP2.T12"
>> #[37] "AP2.T12" "AP2.T12" "AP2.T12" "AP2.T24" "AP2.T24" "AP2.T24" "AP2.T24" "AP2.T24" "AP2.T48" "AP2.T48" "AP2.T48" "AP2.T48"
>> d <- DGEList(counts=rd,group=group)
>> d <- calcNormFactors(d)
>> d$samples$S <- factor(gsub("con","Acon",substr(d$samples$group,1,3)))
>> d$samples$T <- factor(gsub("rol","T00",(substr(d$samples$group,5,7))))
>> design <- model.matrix(~ S + T, data=d$sample)
>> # Analysis using Cox-Reid common dispersion
>> d <- estimateCRDisp(d,design)
>>
>>> design
>> (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
>> AP_01_control 1 0 0 0 0 0 0
>> AP_02_control 1 0 0 0 0 0 0
>> AP_03_control 1 0 0 0 0 0 0
>> AP_28_control 1 0 0 0 0 0 0
>> AP_29_control 1 0 0 0 0 0 0
>> AP_30_control 1 0 0 0 0 0 0
>> AP_31_AP6.T06 1 0 1 1 0 0 0
>> AP_32_AP6.T06 1 0 1 1 0 0 0
>> AP_33_AP6.T06 1 0 1 1 0 0 0
>> AP_34_AP6.T06 1 0 1 1 0 0 0
>> AP_36_AP6.T06 1 0 1 1 0 0 0
>> AP_37_AP6.T12 1 0 1 0 1 0 0
>> AP_38_AP6.T12 1 0 1 0 1 0 0
>> AP_39_AP6.T12 1 0 1 0 1 0 0
>> AP_40_AP6.T12 1 0 1 0 1 0 0
>> AP_41_AP6.T12 1 0 1 0 1 0 0
>> AP_42_AP6.T12 1 0 1 0 1 0 0
>> AP_43_AP6.T24 1 0 1 0 0 1 0
>> AP_44_AP6.T24 1 0 1 0 0 1 0
>> AP_45_AP6.T24 1 0 1 0 0 1 0
>> AP_46_AP6.T24 1 0 1 0 0 1 0
>> AP_47_AP6.T24 1 0 1 0 0 1 0
>> AP_48_AP6.T24 1 0 1 0 0 1 0
>> AP_49_AP6.T48 1 0 1 0 0 0 1
>> AP_50_AP6.T48 1 0 1 0 0 0 1
>> AP_51_AP6.T48 1 0 1 0 0 0 1
>> AP_53_AP6.T48 1 0 1 0 0 0 1
>> AP_54_AP6.T48 1 0 1 0 0 0 1
>> AP_55_AP2.T06 1 1 0 1 0 0 0
>> AP_56_AP2.T06 1 1 0 1 0 0 0
>> AP_57_AP2.T06 1 1 0 1 0 0 0
>> AP_58_AP2.T06 1 1 0 1 0 0 0
>> AP_59_AP2.T06 1 1 0 1 0 0 0
>> AP_60_AP2.T12 1 1 0 0 1 0 0
>> AP_62_AP2.T12 1 1 0 0 1 0 0
>> AP_63_AP2.T12 1 1 0 0 1 0 0
>> AP_64_AP2.T12 1 1 0 0 1 0 0
>> AP_65_AP2.T12 1 1 0 0 1 0 0
>> AP_66_AP2.T12 1 1 0 0 1 0 0
>> AP_67_AP2.T24 1 1 0 0 0 1 0
>> AP_69_AP2.T24 1 1 0 0 0 1 0
>> AP_70_AP2.T24 1 1 0 0 0 1 0
>> AP_71_AP2.T24 1 1 0 0 0 1 0
>> AP_72_AP2.T24 1 1 0 0 0 1 0
>> AP_73_AP2.T48 1 1 0 0 0 0 1
>> AP_75_AP2.T48 1 1 0 0 0 0 1
>> AP_76_AP2.T48 1 1 0 0 0 0 1
>> AP_78_AP2.T48 1 1 0 0 0 0 1
>> attr(,"assign")
>> [1] 0 1 1 2 2 2 2
>> attr(,"contrasts")
>> attr(,"contrasts")$S
>> [1] "contr.treatment"
>>
>> attr(,"contrasts")$T
>> [1] "contr.treatment"
>>
>>
>> -----Oprindelig meddelelse-----
>> Fra: Mark Robinson [mailto:mrobinson at wehi.EDU.AU]
>> Sendt: 18. november 2010 05:51
>> Til: Jakob Hedegaard
>> Cc: bioc-sig-sequencing at r-project.org
>> Emne: Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?
>>
>> Hi Jakob.
>>
>> Basically, you can analyze your time course data with edgeR through glmFit() as you have already done, but by choosing an appropriate design matrix that will allow you to test your contrasts of interest.
>>
>> For example, you might define a couple factors (please double check that this actually matches your $samples element of the DGEList object):
>>
>> S <- factor(rep( c("Acon","AP6","AP2"),
>> c(6,5+6+6+5,5+6+5+4)))
>> T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"),
>> c(6,5,6,6,5,5,6,5,4)))
>>
>> ... and then a design matrix:
>>
>> design <- model.matrix( ~ S + T )
>>
>> So, this design matrix will look like this:
>>
>>
>>> design <- model.matrix( ~ S + T )
>>> design
>> (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
>> 1 1 0 0 0 0 0 0
>> 2 1 0 0 0 0 0 0
>> 3 1 0 0 0 0 0 0
>> 4 1 0 0 0 0 0 0
>> 5 1 0 0 0 0 0 0
>> 6 1 0 0 0 0 0 0
>> 7 1 0 1 1 0 0 0
>> 8 1 0 1 1 0 0 0
>> 9 1 0 1 1 0 0 0
>> 10 1 0 1 1 0 0 0
>> 11 1 0 1 1 0 0 0
>> 12 1 0 1 0 1 0 0
>> 13 1 0 1 0 1 0 0
>> 14 1 0 1 0 1 0 0
>> 15 1 0 1 0 1 0 0
>> 16 1 0 1 0 1 0 0
>> 17 1 0 1 0 1 0 0
>> 18 1 0 1 0 0 1 0
>> 19 1 0 1 0 0 1 0
>> 20 1 0 1 0 0 1 0
>> 21 1 0 1 0 0 1 0
>> 22 1 0 1 0 0 1 0
>> 23 1 0 1 0 0 1 0
>> 24 1 0 1 0 0 0 1
>> 25 1 0 1 0 0 0 1
>> 26 1 0 1 0 0 0 1
>> 27 1 0 1 0 0 0 1
>> 28 1 0 1 0 0 0 1
>> 29 1 1 0 1 0 0 0
>> 30 1 1 0 1 0 0 0
>> 31 1 1 0 1 0 0 0
>> 32 1 1 0 1 0 0 0
>> 33 1 1 0 1 0 0 0
>> 34 1 1 0 0 1 0 0
>> 35 1 1 0 0 1 0 0
>> 36 1 1 0 0 1 0 0
>> 37 1 1 0 0 1 0 0
>> 38 1 1 0 0 1 0 0
>> 39 1 1 0 0 1 0 0
>> 40 1 1 0 0 0 1 0
>> 41 1 1 0 0 0 1 0
>> 42 1 1 0 0 0 1 0
>> 43 1 1 0 0 0 1 0
>> 44 1 1 0 0 0 1 0
>> 45 1 1 0 0 0 0 1
>> 46 1 1 0 0 0 0 1
>> 47 1 1 0 0 0 0 1
>> 48 1 1 0 0 0 0 1
>> attr(,"assign")
>> [1] 0 1 1 2 2 2 2
>> attr(,"contrasts")
>> attr(,"contrasts")$S
>> [1] "contr.treatment"
>>
>> attr(,"contrasts")$T
>> [1] "contr.treatment"
>>
>>
>> Note that I've intentionally changed "con" to "Acon" in order to make it the control group ... so, the intercept term above represents the abundance parameter for the control/T00 samples. Then, you can select your comparison of interest through the 'coef' argument of the glmLRT() function. Note that you can select multiple columns.
>>
>> By judicious choice of the design matrix, you should be able to fit any contrast of interest. Currently you will probably need to redefine the design matrix for a different reference sample to obtain different contrasts. We will be adding a 'contrasts' argument to glmLRT() shortly, which will make it much easier to investigate different contrasts of interest.
>>
>> Hope that helps.
>>
>> Cheers,
>> Mark
>>
>>
>> On 2010-11-18, at 1:32 AM, Jakob Hedegaard wrote:
>>
>>> Hi,
>>>
>>> I wonder how to analyze a RNA-Seq dataset of a time-series experiment.
>>>
>>> The dataset origin from a challenge experiment with 4-6 samples per time point (T00,T06,T12,T24 and T48) from two series of challenge (with one of two different serotypes). In total 48 individual samples and two factors, time and serotype (see the table below for details)
>>> RNA-Seq profiles have been obtained using Illumina GAIIx with multiplexing.
>>>
>>> I have used edgeR (the Cox-Reid and GLM method) to obtain the genes significantly affected to each time point relative to time zero (e.g. T12-T00) - thus initially ignoring the potential difference between the serotypes.
>>>
>>> But how can I obtain the genes significantly affected across the different contrastes?
>>>
>>>
>>> group lib.size norm.factors serotype time
>>> AP_01 control 3734226 1.0575860 con T00
>>> AP_02 control 4528260 1.0581673 con T00
>>> AP_03 control 3648163 1.0594602 con T00
>>> AP_28 control 4671178 1.0430147 con T00
>>> AP_29 control 3746020 1.0528085 con T00
>>> AP_30 control 4471915 1.1277386 con T00
>>> AP_31 AP6.T06 7384334 0.9187757 AP6 T06
>>> AP_32 AP6.T06 3649621 0.9035999 AP6 T06
>>> AP_33 AP6.T06 5644324 0.8929802 AP6 T06
>>> AP_34 AP6.T06 3791540 0.9396600 AP6 T06
>>> AP_36 AP6.T06 3922243 0.8524249 AP6 T06
>>> AP_37 AP6.T12 3113854 1.0491183 AP6 T12
>>> AP_38 AP6.T12 2153867 1.0996506 AP6 T12
>>> AP_39 AP6.T12 2979503 1.0905274 AP6 T12
>>> AP_40 AP6.T12 5375493 1.0420513 AP6 T12
>>> AP_41 AP6.T12 3769654 0.9094147 AP6 T12
>>> AP_42 AP6.T12 2621303 1.1272673 AP6 T12
>>> AP_43 AP6.T24 3537847 1.0037276 AP6 T24
>>> AP_44 AP6.T24 3660552 1.0757808 AP6 T24
>>> AP_45 AP6.T24 3284038 1.0701603 AP6 T24
>>> AP_46 AP6.T24 3250374 1.0230341 AP6 T24
>>> AP_47 AP6.T24 7208387 0.9535068 AP6 T24
>>> AP_48 AP6.T24 4169075 0.9730747 AP6 T24
>>> AP_49 AP6.T48 5989902 0.9825794 AP6 T48
>>> AP_50 AP6.T48 3529028 0.9471979 AP6 T48
>>> AP_51 AP6.T48 5104071 1.0772029 AP6 T48
>>> AP_53 AP6.T48 4823387 0.9710606 AP6 T48
>>> AP_54 AP6.T48 3788201 1.0976409 AP6 T48
>>> AP_55 AP2.T06 4919578 0.7872065 AP2 T06
>>> AP_56 AP2.T06 4580068 0.9533078 AP2 T06
>>> AP_57 AP2.T06 3908180 0.9193207 AP2 T06
>>> AP_58 AP2.T06 3466801 0.9887874 AP2 T06
>>> AP_59 AP2.T06 4267998 0.8769085 AP2 T06
>>> AP_60 AP2.T12 4533905 0.9058305 AP2 T12
>>> AP_62 AP2.T12 5906089 0.9352388 AP2 T12
>>> AP_63 AP2.T12 3676318 1.1260072 AP2 T12
>>> AP_64 AP2.T12 2206081 1.0579246 AP2 T12
>>> AP_65 AP2.T12 3955338 1.0221930 AP2 T12
>>> AP_66 AP2.T12 3775918 0.9300664 AP2 T12
>>> AP_67 AP2.T24 3853681 0.9659259 AP2 T24
>>> AP_69 AP2.T24 3761829 0.9592286 AP2 T24
>>> AP_70 AP2.T24 4263273 1.0742397 AP2 T24
>>> AP_71 AP2.T24 4736798 0.9864702 AP2 T24
>>> AP_72 AP2.T24 6387114 1.0462401 AP2 T24
>>> AP_73 AP2.T48 3389351 1.0303610 AP2 T48
>>> AP_75 AP2.T48 1489023 1.1863821 AP2 T48
>>> AP_76 AP2.T48 3588175 1.0250639 AP2 T48
>>> AP_78 AP2.T48 3848562 0.9855582 AP2 T48
>>>
>>>
>>> sessionInfo()
>>> R version 2.12.0 (2010-10-15)
>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252 LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C LC_TIME=Danish_Denmark.1252
>>>
>>> attached base packages:
>>> [1] grDevices datasets splines graphics stats tcltk utils methods base
>>>
>>> other attached packages:
>>> [1] edgeR_1.8.2 svSocket_0.9-50 TinnR_1.0.3 R2HTML_2.2 Hmisc_3.8-3 survival_2.35-8
>>>
>>> loaded via a namespace (and not attached):
>>> [1] cluster_1.13.1 grid_2.12.0 lattice_0.19-13 limma_3.6.6 svMisc_0.9-60 tools_2.12.0
>>>
>>>
>>> Best regards
>>>
>>> Jakob Hedegaard
>>> Project scientist
>>>
>>> AARHUS UNIVERSITY
>>> Faculty of Agricultural Sciences
>>> Dept. of Genetics and Biotechnology
>>> Blichers Allé 20, P.O. BOX 50
>>> DK-8830 Tjele
>>> Denmark
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>> ------------------------------
>> Mark Robinson, PhD (Melb)
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: mrobinson at wehi.edu.au
>> e: m.robinson at garvan.org.au
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>> ------------------------------
>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and inte...{{dropped:24}}
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: mrobinson at wehi.edu.au
e: m.robinson at garvan.org.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}
More information about the Bioc-sig-sequencing
mailing list