[BioC] edgeR for data combined from different studies and/or platforms
guest at bioconductor.org
Mon Mar 18 03:48:50 CET 2013
How suitable is edgeR for analyzing RNA sequencing data obtained from multiple studies, possibly using multiple platforms?
I am trying to compare mRNA sequencing data obtained for two different cancers by The Cancer Genome Atlas (TCGA) project. Different research teams handle the work for the two cancers, and TCGA regularly releases updated 'level 3' (within-cancer), RSEM-processed data for each cancer-specific sub-project (each with 200+ samples).
I am trying to use edgeR for differential expression analysis with the exact test, using the 'raw count' values in the two cancer data sets as input. I plan to use edgeR with its default settings, except for prior.df in estimateTagwiseDisp(), for which I intend to use 0.5 instead of 20, and rowsum.filter in estimateCommonDisp(), for which I intend to use perhaps 500 instead of 5.
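To make the effect of raising rowsum.filter concrete: edgeR's documentation describes it as excluding tags whose total count across all samples is below the threshold before the common dispersion is estimated. The following is a minimal Python sketch of that idea only, not edgeR code; the function name `rowsum_filter` and the toy count matrix are hypothetical, and edgeR's exact boundary handling is not reproduced.

```python
# Hypothetical illustration of the rowsum.filter idea (not edgeR code):
# tags whose total count across all samples is below the threshold
# are left out of the common dispersion estimate.

def rowsum_filter(counts, threshold):
    """Return indices of rows (tags) whose total count across samples is >= threshold."""
    return [i for i, row in enumerate(counts) if sum(row) >= threshold]


# Hypothetical toy count matrix: rows are tags, columns are samples.
counts = [
    [0, 1, 2, 1],            # total 4
    [3, 2, 4, 1],            # total 10
    [40, 55, 60, 30],        # total 185
    [150, 120, 200, 180],    # total 650
    [900, 1100, 800, 1000],  # total 3800
]

for threshold in (5, 500):
    kept = rowsum_filter(counts, threshold)
    print(f"rowsum.filter={threshold}: keeps tags {kept}")
```

With the default of 5, four of the five toy tags contribute to the common dispersion; with 500, only the two most highly expressed ones do.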
(1) Is it OK to use edgeR for such a cross-study comparison when each of the two groups I want to compare has been examined by only one of the two studies?
(2) In my case, the sequencing platform is the same for the two studies. Had it been different, could I still use edgeR?
(3) Do the answers to the above two questions also apply to microRNA sequencing studies (where library sizes, i.e. total counts, are typically 10-20x smaller)?
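As a rough illustration of why library size matters for a fixed count threshold: a library size is the total count of a sample (a column sum of the count matrix), and the same total-count cutoff corresponds to a higher expression level, in counts per million, when libraries are smaller. The Python sketch below is hypothetical; the library sizes are made-up round numbers, not TCGA values, and the helper names are illustrative only.

```python
# Hypothetical illustration: a fixed total-count threshold corresponds to a
# higher expression level (counts per million) when libraries are smaller.

def library_sizes(counts):
    """Library size of each sample = column sum of a tags-by-samples count matrix."""
    return [sum(col) for col in zip(*counts)]


def threshold_in_cpm(threshold, lib_sizes):
    """Express a total-count threshold as counts per million of the combined library size."""
    return threshold * 1e6 / sum(lib_sizes)


# Made-up library sizes: 4 mRNA-seq samples of 20 million reads each,
# and 4 miRNA-seq samples that are 10x smaller.
mrna_libs = [20_000_000] * 4
mirna_libs = [2_000_000] * 4

for label, libs in (("mRNA", mrna_libs), ("miRNA", mirna_libs)):
    print(label, threshold_in_cpm(500, libs))
```

Under these assumptions, a cutoff of 500 total counts corresponds to 6.25 counts per million for the mRNA libraries but 62.5 for the 10x smaller miRNA libraries.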
Output of sessionInfo():
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
attached base packages:
 grid stats graphics grDevices utils datasets methods
other attached packages:
 edgeR_3.0.8 limma_3.14.4 EBSeq_1.1.6
 gplots_2.11.0 MASS_7.3-23 KernSmooth_2.23-9
 caTools_1.14 gdata_2.12.0 gtools_2.7.0
 blockmodeling_0.1.8 reshape2_1.2.2 plyr_1.8
loaded via a namespace (and not attached):
 bitops_1.0-4.2 stringr_0.6.2 tools_2.15.1