[BioC] EdgeR - generating appropriate design and contrast matrix : multi-factorial experiment
Zaki Fadlullah
zaki.fadlullah at carif.com.my
Fri Apr 5 00:38:17 CEST 2013
Dear EdgeR developers and kind list members,
I have an RNA-seq experiment that I would like to analyse using edgeR, as I think it is a multi-factorial experiment.
After reading the excellent edgeR user manual as well as the wealth of design-matrix related questions on the mailing list, I am still unsure what design matrix would be appropriate for my data. I would therefore appreciate feedback from members of the mailing list.
The RNA-seq data: 19 samples (16 tumours and 3 normals). All samples are from different individuals, and each sample was sequenced once (i.e. no replicates).
The aim: to find DE genes based on the sensitivity of tumour samples to drug A (and to repeat the same analysis for drug B).
The aim (reworded for clarification): to find DE genes between tumour samples that are sensitive to drug A and tumour samples that are resistant to drug A.
Integrating previously known information on drug sensitivity, I set up my meta-data as below:
targets
   files samples    drug_A    drug_B
1    T01     T01 resistant sensitive
2    T02     T02 resistant resistant
3    T03     T03 sensitive resistant
4    T04     T04    medium sensitive
5    T05     T05    medium sensitive
6    T06     T06 resistant sensitive
7    T07     T07    medium resistant
8    T08     T08    medium resistant
9    T09     T09 resistant resistant
10   T10     T10    medium sensitive
11   T11     T11 resistant resistant
12   T12     T12 sensitive resistant
13   T13     T13 resistant resistant
14   T14     T14 sensitive sensitive
15   T15     T15 sensitive resistant
16   T16     T16 sensitive sensitive
17   N01  normal   unknown   unknown
18   N02  normal   unknown   unknown
19   N03  normal   unknown   unknown
To clarify:
1) All RNA-seq data were from untreated samples
2) Information on drug sensitivity was obtained from wet-lab experiments
3) No drug sensitivity experiments were done on the normal samples, hence the "unknown"
From my current understanding, after reading the edgeR user manual and, to an extent, limma section 8.5, I am inclined to say that the design matrix for my data should be an interaction model (limma section 8.5.1 and edgeR section 3.3.1) rather than a blocking model (edgeR section 3.4.2). I have not fully understood what a nested model is, so I am unsure whether a nested model would be the better option.
Therefore my design is
Groups = factor(paste(targets$samples,targets$drug_A,sep="."))
design = model.matrix(~0 + Groups)
colnames(design) = levels(Groups)
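As an illustration only, here is a minimal base-R sketch of what this design contains. It assumes the targets table above is typed in by hand as a data frame (the files and drug_B columns are left out); n_columns and residual_df are names introduced just for this sketch.

```r
# Sample annotation typed in from the targets table above
# (assumption: files and drug_B are omitted because the design does not use them).
drug_A <- c("resistant", "resistant", "sensitive", "medium", "medium",
            "resistant", "medium", "medium", "resistant", "medium",
            "resistant", "sensitive", "resistant", "sensitive",
            "sensitive", "sensitive", rep("unknown", 3))
targets <- data.frame(
  samples = c(sprintf("T%02d", 1:16), rep("normal", 3)),
  drug_A  = drug_A,
  stringsAsFactors = FALSE
)

# Same design as above: one group label per sample name / drug_A combination.
Groups <- factor(paste(targets$samples, targets$drug_A, sep = "."))
design <- model.matrix(~0 + Groups)
colnames(design) <- levels(Groups)

# Number of coefficients, and what is left over once they are fitted.
n_columns   <- ncol(design)
residual_df <- nrow(design) - qr(design)$rank

print(n_columns)
print(residual_df)
print(table(Groups))  # how many samples share each group label
```

With the table above this gives 17 columns for 19 samples: every tumour gets a column of its own and only the three normals share one, which leaves 2 residual degrees of freedom.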
From this design, I fail to see a way to specify a contrast that would answer the aim of my study (determine which genes are differentially expressed between tumour samples that are sensitive to drug A and tumour samples that are resistant to drug A).
My questions to the mailing list members are therefore:
1) Is my experimental design correct to test my aim? (My gut feeling is that it is not...)
2) What design is appropriate to account for the individual variability in the tumours while addressing the aim of the experiment (tumour sensitive vs tumour resistant)? Is this possible?
Would this meta-data be the key?
targets_2
targets
   files samples   type    drug_A    drug_B
1    T01     T01 tumour resistant sensitive
2    T02     T02 tumour resistant resistant
3    T03     T03 tumour sensitive resistant
4    T04     T04 tumour    medium sensitive
.
.
16   T16     T16 tumour sensitive sensitive
17   N01  normal normal   unknown   unknown
18   N02  normal normal   unknown   unknown
19   N03  normal normal   unknown   unknown
With the above meta-data, I would proceed along these lines:
Groups = factor(paste(targets_2$type,targets_2$drug_A,sep="."))
design = model.matrix(~0 + Groups)
colnames(design) = levels(Groups)
my.contrast = makeContrasts(
  tumour.sensitiveVSresistant = tumour.sensitive - tumour.resistant,
  # normal.unknown cancels here, so this is algebraically the same contrast as the one above
  tumour_normal.sensitiveVSresistant = (tumour.sensitive - normal.unknown) - (tumour.resistant - normal.unknown),
  levels = design)
Would the method above be more appropriate? And will it account for the variability among the tumour samples (i.e. does the design above treat the tumours as replicates)?
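As a rough check on both points, here is a minimal base-R sketch, again assuming the meta-data is typed in by hand (only the type and drug_A columns are needed). The contrast vectors are built with a small helper, unit (a hypothetical name used only here), rather than with makeContrasts, so that nothing beyond base R is required.

```r
# Meta-data typed in from the tables above (assumption: only type and drug_A are kept).
drug_A <- c("resistant", "resistant", "sensitive", "medium", "medium",
            "resistant", "medium", "medium", "resistant", "medium",
            "resistant", "sensitive", "resistant", "sensitive",
            "sensitive", "sensitive", rep("unknown", 3))
targets_2 <- data.frame(
  type   = c(rep("tumour", 16), rep("normal", 3)),
  drug_A = drug_A,
  stringsAsFactors = FALSE
)

# Same grouping as above: tissue type combined with drug A sensitivity.
Groups <- factor(paste(targets_2$type, targets_2$drug_A, sep = "."))
design <- model.matrix(~0 + Groups)
colnames(design) <- levels(Groups)

# How many samples fall into each group.
group_sizes <- table(Groups)
print(group_sizes)

# Hand-built contrast vectors: unit() returns a 0/1 indicator for one design column.
unit <- function(name) as.numeric(colnames(design) == name)
contrast_1 <- unit("tumour.sensitive") - unit("tumour.resistant")
contrast_2 <- (unit("tumour.sensitive") - unit("normal.unknown")) -
              (unit("tumour.resistant") - unit("normal.unknown"))

# Compare the two contrasts coefficient by coefficient.
print(rbind(contrast_1, contrast_2))
print(identical(contrast_1, contrast_2))
```

Under this grouping the tumours fall into three groups of 5 (sensitive), 6 (resistant) and 5 (medium) samples, so the samples within a group act as replicates of one another. The two contrasts come out as the same vector, because normal.unknown cancels in the second one.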
Thank you for taking the time to read this post, and I apologise if I have included much unnecessary information.
Zaki