[BioC] EdgeR - generating appropriate design and contrast matrix : multi-factorial experiment

Sat Apr 6 00:34:12 CEST 2013

Dear Zaki,

The best way forward would be for you to collaborate with a statistician 
at your own institution, if you can possibly do that.  edgeR provides the 
capabilities to do lots of analyses, but figuring out what analyses are 
appropriate for your scientific problem is another question.  When I work 
with biologists, it often takes months or years for us to understand all 
the scientist's questions and to translate these into appropriate 
statistical analyses.  So there is no way that I can tell you in a few 
sentences how to analyse your data appropriately.

However your stated aim "To find DE genes between tumour samples which are 
sensitive to drug A and tumour samples which are resistant to drug A" 
seems to have an easy answer.  The drug_A column splits your samples into 
four groups (resistant, medium, sensitive, unknown) and you want to 
compare the resistant and sensitive groups.  This is a one-way layout, and 
you can follow Section 3.2 of the edgeR User's Guide.
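
As a rough sketch of that one-way layout (not code from the User's Guide): 
the drug_A values below are re-typed from the targets table in the 
question, the function name run_one_way is hypothetical, and the edgeR 
calls are the ones found in recent releases, so the exact steps in your 
installed version may differ.

```r
# drug_A values re-typed from the targets table in the question (T01..T16, N01..N03)
drug_A <- c("resistant", "resistant", "sensitive", "medium", "medium",
            "resistant", "medium", "medium", "resistant", "medium",
            "resistant", "sensitive", "resistant", "sensitive", "sensitive",
            "sensitive", "unknown", "unknown", "unknown")

# One factor with four levels: this is the whole "design" for a one-way layout
group <- factor(drug_A, levels = c("resistant", "medium", "sensitive", "unknown"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast of interest: sensitive minus resistant
contrast <- c(resistant = -1, medium = 0, sensitive = 1, unknown = 0)
stopifnot(identical(names(contrast), colnames(design)))

# Hypothetical wrapper; 'counts' is assumed to be a genes x 19 matrix of raw
# read counts with columns in the same order as drug_A. Requires edgeR.
run_one_way <- function(counts, design, contrast) {
  y <- edgeR::DGEList(counts = counts)
  y <- edgeR::calcNormFactors(y)
  y <- edgeR::estimateDisp(y, design)
  fit <- edgeR::glmFit(y, design)
  lrt <- edgeR::glmLRT(fit, contrast = contrast)
  edgeR::topTags(lrt)
}
```

With this coding, a positive log-fold-change means higher expression in 
the sensitive tumours than in the resistant ones.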

Please don't put sample IDs into a factor like:

  Groups = factor(paste(targets$samples,targets$drug_A,sep="."))

This is in effect trying to treat each sample as its own group, and that 
makes no sense.
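
To see the problem concretely, here is a small base-R check, with the 
samples and drug_A columns re-typed from the targets table in the 
question. It is only an illustration of the counting argument, not an 
analysis.

```r
# Columns re-typed from the targets table in the question
samples <- c(sprintf("T%02d", 1:16), rep("normal", 3))
drug_A <- c("resistant", "resistant", "sensitive", "medium", "medium",
            "resistant", "medium", "medium", "resistant", "medium",
            "resistant", "sensitive", "resistant", "sensitive", "sensitive",
            "sensitive", "unknown", "unknown", "unknown")

# The factor from the question: sample ID pasted onto drug sensitivity
Groups <- factor(paste(samples, drug_A, sep = "."))
design <- model.matrix(~ 0 + Groups)

# Number of samples in each tumour "group"
n_per_group <- table(Groups)
tumour_groups <- n_per_group[grepl("^T", names(n_per_group))]

# Residual degrees of freedom left for estimating variability
resid_df <- nrow(design) - qr(design)$rank

print(dim(design))
print(max(tumour_groups))
print(resid_df)
```

Every tumour ends up alone in its own group, so the only residual degrees 
of freedom come from the three normals, and nothing is left to estimate 
variability between tumours of the same sensitivity class.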

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Fri, 5 Apr 2013, Zaki Fadlullah wrote:

> Dear EdgeR developers and kind list members,

> I have an RNA-seq experiment which I would like to analyse using edgeR, as 
> I think it is a multi-factorial experiment.
>
> After reading the excellent edgeR user manual as well as the wealth of 
> design-matrix related questions in the mailing list, I am still unsure 
> about what design matrix would be appropriate for my data. Therefore I 
> would appreciate feedback from members of the mailing list.
>
> The RNA-seq data : 19 samples {16 tumours & 3 normal}. All samples are 
> from different individuals, and each sample was sequenced once (i.e. no 
> replicates)

> The aim -- To find DE genes based on the sensitivity of tumour samples 
> to drug A [and replicate the same analysis to drug B]

> The aim (reworded for clarification) - To find DE genes between tumour 
> samples which are sensitive to drug A and tumour samples which are 
> resistant to drug A
>
> Integrating previously known information on drug sensitivity, I designed my meta-data as below:
> targets
>   files samples    drug_A    drug_B
> 1    T01     T01 resistant sensitive
> 2    T02     T02 resistant resistant
> 3    T03     T03 sensitive resistant
> 4    T04     T04    medium sensitive
> 5    T05     T05    medium sensitive
> 6    T06     T06 resistant sensitive
> 7    T07     T07    medium resistant
> 8    T08     T08    medium resistant
> 9    T09     T09 resistant resistant
> 10   T10     T10    medium sensitive
> 11   T11     T11 resistant resistant
> 12   T12     T12 sensitive resistant
> 13   T13     T13 resistant resistant
> 14   T14     T14 sensitive sensitive
> 15   T15     T15 sensitive resistant
> 16   T16     T16 sensitive sensitive
> 17   N01  normal   unknown   unknown
> 18   N02  normal   unknown   unknown
> 19   N03  normal   unknown   unknown
>
> To clarify :-
> 1) All RNA-seq data was from untreated samples
> 2) Information on drug sensitivity was obtained from wet-lab experiments
> 3) No drug sensitivity experiments were done on normal samples, hence the unknown
>
> From my current understanding after reading the edgeR user manual and, to an extent, limma section 8.5, I am inclined to say that to test my aim the design matrix for my data should be an interaction model (limma section 8.5.1 & edgeR section 3.3.1) rather than a block model (edgeR section 3.4.2). #I have not fully understood what a nested model is, so I am unsure if nested is the better option??
>
> Therefore my design is
>
> Groups = factor(paste(targets$samples,targets$drug_A,sep="."))
> design = model.matrix(~0 + Groups)
> colnames(design) = levels(Groups)
>
> From this design, I fail to see a way to specify a contrast that would answer the aim of my study (to determine which genes are differentially expressed between tumour samples which are sensitive to drug A and tumour samples which are resistant to drug A).
> Therefore my question to dear mailing list members would be,
> 1) Is my experimental design correct to test my aim? (My gut feeling is it is not...)
>
> 2) What design is appropriate to account for the individual variability in the tumours while addressing the aim of the experiment (tumour sensitive vs tumour resistant)? Is this possible?
> Would this meta-data be the key?
> targets_2
> targets
>  files samples   type    drug_A    drug_B
> 1   T01     T01 tumour resistant sensitive
> 2   T02     T02 tumour resistant resistant
> 3   T03     T03 tumour sensitive resistant
> 4   T04     T04 tumour    medium sensitive
> .
> .
> 16   T16     T16 tumour sensitive sensitive
> 17   N01  normal normal   unknown   unknown
> 18   N02  normal normal   unknown   unknown
> 19   N03  normal normal   unknown   unknown
>
> Following the above meta-data, proceed along this line:-
> Groups = factor(paste(targets_2$type,targets_2$drug_A,sep="."))
> design = model.matrix(~0 + Groups)
> colnames(design) = levels(Groups)
>
> my.contrast = makeContrasts(
>    tumour.sensitiveVSresistant = tumour.sensitive-tumour.resistant,
>    tumour_normal.sensitiveVsresitabt = (tumour.sensitive-normal.unknown)-(tumour.resistant-normal.unknown)
>    ,levels=design)
>
> Would the method above be more appropriate?? But will it account for the 
> variability in the tumour samples? (ie- Does the design above treat the 
> tumour as replicates??)
>
> Thank you for taking the time to read this post, and I apologise if I included much unnecessary information.
> Zaki
>
>
