[BioC] EdgeR - generating appropriate design and contrast matrix : multi-factorial experiment
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Apr 6 00:34:12 CEST 2013
Dear Zaki,
The best way forward would be for you to collaborate with a statistician
at your own institution, if you can possibly do that. edgeR provides the
capabilities to do lots of analyses, but figuring out what analyses are
appropriate for your scientific problem is another question. When I work
with biologists, it often takes months or years for us to understand all
the scientist's questions and to translate these into appropriate
statistical analyses. So there is no way that I can tell you in a few
sentences how to analyse your data appropriately.
However, your stated aim "To find DE genes between tumour samples which are
sensitive to drug A and tumour samples which are resistant to drug A"
seems to have an easy answer. The drug_A column splits your samples into
four groups (resistant, medium, sensitive, unknown) and you want to
compare the resistant and sensitive groups. This is a one-way layout, and
you can follow Section 3.2 of the edgeR User's Guide.
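As a rough sketch only, the one-way layout described above might be set up
along these lines. The drug_A vector is typed in from the targets table in
the quoted post. The contrast is built by hand so that the block runs in base
R. The commented-out edgeR calls assume a DGEList of counts called y, which
does not appear in the post.

```r
# drug_A column copied from the poster's targets table (T01..T16, then N01..N03)
drug_A <- c("resistant", "resistant", "sensitive", "medium", "medium",
            "resistant", "medium", "medium", "resistant", "medium",
            "resistant", "sensitive", "resistant", "sensitive", "sensitive",
            "sensitive", "unknown", "unknown", "unknown")

# One-way layout: one design column per drug_A group
group <- factor(drug_A)
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# Contrast vector for sensitive minus resistant; other groups get weight 0
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["sensitive"] <- 1
contrast["resistant"] <- -1

# Assumed downstream steps, given a DGEList `y` (hypothetical, not in the post):
# library(edgeR)
# y   <- estimateDisp(y, design)
# fit <- glmFit(y, design)
# lrt <- glmLRT(fit, contrast = contrast)
# topTags(lrt)
```

The medium and unknown samples stay in the model and contribute to the
dispersion estimate, but they get zero weight in the comparison itself.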
Please don't put sample IDs into a factor like:
Groups = factor(paste(targets$samples,targets$drug_A,sep="."))
This is in effect trying to treat each sample as its own group, and that
makes no sense.
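
To see the problem concretely, here is a small base-R check, using the
samples and drug_A columns typed in from the quoted targets table. It is
an illustration only, not part of any edgeR workflow.

```r
# samples and drug_A columns copied from the poster's targets table
samples <- c(sprintf("T%02d", 1:16), rep("normal", 3))
drug_A <- c("resistant", "resistant", "sensitive", "medium", "medium",
            "resistant", "medium", "medium", "resistant", "medium",
            "resistant", "sensitive", "resistant", "sensitive", "sensitive",
            "sensitive", "unknown", "unknown", "unknown")

# The factor from the post: sample ID pasted onto the drug_A label
Groups <- factor(paste(samples, drug_A, sep = "."))
design <- model.matrix(~0 + Groups)

# Number of groups that contain exactly one sample
n_singletons <- sum(table(Groups) == 1)

# Residual degrees of freedom left for estimating variability
resid_df <- nrow(design) - qr(design)$rank

print(nlevels(Groups))
print(n_singletons)
print(resid_df)
```

Every tumour ends up as a group of size one, so there is no replication
within any tumour group and nothing from which to estimate the variability
between tumours.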
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
On Fri, 5 Apr 2013, Zaki Fadlullah wrote:
> Dear EdgeR developers and kind list members,
> I have an RNA-seq experiment which I would like to analyse using edgeR, as
> I think it is a multi-factorial experiment.
>
> After reading the excellent edgeR user manual as well as the wealth of
> design-matrix related questions in the mailing list, I am still unsure
> about what design matrix would be appropriate for my data. Therefore I
> would appreciate feedback from members of the mailing list.
>
> The RNA-seq data : 19 samples {16 tumours & 3 normal}. All samples are
> from different individuals, and each sample was sequenced once (i.e. no
> replicates)
> The aim -- To find DE genes based on the sensitivity of tumour samples
> to drug A [and replicate the same analysis to drug B]
> The aim (reworded for clarification) - To find DE genes between tumour
> samples which are sensitive to drug A and tumour samples which are
> resistant to drug A
>
> Integrating previously known information on drug sensitivity, I designed my meta-data as below:
> targets
>    files samples    drug_A    drug_B
> 1    T01     T01 resistant sensitive
> 2    T02     T02 resistant resistant
> 3    T03     T03 sensitive resistant
> 4    T04     T04    medium sensitive
> 5    T05     T05    medium sensitive
> 6    T06     T06 resistant sensitive
> 7    T07     T07    medium resistant
> 8    T08     T08    medium resistant
> 9    T09     T09 resistant resistant
> 10   T10     T10    medium sensitive
> 11   T11     T11 resistant resistant
> 12   T12     T12 sensitive resistant
> 13   T13     T13 resistant resistant
> 14   T14     T14 sensitive sensitive
> 15   T15     T15 sensitive resistant
> 16   T16     T16 sensitive sensitive
> 17   N01  normal   unknown   unknown
> 18   N02  normal   unknown   unknown
> 19   N03  normal   unknown   unknown
>
> To clarify :-
> 1) All RNA-seq data was from untreated samples
> 2) Information on drug sensitivity was obtained from wet-lab experiments
> 3) No drug sensitivity experiments were done on normal samples, hence the unknown
>
> From my current understanding after reading the edgeR user manual and, to an extent, limma section 8.5, I am inclined to say the design matrix for my data should be an interaction model (limma section 8.5.1 & edgeR section 3.3.1) rather than a block model (edgeR section 3.4.2). I have not fully understood what a nested model is, so I am unsure whether nested is the better option.
>
> Therefore my design is
>
> Groups = factor(paste(targets$samples,targets$drug_A,sep="."))
> design = model.matrix(~0 + Groups)
> colnames(design) = levels(Groups)
>
> From this design, I fail to see a way to specify a contrast that would answer the aim of my study (determine which genes are differentially expressed between tumour samples which are sensitive to drug A and tumour samples which are resistant to drug A).
> Therefore my question to dear mailing list members would be,
> 1) Is my experimental design correct to test my aim? (My gut feeling is it is not...)
>
> 2) What design is appropriate to account for the individual variability in the tumours while addressing the aim of the experiment (tumour sensitive vs tumour resistant)? Is this possible?
> Would this meta-data be the key?
> targets_2
> targets
>    files samples   type    drug_A    drug_B
> 1    T01     T01 tumour resistant sensitive
> 2    T02     T02 tumour resistant resistant
> 3    T03     T03 tumour sensitive resistant
> 4    T04     T04 tumour    medium sensitive
> .
> .
> 16   T16     T16 tumour sensitive sensitive
> 17   N01  normal normal   unknown   unknown
> 18   N02  normal normal   unknown   unknown
> 19   N03  normal normal   unknown   unknown
>
> Following the above meta-data, proceed along this line:-
> Groups = factor(paste(targets_2$type,targets_2$drug_A,sep="."))
> design = model.matrix(~0 + Groups)
> colnames(design) = levels(Groups)
>
> my.contrast = makeContrasts(
> tumour.sensitiveVSresistant = tumour.sensitive-tumour.resistant,
> tumour_normal.sensitiveVsresistant = (tumour.sensitive-normal.unknown)-(tumour.resistant-normal.unknown)
> ,levels=design)
>
> Would the method above be more appropriate? But will it account for the
> variability in the tumour samples? (i.e. does the design above treat the
> tumours as replicates?)
>
> Thank you for taking the time to read this post, and I apologise if I included too much unnecessary information.
> Zaki
>
>