[BioC] (EdgeR) statistical justification of partitioning dataset for multiple analysis
Ryan
rct at thompsonclan.org
Thu Jan 30 20:03:00 CET 2014
Hi Adriaan,
If I understand correctly, you have 3 different treatments, i.e.
control, treatment 1, and treatment 2, and you have fit the same model
formula to the full dataset as well as all 3 combinations of only 2
treatments, and you are getting significantly different results between
the 3-treatment fit and the 2-treatment fits. I think the first thing
you need to do is to look at the result of plotBCV for each analysis. It
is possible that one of your treatments has substantially more
biological variability across all genes than the others. edgeR assumes
that each gene has the same BCV across all conditions, so that it can
more robustly estimate a single dispersion value for each gene. So look
at the plotBCV output from all your analyses, and see if the BCV
estimates differ noticeably. That could well explain what you are
seeing. You may also want to estimate dispersions from each treatment
group individually (drop Treatment, and with it the Treatment:Time
interaction, from the model formula in this case).
The tagwise dispersions will not be very robust in this case, but the
trended and common dispersions can help you figure out which treatment has
the most biological variability.
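
A rough sketch of that comparison, in case it helps. The counts here are
simulated stand-ins (with extra variability deliberately put into one
treatment), and the names samples, counts, disp_for, "trt1" and "trt2" are
made up for illustration; only the design formula comes from your message.
Substitute your own count matrix and sample table.

```r
library(edgeR)

set.seed(1)
# Assumed layout from the question: 4 subjects x 3 treatments x 2 time points
samples <- expand.grid(subject   = factor(1:4),
                       Treatment = factor(c("control", "trt1", "trt2")),
                       Time      = factor(c("t1", "t2")))

# Simulated counts (NOT real data): "trt2" gets a larger dispersion (1/size)
ngenes <- 2000
size_per_sample <- ifelse(samples$Treatment == "trt2", 2.5, 10)
counts <- matrix(rnbinom(ngenes * nrow(samples), mu = 100,
                         size = rep(size_per_sample, each = ngenes)),
                 nrow = ngenes)
rownames(counts) <- paste0("gene", seq_len(ngenes))

# Hypothetical helper: estimate dispersions on a subset of samples,
# draw the BCV plot, and return the common dispersion.
disp_for <- function(keep, formula, label) {
  y <- DGEList(counts = counts[, keep])
  y <- calcNormFactors(y)
  design <- model.matrix(formula, data = droplevels(samples[keep, ]))
  y <- estimateDisp(y, design)
  plotBCV(y, main = label)
  y$common.dispersion
}

# Full three-treatment fit, using the design from the question
full <- disp_for(rep(TRUE, nrow(samples)),
                 ~ subject + Treatment + Time + Treatment:Time,
                 "all treatments")

# One fit per treatment group; Treatment and its interaction are dropped
per_group <- sapply(levels(samples$Treatment), function(trt) {
  disp_for(samples$Treatment == trt, ~ subject + Time, trt)
})

# Common BCV is the square root of the common dispersion
bcv <- sqrt(c(all = full, per_group))
print(bcv)
```

If one treatment's common BCV sits well above the others, the pooled
dispersion from the full fit will be inflated for comparisons that do not
involve that treatment. The same helper can be called on each pair of
treatments to reproduce the 2-treatment fits.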
If the dispersion estimates don't explain your differing p-values, ask
back here and maybe someone else will have another idea.
Good luck,
-Ryan
On 1/30/14, 9:43 AM, Adriaan Sticker wrote:
> Dear all,
>
> I'm analysing already mapped reads from sequencing data for
> differential expression with edgeR. My experimental setup is as follows:
> I have samples from 4 different subjects. Material from each subject was
> treated with 2 different treatments (and a control) at 2 time points.
>
> I want to analyse the effect of the treatments (compared to control and
> compared to each other).
>
> In edgeR I used the following design:
> model.matrix(~ subject + Treatment + Time + Treatment:Time)
>
> I considered 2 strategies to analyse the data:
>
> Estimate parameters from the above-mentioned design with all data (all samples)
> and use different contrasts to get the differentially expressed genes I want.
>
> OR
>
> Use only the samples of the two treatments I want to compare (e.g. control
> vs treatment 1, treatment 1 vs treatment 2) to fit the parameters. Repeat
> this 3 times until I have compared all 3 treatments with each other.
> So actually 3 different analyses, each using only a subset (2/3) of the data.
>
> I noticed that I could find considerably more significantly differentially
> expressed genes between 2 treatments with the latter approach. But I wondered
> how correct this approach is. Will I, for example, have problems with
> multiple testing? (I control each analysis at an FDR of 5% with Benjamini-Hochberg.)
>
> Thanks in advance,
> Kind regards