[BioC] differentially expressed gene
James W. MacDonald
jmacdon at med.umich.edu
Wed Mar 9 22:57:42 CET 2011
Hi Prasad,
On 3/9/2011 11:56 AM, Prasad Siddavatam wrote:
> Hi,
>
> I sincerely thank Dr.Brown for his response to my earlier post on design matrix.
>
> I have two questions:
> 1. the discrepancies in my DE results
> 2. B-values
> These are my target files:
> FileName        Cy3  Cy5     Original Names
> HIDEN_1.gpr     Ref  HI_Inf  Heat_Inactivated_1
> HIDEN_2.gpr     Ref  HI_Inf  Heat_Inactivated_2
> HIDEN_3.gpr     Ref  HI_Inf  Heat_Inactivated_3
> infected_1.gpr  Ref  Infect  Live_Infection_1
> infected_2.gpr  Ref  Infect  Live_Infection_2
> infected_3.gpr  Ref  Infect  Live_Infection_3
>
> design:
>   HI_Inf Infect
>        1      0
>        1      0
>        1      0
>        0      1
>        0      1
>        0      1
> contrast:
>         Contrasts
> Levels   HI_INF INF INFvsHI_INF
>   HI_Inf      1   0          -1
>   Infect      0   1           1
> When I used the above matrix and contrasts, I found 270 and 2484 DE genes (for
> HI_INF and INF, respectively).
>
> But when I divided the data into two separate analyses:
> 1. For HI_INF I found 608 DE genes
>    HIDEN_1.gpr     Ref  HI_Inf  Heat_Inactivated_1
>    HIDEN_2.gpr     Ref  HI_Inf  Heat_Inactivated_2
>    HIDEN_3.gpr     Ref  HI_Inf  Heat_Inactivated_3
> 2. For INF I found 868 DE genes
>    infected_1.gpr  Ref  Infect  Live_Infection_1
>    infected_2.gpr  Ref  Infect  Live_Infection_2
>    infected_3.gpr  Ref  Infect  Live_Infection_3
>
> Why is there this difference? Technically those should be the same because
> the rest of the steps were similar between the two.
Actually they shouldn't be the same. This has to do with the denominator
of the t-statistic you are computing. Recall that the denominator of a
t-statistic acts as a 'yardstick', allowing us to determine if a given
difference in means is larger than expected under the null.

In the first case above, the denominator is computed using all six
arrays (if doing a conventional ANOVA, this is based on the mean square
error or MSE, i.e. the error sum of squares divided by its degrees of
freedom).

In the second case, the denominator is computed using just the three
arrays under consideration (the standard error of the mean or SEM).
Because there are fewer arrays, this estimator will have fewer degrees
of freedom, and hence the test will be less powerful.
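
To make the two yardsticks concrete, here is a sketch in ordinary (unmoderated) t-statistic notation. The symbols are shorthand introduced here, not anything from your output: \bar{y}_g and s_g^2 are the mean and sample variance of the n_g = 3 log-ratios for group g. If this is a limma fit (the B column suggests so), the eBayes step additionally moderates the variance, which these formulas ignore.

```latex
% Separate analysis: only the n_g arrays of group g are used
t_g^{\text{sep}} = \frac{\bar{y}_g}{s_g / \sqrt{n_g}},
\qquad \text{df} = n_g - 1 = 2

% Joint analysis: residual variance pooled over both groups
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2},
\qquad
t_g^{\text{joint}} = \frac{\bar{y}_g}{s_p / \sqrt{n_g}},
\qquad \text{df} = n_1 + n_2 - 2 = 4
```

The numerator is the same in both cases; only the variance estimate and its degrees of freedom change.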

As for why the numbers differ, I wonder if the live_infection arrays are
much noisier than the heat_inactivated arrays. This could explain why you
see the varying number of significant genes.
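
A small numerical sketch of that idea, in plain Python rather than R so it runs with no packages installed. The log-ratios are made up for a single hypothetical gene, the function names are illustrative only, and the statistics are ordinary t-statistics, not the moderated ones an empirical Bayes fit would report.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical log-ratios for ONE gene; values are invented for illustration.
hi_inf = [0.50, 0.55, 0.45]   # heat-inactivated arrays: tight replicates
infect = [1.40, 0.20, 0.80]   # live-infection arrays: noisy replicates


def t_separate(y):
    """One-sample t-statistic using only this group's arrays (df = n - 1)."""
    n = len(y)
    return mean(y) / sqrt(variance(y) / n), n - 1


def t_pooled(y, other):
    """t-statistic for the mean of y, with the variance pooled over both groups."""
    n, m = len(y), len(other)
    df = n + m - 2
    s2 = ((n - 1) * variance(y) + (m - 1) * variance(other)) / df
    return mean(y) / sqrt(s2 / n), df


for name, y, other in [("HI_Inf", hi_inf, infect), ("Infect", infect, hi_inf)]:
    t_sep, df_sep = t_separate(y)
    t_pool, df_pool = t_pooled(y, other)
    print(f"{name}: separate t = {t_sep:.2f} (df {df_sep}), "
          f"pooled t = {t_pool:.2f} (df {df_pool})")
```

With these invented numbers the tight group gets a smaller t-statistic once the noisy arrays are pooled in, while the noisy group gets a larger one with more degrees of freedom. That is the same direction as the 608 vs 270 and 868 vs 2484 counts, though real data would need checking gene by gene.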
Best,
Jim
> ------------------------------------------------------------------------
> I also found that there are some genes with negative "B values" but
> adjusted p.values and p.values < 0.05 (see below):
> logFC       t          P.Value      adj.P.Val   B
> -0.6740520  -3.655211  0.006576695  0.04978582  -2.453619
> -0.3866386  -3.655013  0.006578564  0.04978582  -2.453912
> -0.6554844  -3.652845  0.006599049  0.04992410  -2.457116
>
> In this case, can I delete the genes with negative B-values even though
> the adjusted p.values and p.values are < 0.05?
>
> Your suggestions are highly appreciated
> -regards
> Prasad
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826