[BioC] differentially expressed gene

Wed Mar 9 22:57:42 CET 2011

Hi Prasad,

On 3/9/2011 11:56 AM, Prasad Siddavatam wrote:
> Hi,
>
> I sincerely thank Dr.Brown for his response to my earlier post on design matrix.
>
> I have two questions....1. about the discrepancies in my DE results
>                          2. B-values
> These are my target files
> FileName	Cy3	Cy5	Original Names	
> HIDEN_1.gpr	Ref	HI_Inf	Heat_Inactivated_1	
> HIDEN_2.gpr	Ref	HI_Inf	Heat_Inactivated_2	
> HIDEN_3.gpr	Ref	HI_Inf	Heat_Inactivated_3	
> infected_1.gpr	Ref	Infect	Live_Infection_1	
> infected_2.gpr	Ref	Infect	Live_Infection_2	
> infected_3.gpr	Ref	Infect	Live_Infection_3
>
> design:
>      HI_Inf Infect
>        1      0
>        1      0
>        1      0
>        0      1
>        0      1
>        0      1
> contrast:
>          Contrasts
> Levels   HI_INF INF INFvsHI_INF
>    HI_Inf      1   0          -1
>    Infect      0   1           1
> When I used the above matrix and contrasts, I found 270 and 2484 DE genes (for
> HI_INF and INF, respectively).
>
> But when I divided the data into two separate analyses
> 1. For HI_INF I found 608 DE genes
> HIDEN_1.gpr	Ref	HI_Inf	Heat_Inactivated_1	
> HIDEN_2.gpr	Ref	HI_Inf	Heat_Inactivated_2	
> HIDEN_3.gpr	Ref	HI_Inf	Heat_Inactivated_3
> 2. For INF I found 868 DE genes
> infected_1.gpr	Ref	Infect	Live_Infection_1	
> infected_2.gpr	Ref	Infect	Live_Infection_2	
> infected_3.gpr	Ref	Infect	Live_Infection_3
>
> Why is there a difference? Technically these should be the same, because the
> rest of the steps were similar between the two.

Actually they shouldn't be the same. This has to do with the denominator 
of the t-statistic you are computing. Recall that the denominator of a 
t-statistic acts as a 'yardstick', allowing us to determine if a given 
difference in means is larger than expected under the null.

In the first case above, the denominator is based on the residual 
variance pooled across all six arrays (in a conventional ANOVA, this is 
the mean square error, i.e. the sum of squares for error, SSE, divided 
by its degrees of freedom).

In the second case, the denominator is computed using just the three 
arrays under consideration (the standard error of the mean, or SEM). 
Because there are fewer arrays, this variance estimate has fewer degrees 
of freedom, and hence the test is less powerful.
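
To make the two denominators concrete, here is a minimal sketch in 
Python using ordinary (unmoderated) t-statistics for a single gene. The 
log-ratios and function names are made up for illustration and are not 
from your data. If the analysis was done with limma, as the output 
columns suggest, the moderated t additionally shrinks each gene's 
variance toward a common value, so this only illustrates the pooling 
effect.

```python
import math
from statistics import mean, variance

# Hypothetical log-ratios for one gene (made-up numbers, not from the thread).
hi_inf = [0.50, 0.60, 0.40]   # three heat-inactivated arrays, low spread
infect = [1.50, 0.30, 0.90]   # three live-infection arrays, high spread

def t_single_group(x):
    """One-sample t: the denominator is the SEM from this group alone."""
    n = len(x)
    sem = math.sqrt(variance(x) / n)
    return mean(x) / sem, n - 1          # (t, degrees of freedom)

def t_pooled(x, other):
    """t for the mean of x, with the variance pooled over both groups."""
    df = len(x) + len(other) - 2         # residual degrees of freedom
    sse = variance(x) * (len(x) - 1) + variance(other) * (len(other) - 1)
    mse = sse / df                       # mean square error
    return mean(x) / math.sqrt(mse / len(x)), df

t_hi_alone, df_alone = t_single_group(hi_inf)
t_hi_pooled, df_pooled = t_pooled(hi_inf, infect)
t_inf_alone, _ = t_single_group(infect)
t_inf_pooled, _ = t_pooled(infect, hi_inf)

print(f"HI_Inf: alone t={t_hi_alone:.2f} (df={df_alone}), "
      f"pooled t={t_hi_pooled:.2f} (df={df_pooled})")
print(f"Infect: alone t={t_inf_alone:.2f} (df={df_alone}), "
      f"pooled t={t_inf_pooled:.2f} (df={df_pooled})")
```

With these made-up numbers the low-spread group's t drops from about 
8.7 to about 2.0 once its variance is pooled with the noisier group, 
while the noisy group's t rises from about 2.6 to about 3.6. The same 
gene can therefore land on different sides of a cutoff in the two 
analyses.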

As for why the counts differ in the direction they do, I wonder if the 
live_infection arrays are much noisier than the heat_inactivated arrays. 
This could explain why you see the varying number of significant genes.
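
One rough way to check that hunch is to compare the typical per-gene 
standard deviation within each group. The sketch below does this in 
Python on a tiny made-up matrix of log-ratios (one row per gene, three 
arrays per group). The values and the helper name median_gene_sd are 
hypothetical, and in practice you would run the same summary on your 
normalized data.

```python
from statistics import median, stdev

# Hypothetical log-ratios: one row per gene, one column per array.
hi_inf = [[0.10, 0.12, 0.08], [-0.50, -0.45, -0.55], [1.00, 1.10, 0.90]]
infect = [[0.40, -0.20, 0.90], [-1.20, -0.10, -0.60], [2.00, 0.80, 1.50]]

def median_gene_sd(rows):
    """Median across genes of the within-group standard deviation."""
    return median([stdev(row) for row in rows])

sd_hi = median_gene_sd(hi_inf)
sd_inf = median_gene_sd(infect)

print(f"median per-gene SD, HI_Inf: {sd_hi:.3f}")
print(f"median per-gene SD, Infect: {sd_inf:.3f}")
```

If the second number is clearly larger on the real data, that would be 
consistent with the noisier group inflating the pooled variance.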

Best,

Jim

> ------------------------------------------------------------------------
> I also found that there are some genes with negative "B Values" but < 0.05
> adjusted p.values and p.values...see below
>     logFC         t     P.Value     adj.P.Val    B
>   -0.6740520 -3.655211 0.006576695 0.04978582 -2.453619
>   -0.3866386 -3.655013 0.006578564 0.04978582 -2.453912
>   -0.6554844 -3.652845 0.006599049 0.04992410 -2.457116
>
> In this case, can I delete the genes with negative B-values though adjusted
> p.values and p.values are < 0.05?
>
> Your suggestions are highly appreciated
> -regards
> Prasad
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826