[R] mean fold change issues and p values

David Winsemius dwinsemius at comcast.net
Thu Apr 16 04:47:26 CEST 2009


I am not quite sure what you are trying to do because the phrase "mean  
fold change values" is not in my experience. I am guessing that it is  
referring to some sort of geometric mean or multiplicative model. The  
lengthy code looks very much like a rendition of a Fortran procedure  
rather than typical R code.  I am wondering if you might get better  
help if you could try to express you needs in the language of general  
linear models, ofr at least be a bit more expansive in the description  
of the desired process rather than giving us a series of obscure, un-R- 
like, uncommented nested loops and asking us to fix them.

-- 
David Winsemius


On Apr 14, 2009, at 12:50 PM, Weber, Jeffrey D wrote:

> I am new to R and have two scripts written slightly different but  
> should to relatively the same thing but my lack of experience with  
> the program I can not figure out the what I need to do to correct  
> it.  The first script gives me a consistent mean fold change values  
> with every run but can generate negative p values for some.  For the  
> second version of the script, the fold changes seem to be very  
> random but the p values do not go negative.   I have highlighted the  
> differences in the script that I noticed.  I am think some  
> combination of these two scripts will work but I am not sure.
>
>
> Version 1
>
> ##t test
>
> len_sig_test = length(sig_test)
>
> if(len_sig_test%%2 > 0)
>
>                message("Number of groups for t-test is not paired.")
>
> if(max(sig_test) > group)
>
>                message("Group number for t-test is out of range.")
>
>
>
> data.norm.test = data.norm.zo                 #normalized data
>
> nrow <- nrow(data.norm.test)
>
> ncol <- ncol(data.norm.test)
>
>
>
> p.ttest = rep(NA,  
> nrow)                                                #parametric  
> test -- t test
>
> p.wilcox = rep(NA, nrow)                             #non-parametric  
> test -- Mann-Whitney test
>
> fold.mean <- rep(NA, nrow)
>
> mean.na <- rep(NA, nrow)
>
> mean.nb <- rep(NA, nrow)
>
>
>
> sig.peak.all = data.norm.test
>
> g_1 = sig_test[1]
>
> g_2 = sig_test[2]
>
>
>
> col_s1 = 1; col_s2 = 1;
>
> for(i in 1:group){
>
>                if(i < g_1)
>
>                                col_s1 = col_s1+group_size[i]
>
>                if(i < g_2)
>
>                                col_s2 = col_s2+group_size[i]
>
> }
>
> col_e1 = col_s1+group_size[g_1]-1
>
> col_e2 = col_s2+group_size[g_2]-1
>
>
>
> for(i in 1:nrow){
>
>                na.int.norm <- rep(0, group_size[g_1])
>
>                nb.int.norm <- rep(0, group_size[g_2])
>
>                m = 1
>
>                for(j in col_s1:col_e1){
>
>                                na.int.norm[m] <- data.norm.test[i,j]
>
>                                m <- m+1
>
>                }
>
>                 n = 1
>
>                for(j in col_s2:col_e2){
>
>                                nb.int.norm[n] <- data.norm.test[i,j]
>
>                                n <- n+1
>
>                }
>
>                 m_na = mean(na.int.norm, na.rm=TRUE)
>
>                m_nb = mean(nb.int.norm, na.rm=TRUE)
>
>                mean.na[i] = exp(m_na)
>
>                mean.nb[i] = exp(m_nb)
>
>                if(group_size[g_1]-sum(is.na(na.int.norm))>1 &&  
> group_size[g_2]-sum(is.na(nb.int.norm))>1){
>
>                                fold.mean[i] = mean.na[i]/mean.nb[i]
>
>                                p.ttest[i] = t.test(na.int.norm,  
> nb.int.norm)$p.value
>
>                                p.wilcox[i] =   
> wilcox.test(na.int.norm, nb.int.norm)$p.value
>
>                }
>
>                else if(!is.na(m_na) && !is.na(m_nb)) 
> {                  #peak present in single sample
>
>                                fold.mean[i] = mean.na[i]/mean.nb[i]
>
>                                p.ttest[i] = -1
>
>                                p.wilcox[i] =  -1
>
>                }
>
>                else{
>
>                                if(is.na(m_na)) 
> {                                                #all data are NA in  
> a group
>
>                                                fold.mean[i] = 0.1 +  
> rnorm(1, mean=0, sd=0.04)
>
>                                                p.ttest[i] =  
> p.wilcox[i] = 0.001 + rnorm(1, mean=0, sd=0.002)
>
>                                }else if(is.na(m_nb)){
>
>                                                fold.mean[i] = 10 +  
> rnorm(1, mean=0, sd= 4)
>
>                                                p.ttest[i] =  
> p.wilcox[i] = 0.001 + rnorm(1, mean=0, sd=0.002)
>
> }
>
> }
>
> }
>
> sig.peak.t <- rep(0, nrow)
>
> sig.peak.w <- rep(0, nrow)
>
>
>
> n.ttest = n.mann = 0
>
> for(i in 1:nrow){
>
>                if(!is.na(p.ttest[i]) & p.ttest[i]<=0.05){
>
>                                n.ttest <- n.ttest+1
>
>                                sig.peak.t[i] = 1
>
>                }
>
>                if(!is.na(p.wilcox[i]) & p.wilcox[i]<=0.05){
>
>                                n.mann <- n.mann+1
>
>                                sig.peak.w[i] = 1
>
> }
>
> }
>
> n.ttest; nrow; n.ttest/nrow*100
>
> n.mann; nrow; n.mann/nrow*100
>
>
>
> par(mfrow=c(1,1), oma=c(0,0,3,0), mar=c(5,4,4,2))          #t-test
>
> plot(log(fold.mean), -log(p.ttest), xlab="fold change (log)",  
> ylab="p (-log)", col="red")
>
> points(log(fold.mean), -log(rep(0.05, nrow)), type='l', col="blue")
>
> Version 1
>
> ##t test
>
> len_sig_test = length(sig_test)
>
> if(len_sig_test%%2 > 0)
>
>                message("Number of groups for t-test is not paired.")
>
> if(max(sig_test) > group)
>
>                message("Group number for t-test is out of range.")
>
>
>
> data.norm.test = data.norm.zo                 #normalized data
>
> nrow <- nrow(data.norm.test)
>
> ncol <- ncol(data.norm.test)
>
>
>
> p.ttest = rep(NA,  
> nrow)                                                #parametric  
> test -- t test
>
> p.wilcox = rep(NA, nrow)                             #non-parametric  
> test -- Mann-Whitney test
>
> fold.mean <- rep(NA, nrow)
>
> mean.na <- rep(NA, nrow)
>
> mean.nb <- rep(NA, nrow)
>
>
>
> sig.peak.all = data.norm.test
>
> g_1 = sig_test[1]
>
> g_2 = sig_test[2]
>
>
>
> col_s1 = 1; col_s2 = 1;
>
> for(i in 1:group){
>
>                if(i < g_1)
>
>                                col_s1 = col_s1+group_size[i]
>
>                if(i < g_2)
>
>                                col_s2 = col_s2+group_size[i]
>
> }
>
> col_e1 = col_s1+group_size[g_1]-1
>
> col_e2 = col_s2+group_size[g_2]-1
>
> for(i in 1:nrow){
>
>                na.int.norm <- rep(0, group_size[g_1])
>
>                nb.int.norm <- rep(0, group_size[g_2])
>
>                 m = 1
>
>                for(j in col_s1:col_e1){
>
>                                na.int.norm[m] <- data.norm.test[i,j]
>
>                                m <- m+1
>
>                }
>
>                 n = 1
>
>                for(j in col_s2:col_e2){
>
>                                nb.int.norm[n] <- data.norm.test[i,j]
>
>                                n <- n+1
>
>                }
>
>
>
>                m_na = mean(na.int.norm, na.rm=TRUE)
>
>                m_nb = mean(nb.int.norm, na.rm=TRUE)
>
>                mean.na[i] = exp(m_na)
>
>                mean.nb[i] = exp(m_nb)
>
>                if(group_size[g_1]-sum(is.na(na.int.norm))>1 &&  
> group_size[g_2]-sum(is.na(nb.int.norm))>1){
>
>                                fold.mean[i] = mean.na[i]/mean.nb[i]
>
>                                p.ttest[i] = t.test(na.int.norm,  
> nb.int.norm)$p.value
>
>                                p.wilcox[i] =   
> wilcox.test(na.int.norm, nb.int.norm)$p.value
>
>                }
>
>                else if(!is.na(m_na) && !is.na(m_nb)) 
> {                  #peak present in single sample
>
>                                fold.mean[i] = mean.na[i]/mean.nb[i]
>
>                                p.ttest[i] = -1
>
>                                p.wilcox[i] =  -1
>
>                }
>
>                else{
>
>                                if(is.na(m_na)) 
> {                                                #all data are NA in  
> a group
>
>                                                fold.mean[i] = 0.1 +  
> rnorm(1, mean=0, sd=0.04)
>
>                                                p.ttest[i] =  
> p.wilcox[i] = 0.001 + rnorm(1, mean=0, sd=0.002)
>
>                                }else if(is.na(m_nb)){
>
>                                                fold.mean[i] = 10 +  
> rnorm(1, mean=0, sd= 4)
>
>                                                p.ttest[i] =  
> p.wilcox[i] = 0.001 + rnorm(1, mean=0, sd=0.002)
>
> }
>
> }
>
> }
>
> sig.peak.t <- rep(0, nrow)
>
> sig.peak.w <- rep(0, nrow)
>
>
>
> n.ttest = n.mann = 0
>
> for(i in 1:nrow){
>
>                if(!is.na(p.ttest[i]) & p.ttest[i]<=0.05){
>
>                                n.ttest <- n.ttest+1
>
>                                sig.peak.t[i] = 1
>
>                }
>
>                if(!is.na(p.wilcox[i]) & p.wilcox[i]<=0.05){
>
>                                n.mann <- n.mann+1
>
>                                sig.peak.w[i] = 1
>
> }
>
> }
>
> n.ttest; nrow; n.ttest/nrow*100
>
> n.mann; nrow; n.mann/nrow*100
>
>
>
> par(mfrow=c(1,1), oma=c(0,0,3,0), mar=c(5,4,4,2))          #t-test
>
> plot(log(fold.mean), -log(p.ttest), xlab="fold change (log)",  
> ylab="p (-log)", col="red")
>
> points(log(fold.mean), -log(rep(0.05, nrow)), type='l', col="blue")
>
>
>
> Version 2
>
> ##t test
>
> len_sig_test = length(sig_test)
>
> if(len_sig_test%%2 > 0)
>
>                message("Number of groups for t-test is not paired.")
>
> if(max(sig_test) > group)
>
>                message("Group number for t-test is out of range.")
>
>
>
> data.norm.test = data.norm.zo                 #normalized data
>
> nrow <- nrow(data.norm.test)
>
> ncol <- ncol(data.norm.test)
>
> for(w in 1:nrow){
>
>                for(z in 1:ncol){
>
>                                                 
> if(is.na(data.norm.test[w,z])){
>
>                                                }else  
> if(exp(data.norm.test[w,z])==0) {data.norm.test[w,z]=NA
>
> }
>
> }
>
> }
>
> p.ttest = rep(NA,  
> nrow)                                                #parametric  
> test -- t test
>
> p.wilcox = rep(NA, nrow)                             #non-parametric  
> test -- Mann-Whitney test
>
> fold.mean <- rep(NA, nrow)
>
> mean.na <- rep(NA, nrow)
>
> mean.nb <- rep(NA, nrow)
>
>
>
> sig.peak.all = data.norm.test
>
> g_1 = sig_test[1]
>
> g_2 = sig_test[2]
>
>
>
> col_s1 = 1; col_s2 = 1;
>
> for(i in 1:group){
>
>                if(i < g_1)
>
>                                col_s1 = col_s1+group_size[i]
>
>                if(i < g_2)
>
>                                col_s2 = col_s2+group_size[i]
>
> }
>
> col_e1 = col_s1+group_size[g_1]-1
>
> col_e2 = col_s2+group_size[g_2]-1
>
>
>
> for(i in 1:nrow){
>
>                na.int.norm <- rep(0, group_size[g_1])
>
>                nb.int.norm <- rep(0, group_size[g_2])
>
>                 m = 1
>
>                x=0
>
>                for(j in col_s1:col_e1){
>
>                                na.int.norm[m] <- data.norm.test[i,j]
>
>                                if(!is.na(na.int.norm[m])) x<-x+1
>
>                                m <- m+1
>
>                }
>
>                n = 1
>
>                y=0
>
>                for(j in col_s2:col_e2){
>
>                                nb.int.norm[n] <- data.norm.test[i,j]
>
>                                if(!is.na(nb.int.norm[n])) y<-y+1
>
>                                n <- n+1
>
>                }
>
>
>
>                m_na = mean(exp(na.int.norm), na.rm=TRUE)
>
>                m_nb = mean(exp(nb.int.norm), na.rm=TRUE)
>
>                mean.na[i] = m_na
>
>                mean.nb[i] = m_nb
>
>                if(group_size[g_1]-sum(is.na(na.int.norm))>1 &&  
> group_size[g_2]-sum(is.na(nb.int.norm))>1&& x!=1 && y!=1 && x !=0 &&  
> y !=0){
>
>                                fold.mean[i] = mean.na[i]/mean.nb[i]
>
>                                p.ttest[i] = t.test(exp(na.int.norm),  
> exp(nb.int.norm))$p.value
>
>                                p.wilcox[i] =  
> wilcox.test(exp(na.int.norm), exp(nb.int.norm))$p.value
>
>                }else{
>
>                                                if(is.na(m_na)) 
> {                                                #all data are NA in  
> a group
>
>                                                fold.mean[i] =  
> max(0.1, (0.1 + rnorm(1, mean=0, sd=0.1)))
>
>                                              p.ttest[i] =  
> p.wilcox[i] = 0.0001 + abs(rnorm(1, mean=0, sd=0.0001))
>
>                                                }else if(is.na(m_nb)){
>
>                                              fold.mean[i] = 10 +  
> rnorm(1, mean=0, sd=1)
>
>                                             p.ttest[i] = p.wilcox[i]  
> = 0.0001 + abs(rnorm(1, mean=0, sd=0.0001))
>
>                                             }else if( x==1)  
> fold.mean[i] = max(0.1, (0.1 + rnorm(1, mean=0, sd=0.1)))
>
>    p.ttest[i] = p.wilcox[i] = 0.0001 + abs(rnorm(1, mean=0,  
> sd=0.0001))
>
>                                                }else if(y==1){
>
>                                                                 
> fold.mean[i] = 10 + rnorm(1, mean=0, sd=1)
>
>                                                                 
> p.ttest[i] = p.wilcox[i] = 0.0001 + abs(rnorm(1, mean=0, sd=0.0001))
>
>                                                                }
>
>                                }
>
> }
>
> sig.peak.t <- rep(0, nrow)
>
> sig.peak.w <- rep(0, nrow)
>
>
>
> n.ttest = n.mann = 0
>
> for(i in 1:nrow){
>
>                if(!is.na(p.ttest[i]) & p.ttest[i]<=0.05){
>
>                                n.ttest <- n.ttest+1
>
>                                sig.peak.t[i] = 1
>
>                }
>
>                if(!is.na(p.wilcox[i]) & p.wilcox[i]<=0.05){
>
>                                n.mann <- n.mann+1
>
>                                sig.peak.w[i] = 1
>
>                }
>
> }
>
> n.ttest; nrow; n.ttest/nrow*100
>
> n.mann; nrow; n.mann/nrow*100
>
> par(mfrow=c(1,1), oma=c(0,0,3,0), mar=c(5,4,4,2))          #t-test
>
> plot(log(fold.mean), -log(p.ttest), xlab="fold change (log)",  
> ylab="p (-log)", col="red")
>
> points(log(fold.mean), -log(rep(0.05, nrow)), type='l', col="blue")
>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT




More information about the R-help mailing list