[R] calculating p-values of columns in a dataframe

Sun Jul 8 13:03:13 CEST 2007


Thomas Pujol wrote:
> I have a dataframe ("mydf") that contains "differences of means".
> I wish to test whether these differences are significantly different from zero.
> 
> Below, I calculate the t-statistic for each column.
> 
> What is a "good" method to calculate/look-up the p-value for each column?
> 
> 
> mydf=data.frame(a=c(1,-22,3,-4),b=c(5,-6,-7,9))
> 
> mymean=sapply(mydf, mean)
> mysd=sapply(mydf, sd)
> mynn=sapply(mydf, function(x) {sum ( as.numeric(x) >= -Inf) })
> myse=mysd/sqrt(mynn)
> myt=mymean/myse
> myt

You can do the whole lot with
   L <- lapply(mydf, t.test)
or if you only want the t statistics and p-values now:
   sapply(L, "[", c("statistic", "p.value"))
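As a minimal sketch of the above, using the example data from the question: the sapply(L, "[", ...) call returns a matrix whose cells are list elements, so it can be convenient to collect the values into a plain numeric matrix instead. The name `res` and the row label `t` are illustrative only and are not part of the original reply.

```r
# Example data from the original question
mydf <- data.frame(a = c(1, -22, 3, -4), b = c(5, -6, -7, 9))

# One-sample t-test (H0: mean = 0) for every column
L <- lapply(mydf, t.test)

# Extract the t statistic and the two-sided p-value as plain numbers;
# vapply() checks that each result is a numeric vector of length 2
res <- vapply(L,
              function(tt) c(t = unname(tt$statistic), p.value = tt$p.value),
              numeric(2))
res
```

The result has one column per column of mydf, so res["p.value", ] gives the p-values as a named numeric vector.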

If you want to follow your initial approach, you can get the two-sided 
p-values from the cumulative distribution function of the t distribution 
with 3 degrees of freedom (for your data) with
   2 * pt(-abs(myt), df = nrow(mydf) - 1)
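A self-contained sketch of this manual route, cross-checked against t.test(), might look as follows. It assumes the example data from the question with no missing values (so every column has nrow(mydf) observations). The names `n`, `se`, `tstat`, `p_manual` and `p_ttest` are illustrative.

```r
# Example data from the original question
mydf <- data.frame(a = c(1, -22, 3, -4), b = c(5, -6, -7, 9))

n     <- nrow(mydf)                   # observations per column (assumes no NAs)
se    <- sapply(mydf, sd) / sqrt(n)   # standard error of each column mean
tstat <- sapply(mydf, mean) / se      # t statistic against H0: mean = 0

# Two-sided p-value: twice the lower-tail probability at -|t|
p_manual <- 2 * pt(-abs(tstat), df = n - 1)

# Cross-check against the p-values reported by t.test()
p_ttest <- sapply(mydf, function(x) t.test(x)$p.value)
all.equal(p_manual, p_ttest)
```

If the columns contained NAs, the count and the degrees of freedom would have to be computed per column rather than from nrow(mydf).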

Uwe Ligges

