[R] multiple csv files for T-test

arun smartpink111 at yahoo.com
Fri Jun 28 14:57:01 CEST 2013


Hi,
According to the ?t.test documentation:
If ‘paired’ is ‘TRUE’ then both ‘x’ and ‘y’ must be specified and
     they must be the same length.  Missing values are silently removed
     (in pairs if ‘paired’ is ‘TRUE’)

#Example with missing values
set.seed(24)
dat1<- as.data.frame(matrix(sample(c(NA,20:40),40,replace=TRUE),ncol=4))
set.seed(285)
dat2<- as.data.frame(matrix(sample(c(NA,35:60),40,replace=TRUE),ncol=4)) 

 sapply(colnames(dat1),function(i) t.test(dat1[,i],dat2[,i],paired=TRUE)$p.value) 
#          V1           V2           V3           V4 
#7.004488e-05 1.374986e-03 6.666004e-04 3.749257e-04 


#Removing the incomplete pairs first and then running the test gives the same p-values
sapply(colnames(dat1),function(i) {x1<-na.omit(cbind(dat1[,i],dat2[,i]));t.test(x1[,1],x1[,2],paired=TRUE)$p.value}) 
#          V1           V2           V3           V4 
#7.004488e-05 1.374986e-03 6.666004e-04 3.749257e-04 
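
If you also want to know how many pairs each test actually used, something along these lines may help. It is only a sketch: the vectors x and y are made-up values for illustration, not your data, and n_pairs is just a name I picked. complete.cases() marks the rows where both measurements are present, and the degrees of freedom reported by t.test() should equal that count minus one.

```r
# Toy data (made up for illustration), with NAs in different positions
x <- c(21, NA, 30, 28, 35, 26, 33, NA, 29, 31)
y <- c(40, 45, NA, 50, 52, 41, 55, 47, 44, 49)

# TRUE where both members of the pair are present
ok <- complete.cases(x, y)
n_pairs <- sum(ok)

# t.test() drops incomplete pairs on its own when paired = TRUE
res_auto <- t.test(x, y, paired = TRUE)
# The same test after removing the incomplete pairs by hand
res_manual <- t.test(x[ok], y[ok], paired = TRUE)

n_pairs                                          # pairs actually used
unname(res_auto$parameter)                       # degrees of freedom, n_pairs - 1
all.equal(res_auto$p.value, res_manual$p.value)  # both approaches agree
```

A column with many missing pairs will still return a p-value, so it is worth checking the count before trusting the result.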

A.K.




Thanks very much, your help is much appreciated.

Just another small question: what's the best way to deal with missing data if I want to do a paired t-test?


----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc: 
Sent: Thursday, June 27, 2013 1:47 PM
Subject: Re: multiple csv files for T-test

Hi,
I used as.data.frame(matrix(...)) just to create an example dataset.  In your case, you don't need to do that.  Using the same example:

set.seed(24)
dat1<- as.data.frame(matrix(sample(20:40,40,replace=TRUE),ncol=4))
set.seed(285)
dat2<- as.data.frame(matrix(sample(35:60,40,replace=TRUE),ncol=4))

write.csv(dat1,"file1.csv",row.names=FALSE)
write.csv(dat2,"file2.csv",row.names=FALSE)
data1<- read.csv("file1.csv")
data2<- read.csv("file2.csv")

###Your code:
 dat1New<- as.data.frame(matrix(data1))  
 dat2New<- as.data.frame(matrix(data2)) 
###It is always useful to check ?str() 


str(dat1New)
#'data.frame':    4 obs. of  1 variable:
# $ V1:List of 4
#  ..$ : int  26 24 34 30 33 39 25 36 36 25
#  ..$ : int  32 27 34 34 26 38 24 20 30 22
#  ..$ : int  21 31 35 22 24 34 21 32 33 20
#  ..$ : int  26 25 27 23 39 24 35 33 34 40



 dat1New
#                                      V1
#1 26, 24, 34, 30, 33, 39, 25, 36, 36, 25
#2 32, 27, 34, 34, 26, 38, 24, 20, 30, 22
#3 21, 31, 35, 22, 24, 34, 21, 32, 33, 20
#4 26, 25, 27, 23, 39, 24, 35, 33, 34, 40
 dat2New
#                                      V1
#1 53, 40, 47, 57, 57, 53, 35, 42, 53, 41
#2 54, 37, 43, 40, 57, 42, 37, 53, 60, 39
#3 54, 60, 46, 50, 35, 41, 58, 45, 36, 53
#4 52, 56, 44, 40, 38, 53, 47, 46, 60, 50
 sapply(colnames(dat1New),function(i) t.test(dat1New[,i],dat2New[,i],paired=TRUE)$p.value) 
#Error in x - y : non-numeric argument to binary operator


##Just using data1 and data2

sapply(colnames(data1),function(i) t.test(data1[,i],data2[,i],paired=TRUE)$p.value) 
#          V1           V2           V3           V4 
#3.202629e-05 6.510644e-04 6.215225e-04 3.044760e-04 


#or using dat1New and dat2New
sapply(seq_along(dat1New$V1),function(i) t.test(dat1New$V1[[i]],dat2New$V1[[i]],paired=TRUE)$p.value)
#[1] 3.202629e-05 6.510644e-04 6.215225e-04 3.044760e-04
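
The reason for the error is that a data frame is a list of columns, so matrix() wraps each whole column into a single list cell instead of producing a numeric matrix. A small sketch of the difference follows; the three-row data frame is only a stand-in for what read.csv() returns, and m_list and m_num are names I made up. If you really do need a matrix, as.matrix() is the usual function for that.

```r
# Small stand-in for the data frame returned by read.csv()
data1 <- data.frame(V1 = c(26, 24, 34), V2 = c(32, 27, 34))

m_list <- matrix(data1)     # list-matrix: one cell per column
m_num  <- as.matrix(data1)  # numeric matrix, rows and columns preserved

typeof(m_list); dim(m_list)  # a list with dimensions 2 x 1
typeof(m_num);  dim(m_num)   # numeric values with dimensions 3 x 2

# Quick check that every column read from the file is numeric
sapply(data1, is.numeric)
```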



A.K.



thanks for the reply, I am getting the following error 
Error in x - y : non-numeric argument to binary operator 

This is what I enter below 

> data1 <-read.csv("file1.csv") 
> data2 <-read.csv("file2.csv") 
> dat1<- as.data.frame(matrix(data1)) 
> dat2<- as.data.frame(matrix(data2)) 
> sapply(colnames(dat1),function(i) t.test(dat1[,i],dat2[,i],paired=TRUE)$p.value) 

As far as I can see all my values are numeric...? 


----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc: 
Sent: Thursday, June 27, 2013 10:17 AM
Subject: Re: multiple csv files for T-test

Hi,
Maybe this helps:
#You can use ?read.csv() to read the two files.

set.seed(24)
dat1<- as.data.frame(matrix(sample(20:40,40,replace=TRUE),ncol=4))
set.seed(285)
dat2<- as.data.frame(matrix(sample(35:60,40,replace=TRUE),ncol=4))
sapply(colnames(dat1),function(i) t.test(dat1[,i],dat2[,i],paired=TRUE)$p.value)
#          V1           V2           V3           V4 
#3.202629e-05 6.510644e-04 6.215225e-04 3.044760e-04 
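
Starting from two files on disk, the whole thing might look roughly like the sketch below. The temporary files, the column names a and b, and the simulated numbers are all assumptions so that the code runs on its own; substitute your own file paths. It matches columns by name and assumes the rows of the two files are in the same order, since the test pairs them row by row.

```r
# Stand-in files so the sketch is self-contained; replace with your real paths
set.seed(1)
before <- data.frame(a = rnorm(10, 30, 5), b = rnorm(10, 50, 5))
after  <- data.frame(a = before$a + rnorm(10, 5, 2),
                     b = before$b + rnorm(10, 0, 2))
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(before, f1, row.names = FALSE)
write.csv(after,  f2, row.names = FALSE)

data1 <- read.csv(f1)
data2 <- read.csv(f2)

# Test only the parameters present in both files; paired tests need equal row counts
common <- intersect(names(data1), names(data2))
stopifnot(length(common) > 0, nrow(data1) == nrow(data2))

# One paired t-test per shared column, returning a named vector of p-values
pvals <- vapply(common,
                function(i) t.test(data1[[i]], data2[[i]], paired = TRUE)$p.value,
                numeric(1))
pvals
```

vapply() is used here instead of sapply() only so that the result is guaranteed to be a numeric vector with one p-value per column.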

A.K.

Hi 
I am fairly new to R so if this is a stupid question please forgive me. 

I have a CSV file with multiple parameters (50).  I have another
CSV file with the same parameters after treatment.  Is there a way I 
can read these two files into R and do multiple paired T-test as all the
parameters are in the same columns in each file? 

Thanks in advance


