[R] Multiple Paired T test from large Data Set with multiple pairs

arun smartpink111 at yahoo.com
Thu May 2 21:40:48 CEST 2013



If your site names include a number ("ALA1", "ALA2", etc.),
it would be better to split on the first four characters:
lst1 <- split(WrackMass,substr(WrackMass$Site.X.Treatment,1,4))
res1 <- lapply(lst1,function(x) do.call(cbind,lapply(x[,1:4],function(y) {
  Site <- gsub(".*\\d(.*)","\\1",x$Site.X.Treatment)  # treatment letter: "A" or "U"
  t.test(y[Site=="A"],y[Site=="U"],paired=TRUE)$p.value
})))
res1
#$ALA1
 #    Algae.Mass Seagrass.Mass Terrestrial.Mass Other.Mass
#[1,]          1     0.7769256        0.3351745  0.6817365
#
#$BLA1
 #    Algae.Mass Seagrass.Mass Terrestrial.Mass Other.Mass
#[1,]  0.6482613     0.9411953        0.7927984  0.3027634
#
#$CLA1
 #    Algae.Mass Seagrass.Mass Terrestrial.Mass Other.Mass
#[1,]   0.454294    0.02519427        0.5650988  0.2981702
A.K.
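
To see why the grouping key and the pattern changed between the two replies, the string operations can be tried on their own. This is only an illustrative sketch; the `ids` vector is made up here and is not taken from the poster's data.

```r
# Hypothetical site/treatment labels, including a second numbered site.
ids <- c("ALA1A", "ALA1U", "ALA2A", "ALA2U")

# Three characters merge ALA1 and ALA2 into one group.
substr(ids, 1, 3)

# Four characters keep each numbered site separate.
site <- substr(ids, 1, 4)
site

# Everything after the last digit: the treatment letter only.
trt <- gsub(".*\\d(.*)", "\\1", ids)
trt

# The earlier pattern keeps the digit too, so "2A" would never equal "1A".
trt_old <- gsub(".*(\\d.*)", "\\1", ids)
trt_old
```

With the four-character key and the letter-only pattern, the same code should cover sites such as ALA2 without further changes.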

----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc: 
Sent: Thursday, May 2, 2013 3:30 PM
Subject: Re: Multiple Paired T test from large Data Set with multiple pairs


Hi,
Still no data.
From your code, it looks like your data are similar to this:

set.seed(25)
WrackMass <- data.frame(
  Algae.Mass=sample(40:50,30,replace=TRUE),
  Seagrass.Mass=sample(30:70,30,replace=TRUE),
  Terrestrial.Mass=sample(80:100,30,replace=TRUE),
  Other.Mass=sample(40:60,30,replace=TRUE),
  Site.X.Treatment=rep(c("ALA1A","ALA1U","BLA1A","BLA1U","CLA1A","CLA1U"),each=5),
  stringsAsFactors=FALSE)
lst1 <- split(WrackMass,substr(WrackMass$Site.X.Treatment,1,3))

res <- lapply(lst1,function(x) do.call(cbind,lapply(x[,1:4],function(y) {
  Site <- gsub(".*(\\d.*)","\\1",x$Site.X.Treatment)  # "1A" or "1U"
  t.test(y[Site=="1A"],y[Site=="1U"],paired=TRUE)$p.value
})))
res
#$ALA
 #    Algae.Mass Seagrass.Mass Terrestrial.Mass Other.Mass
#[1,]          1     0.7769256        0.3351745  0.6817365
#
#$BLA
 #    Algae.Mass Seagrass.Mass Terrestrial.Mass Other.Mass
#[1,]  0.6482613     0.9411953        0.7927984  0.3027634
#
#$CLA
 #    Algae.Mass Seagrass.Mass Terrestrial.Mass Other.Mass
#[1,]   0.454294    0.02519427        0.5650988  0.2981702


A.K.


>Hello all,
>
>I have made some progress with the following code:
>
>lapply(WrackMass[,8:11], function(x)
>t.test(x[WrackMass$Site.X.Treatment=="ALA1A"],x[WrackMass$Site.X.Treatment=="ALA1U"])$p.value)
>
>This performs the paired t test for all four of my metrics
>(WrackMass[,8:11]) for a defined site pair (ALA1A/ALA1U).  My goal now is
>to repeat the code so that it runs for all my pairs (28 of them).  All the
>pairs are named XXX1A paired with XXX1U (A and U are treatments).  As
>explained above, the pair names are broken down into separate columns and
>concatenated in the Site.X.Treatment column.  See below:
>
>Site    Treatment    Site.X.Treatment
>ALA1    A            ALA1A
>ALA1    U            ALA1U
>
>and so on for all pairs.
>
>I guess my question now is how can I tell R that I want a certain site
>(ALA1) paired by the treatment A or U, for all sites.  Maybe something like
>
>t.test(x[WrackMass$Site=="ALA1" and Treatment=="A"],x[WrackMass$Site=="ALA1" and Treatment=="U"])$p.value
>
>Ideas?
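
Since the data already have separate Site and Treatment columns, one possible way to express "site X, treatment A versus treatment U, for all sites" is to split on Site and subset on Treatment. This is a sketch on simulated stand-in data: the values, the three site names, and the object names `metrics` and `pvals` are assumptions for illustration. It also assumes the A and U rows within a site are in matching order, because the pairing is positional.

```r
set.seed(1)
# Simulated stand-in: 3 sites x 2 treatments x 5 observations each.
WrackMass <- data.frame(
  Site = rep(c("ALA1", "BLA1", "CLA1"), each = 10),
  Treatment = rep(rep(c("A", "U"), each = 5), times = 3),
  Algae.Mass = rnorm(30, mean = 45, sd = 3),
  Seagrass.Mass = rnorm(30, mean = 50, sd = 10),
  Terrestrial.Mass = rnorm(30, mean = 90, sd = 5),
  Other.Mass = rnorm(30, mean = 50, sd = 5),
  stringsAsFactors = FALSE
)
metrics <- c("Algae.Mass", "Seagrass.Mass", "Terrestrial.Mass", "Other.Mass")

# For each site, run a paired t-test of A against U on every metric.
# The inner sapply returns one named p-value per metric; the outer one
# binds the sites as columns, so t() gives a sites-by-metrics matrix.
pvals <- t(sapply(split(WrackMass, WrackMass$Site), function(d) {
  sapply(d[metrics], function(y) {
    t.test(y[d$Treatment == "A"], y[d$Treatment == "U"], paired = TRUE)$p.value
  })
}))
pvals
```

Selecting the metrics by name rather than by position (8:11) keeps the call working if columns are added or reordered.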


----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc: 
Sent: Thursday, May 2, 2013 1:58 PM
Subject: Re: Multiple Paired T test from large Data Set with multiple pairs

My code was based on the assumption that your dataset was similar to the one I provided.  Please provide an example dataset (use dput(head(dataset, 20))).

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
A.K.
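
As a sketch of what such an example could look like (the small `dataset` below is invented, not the poster's data), `dput()` prints R code that rebuilds the object, so the output can be pasted into a message and evaluated by anyone on the list:

```r
# Invented stand-in for the real data.
dataset <- data.frame(
  Site = c("ALA1", "ALA1"),
  Treatment = c("A", "U"),
  Algae.Mass = c(42.5, 47.1),
  stringsAsFactors = FALSE
)

# Print a text representation of the first 20 rows
# (fewer if the data set is shorter).
dput(head(dataset, 20))

# Evaluating that text recreates the data frame.
txt <- capture.output(dput(head(dataset, 20)))
copy <- eval(parse(text = paste(txt, collapse = "\n")))
isTRUE(all.equal(copy, dataset))
```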


>Arun,
>
>I have tried applying your suggestions to my data set but I cannot get it
>to work.  I think my lack of R skills may be a contributing factor.  I
>will keep trying though.



----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc: 
Sent: Wednesday, May 1, 2013 5:43 PM
Subject: Re: Multiple Paired T test from large Data Set with multiple pairs

Hi,
Assuming that your dataset is similar to the one below:
set.seed(25)
dat1 <- data.frame(
  Algae.Mass=sample(40:50,10,replace=TRUE),
  Seagrass.Mass=sample(30:70,10,replace=TRUE),
  Terrestrial.Mass=sample(80:100,10,replace=TRUE),
  Other.Mass=sample(40:60,10,replace=TRUE),
  Site.X.Treatment=rep(c("ALA1A","ALA1U"),each=5),
  stringsAsFactors=FALSE)
library(reshape2)
dat2 <- melt(dat1,id.vars="Site.X.Treatment")

# column 3 of the melted data is "value"
sapply(split(dat2,dat2$variable),function(x)
  t.test(x[x$Site.X.Treatment=="ALA1A",3],x[x$Site.X.Treatment=="ALA1U",3],paired=TRUE)$p.value)
  #    Algae.Mass    Seagrass.Mass Terrestrial.Mass       Other.Mass 
  #     1.0000000        0.4624989        0.4388211        0.7521036 
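#or
library(plyr)
# two-sample interface; newer R releases no longer accept paired=TRUE together with a formula
ddply(dat2,.(variable),function(x) summarize(x,Pvalue=t.test(value[Site.X.Treatment=="ALA1A"],value[Site.X.Treatment=="ALA1U"],paired=TRUE)$p.value))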
#          variable    Pvalue
#1       Algae.Mass 1.0000000
#2    Seagrass.Mass 0.4624989
#3 Terrestrial.Mass 0.4388211
#4       Other.Mass 0.7521036


A.K.
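
One point worth keeping in mind with all of the calls above: `paired=TRUE` matches observations by position, so the i-th "A" value is paired with the i-th "U" value. A small sketch with simulated numbers (the vectors `a` and `u` are invented for illustration) shows that the paired test is the one-sample test on the differences, and that reordering one side changes which observations are paired.

```r
set.seed(1)
a <- rnorm(5, mean = 45, sd = 3)       # simulated "A" measurements
u <- a + rnorm(5, mean = 1, sd = 1)    # simulated matching "U" measurements

p_paired <- t.test(a, u, paired = TRUE)$p.value
p_diff   <- t.test(a - u)$p.value      # one-sample test on the differences
isTRUE(all.equal(p_paired, p_diff))    # TRUE

# Reversing one vector pairs different observations, so the
# differences (and hence the test) are no longer the same.
p_shuffled <- t.test(a, rev(u), paired = TRUE)$p.value
p_shuffled
```

If the rows of the real data are not already in matching order within each site, sorting by whatever identifies a pair before testing would be needed.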


>Hey,
>
>I have a fairly large data set with multiple pairs of Sites.  Each site
>has two levels (the pairs), "A" and "U".  For each pair I want to do a
>paired t test of 4 different metrics that exist as columns in my data set.
>
>Here is the long version 
>
>t.test(Algae.Mass[Site.X.Treatment=="ALA1A"],Algae.Mass[Site.X.Treatment=="ALA1U"], paired=T) 
>t.test(Seagrass.Mass[Site.X.Treatment=="ALA1A"],Seagrass.Mass[Site.X.Treatment=="ALA1U"], paired=T) 
>t.test(Terrestrial.Mass[Site.X.Treatment=="ALA1A"],Terrestrial.Mass[Site.X.Treatment=="ALA1U"], paired=T) 
>t.test(Other.Mass[Site.X.Treatment=="ALA1A"],Other.Mass[Site.X.Treatment=="ALA1U"], paired=T) 
>
>How can I do this in one line of code?  I have tried lapply, tapply, etc.
>but keep running into issues.  It would also be great to not have to keep
>defining "Site.X.Treatment".  I do have Site.X.Treatment broken down by
>just Site and Treatment in separate columns in the data set.  Any ideas?



