[R] permutation approach to know effect of one factor (factor A) to another (factor B) in each level of factor A (although it is slow)
Kazunari Nozue
knozue at ucdavis.edu
Fri Feb 24 19:15:01 CET 2012
Hi all,
At first I will explain my linear mixed effects model;
lme1 <- lme(leaf_length ~ Treatment*Genotype*leaf,random=~Treatment|Set,data=data)
or
lmer1 <- lmer(leaf~Treatment*Genotype*leaf+(Treatment|Set),data=data)
Treatment factor has two conditions: mock (=Mo) and treatment(=Tr)
Leaf is position of leaf: 3rd leaf, 4th leaf, ....
Genotype is type of plant: wt, mut1, mut2, ....., mut72
Set is experimental replicates: A to J.
I would like to ask how to compare leaf length under condition A and one under condition B in each Genotype, i.e. know effect of one factor (factor A, “Genotype”) to another (factor B, “Treatment’) in each level of factor A (eg. “wt”, “mut1”, “mut2”, ...).
Since my data is unbalanced data, I could not use TukeyHSD() multiple comparison (see its help).
My understanding is that pvalue in summary(lme1)$tTable is controversial and pvals.fnc(lmer1) # in languageR package
gave me an error (due to (Treatment|Set) because (1|Set) did not gave me an error, but (Treatment\SET) is essential in this model.).
As seen below, I would like to know my permutation method is OK or not.
I started from a simple permutation approach from Maindonald and Braun (2003) "Data Analysis and Graphics Using R." pg 98. (see scripts below) and I combined my mixed model and this approach with simulated data (omitting leaf factor to simplify, see scripts below). Disadvantage of this method is running time could be long in large data sets with many permutation (eg. 10000), I would like to know my method is OK. If I could have better methods (probably by using multcomp package, such as glht() function), I would really appreciate them.
Sorry for long message.
Thank you,
Kazu
##############################
### Maindonald nad Braun pg. 98
##############################
library(DAAG)
data(two65) # from DAAG package
x1 <- two65$ambient;x2<-two65$heated;x<-c(x1,x2)
n1<-length(x1);n2<-length(x2);n<-n1+n2
dbar<-mean(x2) - mean(x1)
z<-array(,2000)
for(i in 1:2000) {
mn<-sample(n,n2,replace=FALSE)
dbardash<-mean(x[mn])-mean(x[-mn])
z[i]<-dbardash
}
pval<-(sum(z > abs(dbar)) + sum(z< -abs(dbar)))/2000
plot(density(z),yaxs="i")
abline(v=dbar)
abline(v=-dbar,lty=2)
###############################
### my example with unbalanced data
###############################
library(lme4)
#simulate data. leaf length in Tr is longer than Mo. wt shows more dramatic response to Tr than mut. setA plants were longer than setB plants (set factor is significant in this model).
set.seed(1234)
data <- data.frame(Treatment=rep(c("Tr","Mo"),c(9,11)),leaf=c(rnorm(9,11),rnorm(11,10)),Set=rep(c("A","B"),times=10))
data$Genotype <- factor(rep(c("mut","wt"),each=5,length.out=20))
#add setA specific effect for shade and then for sun
data$leaf[data$Treatment=="Tr" & data$Set=="A"] <- data$leaf[data$Treatment=="Tr" & data$Set=="A"] + rnorm(length(data$leaf[data$Treatment=="Tr" & data$Set=="A"]),1)
data$leaf[data$Treatment=="Tr" & data$Genotype=="mut"] <- data$leaf[data$Treatment=="Tr" & data$Genotype=="mut"] + rnorm(length(data$leaf[data$Treatment=="Tr" & data$Genotype=="mut"]),-0.5)
data$leaf[data$Treatment=="Mo" & data$Set=="A"] <- data$leaf[data$Treatment=="Mo" & data$Set=="A"] + rnorm(length(data$leaf[data$Treatment=="Mo" & data$Set=="A"]),-0.25)
data
# mean from observed data
mean.table.obs<-tapply(data$leaf, list(data$Treatment,data$Genotype),mean)
# permutation
mean.table.PER<-list()
nreps<-1000
z<-list() # mean differences
for(i in 1:nreps) {
new.data<-data.frame()
new.wt<-sample(data[data$Genotype=="wt",]$leaf,sum(data$Genotype=="wt",na.rm=TRUE),replace=FALSE) # null hypothesis: leaf in Mo = Tr in wt
new.mut<-sample(data[data$Genotype=="mut",]$leaf,sum(data$Genotype=="mut",na.rm=TRUE),replace=FALSE)
new.data<-data.frame(leaf=c(new.wt,new.mut),Genotype=data$Genotype,Treatment=data$Treatment,Set=data$Set)
# mixed effect model
lmer.temp<- lmer(leaf~Treatment*Genotype+(Treatment|Set),data=new.data)
#calculate mean of each group
mean.table.PER[[i]]<-tapply(fitted(lmer.temp), list(new.data$Treatment,new.data$Genotype),mean)
z[[i]]<-mean.table.PER[[i]][2,] - mean.table.PER[[i]][1,]
}
# calculate p value
TF1<-list()
TF2<-list()
p.value<-vector()
for(i in 1:length(levels(data$Genotype))) {
for(n in 1:length(z)) {
TF1[[n]]<- (z[[n]][i] > abs(mean.table.obs[2,i] - mean.table.obs[1,i]))*1
TF2[[n]]<- (z[[n]][i] < -abs(mean.table.obs[2,i] - mean.table.obs[1,i]))*1
}
p.value[i]<-(sum(as.numeric(TF1) + as.numeric(TF2)))/length(z)
}
names(p.value)<-levels(data$Genotype)
p.value
p.adjust<-p.adjust(p=p.value,method=”fdr”)
p.adjust
# graph
par(mfcol=c(2,1))
hist(unlist(z)[names(unlist(z))=="mut"],breaks=seq(-5,5,0.2))
abline(v=abs(mean.table.obs[2,1] - mean.table.obs[1,1]))
abline(v=-abs(mean.table.obs[2,1] - mean.table.obs[1,1]))
hist(unlist(z)[names(unlist(z))=="wt"],breaks=seq(-5,5,0.2))
abline(v=abs(mean.table.obs[2,2] - mean.table.obs[1,2]))
abline(v=-abs(mean.table.obs[2,2] - mean.table.obs[1,2]))
More information about the R-help
mailing list