[R] Using the 'by' function within a 'for' loop
Judith Flores
juryef at yahoo.com
Tue Apr 22 19:30:57 CEST 2008
Dear R experts,
I am sorry for sending this email again. I imagine
that yesterday, and maybe today, have been very busy
days with the release of R v2.7.0. I join all the R
users who are very grateful for your constant work and
efforts, especially knowing that you are doing this for
the sake of science, without getting any compensation
for it.
Having written that, I decided to send the email
below again, in case it was forgotten; or maybe I am
missing something very basic?
I am using R version 2.7.0 on Windows XP.
Start of yesterday's email:
I am trying to optimize my script, because right
now it requires a lot of memory. The goal is to
generate four plots in one page. Every plot
corresponds to the means and sem's calculated for a
given variable at different days. In order to obtain
the means and sem's I apply the 'by' function. The way
I have done it so far is like this:
1. Read the data.
2. Generate a summary of the mean and sem of a variable
   at every Day.
3. Plot the mean and sem of that variable.
4. Repeat the same process for the other 3 variables.
I tried to optimize the code by using a for loop;
the code is below.
#Reading the data
dato<-read.csv('mydata.csv')
names(dato)<-c("id","day","tx","var1","var2","var3","var4")
dato<-dato[,1:7]
#Specify variables to be plotted
variable<-c('var1','var2','var3','var4')
#Define parameters of the plotting window: margins,
#number of plots in the panel
windows(height=9, width=9, rescale='fixed')
par(mfrow=c(2,2),xpd=T, bty='l',
    omi=c(0.8,0.25,1.2,0.15), mai=c(1.1,0.8,0.3,0.3))
for (k in variable) {
  dat<-dato[!is.na(k),]
  summ<-by(dat,dat[,c("tx","day")], function(x) {
    mn<-mean(x$k)
    std<-sd(x$k)
    n<-length(x$k)
    se<-std/sqrt(n)
    lowb<-mn-se
    upb<-mn+se
    data.frame(tx=x$tx[1],day=x$day[1],mn=mn,std=std,lowb=lowb,upb=upb,se=se)
  })
  summ<-do.call("rbind",summ)
  #Defining the axis ranges
  xmax<-unique(max(summ$day,na.rm=TRUE))
  xmin<-unique(min(summ$day,na.rm=TRUE))
  yaxmin<-unique(min(summ$lowb))
  yaxmax<-unique(max(summ$upb))
  plot(1,1,type='n',xlab='Day',xlim=c(xmin,xmax),ylim=c(yaxmin,yaxmax),
       ylab=k,
       las=1,cex.lab=1,xaxp=c(xmin,xmax,diff(range(c(xmin,xmax)))))
  points(summ$day,summ$mn)
}
Where variable is a vector that specifies all the
variables I want to plot.
But I am getting the following error:
“Error in var(as.vector(x), na.rm = na.rm) : 'x' is
empty
In addition: Warning message:
In mean.default(x$k) : argument is not numeric or
logical: returning NA”
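A possible explanation for the error above, offered as a sketch rather than a confirmed diagnosis: inside the loop, k holds a column name as a string, and x$k looks for a column literally named "k" instead of using the value of k, so it returns NULL. That would be consistent with the warning from mean.default. Similarly, is.na(k) tests the string itself rather than the column, so no rows are dropped. The toy data frame below is hypothetical (made-up values, and only the tx, day and var1 columns), and indexing with [[k]] is the substitution being illustrated.

```r
# Hypothetical toy data: column names mirror the post, values are made up.
dato <- data.frame(
  tx   = rep(c("a", "b"), each = 6),
  day  = rep(c(1, 1, 1, 2, 2, 2), times = 2),
  var1 = c(1, 2, 3, 4, 5, 6, 2, 4, 6, 8, 10, NA)
)

k <- "var1"

# `$` does not evaluate k: it looks for a column literally named "k".
is.null(dato$k)        # TRUE
# `[[` evaluates k first and returns the column named "var1".
is.numeric(dato[[k]])  # TRUE

# is.na(k) tests the string "var1", not the column, so every row is kept.
nrow(dato[!is.na(k), ])          # 12
# Testing the column itself drops the row with the missing value.
nrow(dato[!is.na(dato[[k]]), ])  # 11

# The summary step from the post, with x$k replaced by x[[k]].
dat <- dato[!is.na(dato[[k]]), ]
summ <- by(dat, dat[, c("tx", "day")], function(x) {
  mn <- mean(x[[k]])
  se <- sd(x[[k]]) / sqrt(length(x[[k]]))
  data.frame(tx = x$tx[1], day = x$day[1], mn = mn, se = se,
             lowb = mn - se, upb = mn + se)
})
summ <- do.call("rbind", summ)
```

With this change the same loop body could be reused for each name in variable, since k is only ever used as a string index.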
Could someone please show me how to structure my
code to achieve my final goal, which is to simplify
it?
I am attaching a csv file in case you want to run my
code.
Thank you very much in advance for your time and help,
Judith