[R] Using the 'by' function within a 'for' loop

Tue Apr 22 21:26:21 CEST 2008

After talking about it, I forgot to put the drop=TRUE in the 'split' call:

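# Group row indices by tx and day; drop=TRUE discards empty combinations
x.index <- split(seq(nrow(dat)), dat[,c("tx","day")], drop=TRUE)
results <- lapply(x.index, function(.indx){
   # k holds a column name, so index with [[k]] rather than $k
   mn <- mean(dat[[k]][.indx])
   ......
   data.frame(....)
})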
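
For anyone who wants to run the idea end to end, here is a minimal sketch
of the split/lapply approach on made-up data. The data frame contents, the
helper name summarise_var, and the exact set of summary columns are
assumptions for illustration only; the real CSV and the elided parts of
the snippet above are not known. The sketch also indexes the column with
[[k]], since k holds a column name as a character string and $k would look
for a column literally called "k".

```r
# Hypothetical stand-in for the poster's data; the real CSV is not shown.
set.seed(1)
dato <- data.frame(
  id   = 1:24,
  day  = rep(c(1, 3, 7), each = 8),
  tx   = rep(c("A", "B"), times = 12),
  var1 = rnorm(24),
  var2 = rnorm(24)
)
dato$var1[2] <- NA  # one missing value to exercise the NA filter

# summarise_var() is an illustrative helper name, not part of the thread.
summarise_var <- function(data, k) {
  # k is a character string, so use [[k]]; is.na(k) would test the string
  # itself, and data$k would look for a column named "k".
  dat <- data[!is.na(data[[k]]), ]

  # Row indices for each observed tx/day combination.
  x.index <- split(seq_len(nrow(dat)), dat[, c("tx", "day")], drop = TRUE)

  results <- lapply(x.index, function(.indx) {
    vals <- dat[[k]][.indx]
    mn <- mean(vals)
    se <- sd(vals) / sqrt(length(vals))
    data.frame(tx = dat$tx[.indx[1]], day = dat$day[.indx[1]],
               mn = mn, se = se, lowb = mn - se, upb = mn + se)
  })
  do.call(rbind, results)
}

summ <- summarise_var(dato, "var1")
```

Calling summarise_var(dato, k) inside the original for loop over the
variable names would then give one summary data frame per plot.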

On Tue, Apr 22, 2008 at 1:30 PM, Judith Flores <juryef at yahoo.com> wrote:
> Dear R experts,
>
>
>    I am sorry for sending this email again. I imagine
> yesterday and maybe today have been very busy days
> with the release of R v 2.7.0. I join all the R
> users who are very grateful for your constant work and
> efforts, especially knowing that you are doing this for
> the sake of science, without getting any compensation
> for it.
>    Having written that, I decided to send the email
> below again, in case it was forgotten; or maybe I am
> missing something very basic?
>
>   I am using R version 2.7.0 on Windows XP.
>
> Start of yesterday's email:
>
>    I am trying to optimize my script, because right
> now it requires a lot of memory. The goal is to
> generate four plots in one page. Every plot
> corresponds to the means and sem's calculated for a
> given variable at different days. In order to obtain
> the means and sem's I apply the 'by' function. The way
> I have done it so far is like this:
>
> Read the data
> Generate a summary of the mean and sem of a variable
> at every Day.
> Plot the mean and sem of that variable.
>
> Repeat the same process for the other 3 variables.
>
>  I tried to optimize the code by using a for loop;
> the code is below.
>
>
>
> #Reading the data
> dato<-read.csv('mydata.csv')
> names(dato)<-c("id","day","tx","var1","var2","var3","var4")
> dato<-dato[,1:7]
>
> #Specify variables to be plotted
> variable<-c('var1','var2','var3','var4')
>
> #Define parameters of the window and panel: margins,
> #number of plots in the panel
> windows(height=9, width=9, rescale='fixed')
> par(mfrow=c(2,2),xpd=T, bty='l',
>     omi=c(0.8,0.25,1.2,0.15), mai=c(1.1,0.8,0.3,0.3))
>
>
> for (k in variable) {
>
>    dat<-dato[!is.na(k),]
>
>
>
>    summ<-by(dat,dat[,c("tx","day")], function(x) {
>        mn<-mean(x$k)
>        std<-sd(x$k)
>        n<-length(x$k)
>        se<-std/sqrt(n)
>        lowb<-mn-se
>        upb<-mn+se
>
>        data.frame(tx=x$tx[1],day=x$day[1],mn=mn,std=std,lowb=lowb,upb=upb,se=se)
>        })
>    summ<-do.call("rbind",summ)
>
>
>
>
>    #Defining x axis range
>    xmax<-unique(max(summ$day,na.rm=TRUE))
>    xmin<-unique(min(summ$day,na.rm=TRUE))
>
>    yaxmin<-unique(min(summ$lowb))
>    yaxmax<-unique(max(summ$upb))
>
>
>    plot(1,1,type='n',xlab='Day',xlim=c(xmin,xmax),ylim=c(yaxmin,yaxmax),
>         ylab=k,las=1,cex.lab=1,
>         xaxp=c(xmin,xmax,diff(range(c(xmin,xmax)))))
>    points(summ$day,summ$mn)
>
> }
>
>
>
>
>    Where variable is a vector that specifies all the
> variables I want to plot.
>
> But I am getting the following error:
>
> "Error in var(as.vector(x), na.rm = na.rm) : 'x' is empty
> In addition: Warning message:
> In mean.default(x$k) : argument is not numeric or logical: returning NA"
>
>   Could someone please show me how to structure my
> code to achieve my final goal, which is to simplify
> it?
>
> I am attaching a csv file in case you want to run my
> code.
>
> Thank you very much in advance for your time and help,
>
> Judith
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?