[R] Using the 'by' function within a 'for' loop

hadley wickham h.wickham at gmail.com
Tue Apr 22 21:03:17 CEST 2008


Hi Judith,

Could you provide a copy of your data as well?  (Either as a csv file,
or by copying and pasting the output of dput(my.data.frame) or by
generating a data.frame of random numbers with the same structure as
your data).  That will help people to see what your code does and
suggest improvements.
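
The last option, a random data.frame with the same structure, can be built with base R and then passed to dput(). This is a minimal sketch. The column names (id, day, tx, var1 to var4) come from the script quoted below. The object name fake, the numbers of subjects, days and treatments, the values, and the injected NAs are hypothetical choices, not the real data.

```r
# Hypothetical stand-in for mydata.csv: 4 subjects x 5 days x 3 treatments.
# Column names follow the quoted script; sizes and values are invented.
set.seed(1)
fake <- expand.grid(id = 1:4, day = 1:5, tx = c("A", "B", "C"))
for (v in c("var1", "var2", "var3", "var4")) {
  fake[[v]] <- rnorm(nrow(fake), mean = 10, sd = 2)
}
# Add a few missing values so any NA handling in the script gets exercised
fake$var2[sample(nrow(fake), 5)] <- NA

str(fake)   # check that the structure matches the real data
dput(fake)  # paste this output into the email so others can rebuild 'fake'
```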

Hadley



On Tue, Apr 22, 2008 at 12:30 PM, Judith Flores <juryef at yahoo.com> wrote:
> Dear R experts,
>
>
>     I am sorry for sending this email again. I would
>  imagine that yesterday, and maybe today, have been very busy
>  days with the release of R v 2.7.0. I join all the R
>  users who are very grateful for your constant work and
>  efforts, especially knowing that you are doing this for
>  the sake of science, without getting any compensation
>  for it.
>     Having written that, I decided to send the email
>  below again, in case it was forgotten; or maybe I am
>  missing something very basic?
>
>    I am using R version 2.7.0 on Windows XP.
>
>  Start of yesterday's email:
>
>     I am trying to optimize my script, because right
>  now it requires a lot of memory. The goal is to
>  generate four plots on one page. Each plot
>  shows the means and SEMs calculated for a
>  given variable on different days. To obtain
>  the means and SEMs I apply the 'by' function. The way
>  I have done it so far is like this:
>
>  Read the data
>  Generate a summary of the mean and SEM of a variable
>  on every day.
>  Plot the mean and SEM of that variable.
>
>  Repeat the same process for the other 3 variables.
>
>   I tried to optimize the code by using a for loop;
>  the code is below.
>
>
>
>  #Reading the data
>  dato<-read.csv('mydata.csv')
>  names(dato)<-c("id","day","tx","var1","var2","var3","var4")
>  dato<-dato[,1:7]
>
>  # Specify the variables to be plotted
>  variable<-c('var1','var2','var3','var4')
>
>  # Define the window parameters for the panel: margins and
>  # number of plots in the panel
>  windows(height=9, width=9, rescale='fixed')
>  par(mfrow=c(2,2),xpd=T, bty='l',
>  omi=c(0.8,0.25,1.2,0.15), mai=c(1.1,0.8,0.3,0.3))
>
>
>  for (k in variable) {
>
>     dat<-dato[!is.na(k),]
>
>
>
>     summ<-by(dat,dat[,c("tx","day")], function(x) {
>         mn<-mean(x$k)
>         std<-sd(x$k)
>         n<-length(x$k)
>         se<-std/sqrt(n)
>         lowb<-mn-se
>         upb<-mn+se
>
>         data.frame(tx=x$tx[1], day=x$day[1], mn=mn, std=std,
>                    lowb=lowb, upb=upb, se=se)
>         })
>     summ<-do.call("rbind",summ)
>
>
>
>
>     # Define the x- and y-axis ranges
>     xmax<-unique(max(summ$day,na.rm=TRUE))
>     xmin<-unique(min(summ$day,na.rm=TRUE))
>
>     yaxmin<-unique(min(summ$lowb))
>     yaxmax<-unique(max(summ$upb))
>
>
>     plot(1, 1, type='n', xlab='Day', xlim=c(xmin,xmax),
>          ylim=c(yaxmin,yaxmax), ylab=k, las=1, cex.lab=1,
>          xaxp=c(xmin, xmax, diff(range(c(xmin,xmax)))))
>         points(summ$day,summ$mn)
>
>  }
>
>
>
>
>     Here, variable is the vector that specifies all the
>  variables I want to plot.
>
>  But I am getting the following error:
>
>  "Error in var(as.vector(x), na.rm = na.rm) : 'x' is
>  empty
>  In addition: Warning message:
>  In mean.default(x$k) : argument is not numeric or
>  logical: returning NA"
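
Both messages trace back to x$k. The $ operator does not evaluate k; it looks for a column literally named "k", finds none, and returns NULL. mean() then warns and returns NA, and sd() fails on the NULL input (R 2.7.0 reported this as "'x' is empty"). Similarly, is.na(k) tests the string "var1" rather than the column, so dato[!is.na(k),] keeps every row. Indexing with [[ ]] uses the value stored in k. The toy data frame below is purely illustrative:

```r
# Toy data frame with one numeric column named "var1" (illustrative only)
x <- data.frame(var1 = c(1.5, 2.5, NA, 4.0))
k <- "var1"

is.null(x$k)          # TRUE: `$` looks for a column literally named "k"
x[[k]]                # 1.5 2.5 NA 4.0: `[[` uses the value stored in k

is.na(k)              # FALSE: this tests the string "var1", not the column
!is.na(x[[k]])        # TRUE TRUE FALSE TRUE: the filter the loop intended
x[!is.na(x[[k]]), , drop = FALSE]   # keeps the three non-missing rows
```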
>
>    Could someone please show me how to structure my
>  code to achieve my final goal, which is to simplify
>  it?
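
One way to restructure the script is to move the per-variable summary into a function that takes the column name as a string, and then loop over the variable names only for plotting. This is a sketch, not a definitive answer. The function names summarise_var and plot_all are hypothetical. The 2x2 layout and margins are copied from the script above. The vertical bars from lowb to upb, drawn with arrows(), are an addition the original script did not make. The xaxp tick setting is dropped for brevity.

```r
# Summarise one numeric column of 'dato' by treatment and day.
# 'k' is the column name as a string, e.g. "var1"; it is used with [[ ]]
# because x$k would look for a column literally called "k".
summarise_var <- function(dato, k) {
  dat <- dato[!is.na(dato[[k]]), ]   # drop rows where this variable is NA
  summ <- by(dat, dat[, c("tx", "day")], function(x) {
    mn  <- mean(x[[k]])
    std <- sd(x[[k]])
    se  <- std / sqrt(length(x[[k]]))
    data.frame(tx = x$tx[1], day = x$day[1], mn = mn, std = std,
               lowb = mn - se, upb = mn + se, se = se)
  })
  do.call("rbind", summ)             # empty tx/day cells are NULL and dropped
}

# Draw one panel per variable in a 2 x 2 layout (margins from the original).
plot_all <- function(dato, variables) {
  op <- par(mfrow = c(2, 2), xpd = TRUE, bty = "l",
            omi = c(0.8, 0.25, 1.2, 0.15), mai = c(1.1, 0.8, 0.3, 0.3))
  on.exit(par(op))                   # restore the previous settings afterwards
  for (k in variables) {
    summ <- summarise_var(dato, k)
    plot(summ$day, summ$mn, xlab = "Day", ylab = k, las = 1,
         xlim = range(summ$day),
         ylim = range(c(summ$lowb, summ$upb), na.rm = TRUE))
    # Added here (not in the original): mean +/- SE drawn as vertical bars
    arrows(summ$day, summ$lowb, summ$day, summ$upb,
           angle = 90, code = 3, length = 0.03)
  }
  invisible(NULL)
}

# Usage with the real file (commented out because mydata.csv is not included):
# dato <- read.csv("mydata.csv")[, 1:7]
# names(dato) <- c("id", "day", "tx", "var1", "var2", "var3", "var4")
# windows(height = 9, width = 9, rescale = "fixed")
# plot_all(dato, c("var1", "var2", "var3", "var4"))
```

Keeping the summary in its own function also makes it easy to inspect one table at a time, e.g. summarise_var(dato, "var2"), before any plotting.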
>
>  I am attaching a csv file in case you want to run my
>  code.
>
>  Thank you very much in advance for your time and help,
>
>  Judith
>
>
>
>
>
>
> ______________________________________________
>  R-help at r-project.org mailing list
>  https://stat.ethz.ch/mailman/listinfo/r-help
>  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>  and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
http://had.co.nz/


