[R] rbind and data.frame

Göran Broström gb at stat.umu.se
Fri Dec 7 17:30:35 CET 2001


On Fri, 7 Dec 2001, Liaw, Andy wrote:

> Are you sure that the time difference is *only* in creating the data frame,
> rather than other computations in the loop?

Of course it depends on all the calculations, and there are a lot of them
(a lot of code). Here it is. Suggestions for improvement are most welcome!

Göran
-----------------------------------------------------------------------
[...]
  ## We now have 'nn.out'. We next create an empty data frame 'dat.out':
  xx <- cbind(dat[1, , drop = FALSE], com.dat[1, , drop = FALSE])
  dat.out <- matrix(NA, ncol = ncol(xx), nrow = nn.out)
  dat.out <- data.frame(dat.out)
  names(dat.out) <- names(xx)
  dat.out <- rbind(xx, dat.out)[-1, ]

  ## And so we fill it!

  cur.row <- 0
  for (j in seq_len(nn)){
    start.ind <- dat$bdate[j] + dat$enter[j]
    stopp.ind <- dat$bdate[j] + dat$exit[j]
    
    if ((start.ind < end.per) &&
        (stopp.ind > beg.per)){   ## We have a case!
      fixed.rec <- dat[j, , drop = FALSE]
      out.rec <- fixed.rec
      if (start.ind < beg.per){      ## start.ind < beg.per  (A)
        
        if (stopp.ind > end.per){    ## stopp.ind > end.per  (A1)
          ##nn.out <- nn.out + n.years
          out.rec$event <- 0
          for (iv in 1:n.years){
            cur.row <- cur.row + 1
            out.rec$enter <- cuts[iv] - fixed.rec$bdate
            out.rec$exit <- cuts[iv + 1] - fixed.rec$bdate
            dat.out[cur.row, ] <-
              cbind(out.rec, com.dat[iv, , drop = FALSE])
          }
        }else{                       ## stopp.ind <= end.per (A2)
          last.iv <- 1
          while ((last.iv <= n.years) &&
                 (stopp.ind > cuts[last.iv + 1])){
            last.iv <- last.iv + 1
          }
          ##nn.out <- nn.out + last.iv
          if (last.iv == 1){
            cur.row <- cur.row + 1
            out.rec$enter <- beg.per - fixed.rec$bdate
            out.rec$exit <- fixed.rec$exit
            out.rec$event <- fixed.rec$event
            dat.out[cur.row, ] <-
              cbind(out.rec, com.dat[1, , drop = FALSE])
          }else{
            out.rec$event <- 0
            for (iv in 1:(last.iv - 1)){
              cur.row <- cur.row + 1
              out.rec$enter <- cuts[iv] - fixed.rec$bdate
              out.rec$exit <- cuts[iv + 1] - fixed.rec$bdate
              dat.out[cur.row, ] <-
                cbind(out.rec, com.dat[iv, , drop = FALSE])
            }
            cur.row <- cur.row + 1            
            out.rec$event <- fixed.rec$event
            out.rec$enter <- cuts[last.iv] - fixed.rec$bdate
            out.rec$exit <- fixed.rec$exit
            dat.out[cur.row, ] <-
              cbind(out.rec, com.dat[last.iv, , drop = FALSE])
          }
        }
      }else{                       ## start.ind >= beg.per  (B)
        first.iv <- 1
        while ((first.iv <= n.years) &&
               (start.ind >= cuts[first.iv + 1])){
          first.iv <- first.iv + 1
        }
        if (stopp.ind > end.per){  ## stopp.ind > end.per   (B1)
          ##nn.out <- nn.out + n.years - first.iv + 1
          cur.row <- cur.row + 1            
          out.rec$event <- 0
          out.rec$enter <- fixed.rec$enter
          out.rec$exit <- cuts[first.iv + 1] - fixed.rec$bdate
          dat.out[cur.row, ] <-
            cbind(out.rec, com.dat[first.iv, , drop = FALSE])
          if (first.iv < n.years){
            for (iv in (first.iv + 1):n.years){
              cur.row <- cur.row + 1
              out.rec$enter <- cuts[iv] - fixed.rec$bdate
              out.rec$exit <- cuts[iv + 1] - fixed.rec$bdate
              dat.out[cur.row, ] <-
                cbind(out.rec, com.dat[iv, , drop = FALSE])
            }
          }
        }else{                     ## stopp.ind <= end.per  (B2)
          last.iv <- first.iv
          while ((last.iv <= n.years) &&
                 (stopp.ind > cuts[last.iv + 1])){
            last.iv <- last.iv + 1
          }
          ##nn.out <- nn.out + last.iv - first.iv + 1
          if (last.iv == first.iv){
            cur.row <- cur.row + 1
            dat.out[cur.row, ] <-
              cbind(out.rec, com.dat[first.iv, , drop = FALSE])
          }else{
            cur.row <- cur.row + 1
            out.rec$event <- 0
            out.rec$exit <- cuts[first.iv + 1] - fixed.rec$bdate
            dat.out[cur.row, ] <-
              cbind(out.rec, com.dat[first.iv, , drop = FALSE])
            if (last.iv > (first.iv + 1)){
              for (iv in (first.iv + 1):(last.iv - 1)){
                cur.row <- cur.row + 1
                out.rec$enter <- cuts[iv] - fixed.rec$bdate
                out.rec$exit <- cuts[iv + 1] - fixed.rec$bdate
                dat.out[cur.row, ] <-
                  cbind(out.rec, com.dat[iv, , drop = FALSE])
              }
            }
            cur.row <- cur.row + 1
            out.rec$event <- fixed.rec$event
            out.rec$enter <- cuts[last.iv] - fixed.rec$bdate
            out.rec$exit <- fixed.rec$exit
            dat.out[cur.row, ] <-
              cbind(out.rec, com.dat[last.iv, , drop = FALSE])
          }
        }
      }
    }
    cat("j = ", j, "cur.row = ", cur.row, "\n")
  }
  
  dat.out
}
-------------------------------------------------------------------------
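
One possible way to avoid the row-by-row loop altogether is sketched below.
It is only a sketch under stated assumptions, not a drop-in replacement for
the function above: the name 'calendarSplit' is hypothetical, 'dat' is
assumed to hold numeric columns bdate, enter, exit and event with
enter < exit, 'cuts' is assumed sorted with cuts[1] equal to beg.per and its
last element equal to end.per, and 'com.dat' is assumed to have one row per
interval. The idea is to find the first and last interval for every record
with findInterval(), repeat each record once per interval it touches, and
then compute enter, exit and event for all output rows at once.

```r
## Hypothetical helper (not from the original post): split follow-up time
## at the calendar cut points using vectorized operations only.
calendarSplit <- function(dat, com.dat, cuts) {
  n.years <- length(cuts) - 1
  start <- dat$bdate + dat$enter            # calendar time of entry
  stopp <- dat$bdate + dat$exit             # calendar time of exit

  ## Records that overlap the period (cuts[1], cuts[n.years + 1]).
  keep <- which(start < cuts[n.years + 1] & stopp > cuts[1])

  ## First and last interval touched by each kept record.
  first.iv <- pmax(findInterval(start[keep], cuts), 1)
  last.iv <- pmin(findInterval(stopp[keep], cuts, left.open = TRUE),
                  n.years)

  n.pieces <- last.iv - first.iv + 1
  who <- rep(keep, n.pieces)                # source row of each output row
  iv <- rep(first.iv, n.pieces) + sequence(n.pieces) - 1

  lo <- cuts[iv]                            # left end of each interval
  hi <- cuts[iv + 1]                        # right end of each interval

  out <- dat[who, , drop = FALSE]
  ## Keep the original enter/exit where the record starts/ends inside
  ## the interval; otherwise truncate at the interval boundary.
  out$enter <- ifelse(start[who] > lo, dat$enter[who], lo - dat$bdate[who])
  out$exit <- ifelse(stopp[who] <= hi, dat$exit[who], hi - dat$bdate[who])
  ## The event indicator survives only in the piece where follow-up ends.
  out$event <- ifelse(stopp[who] <= hi, dat$event[who], 0)

  out <- cbind(out, com.dat[iv, , drop = FALSE])
  rownames(out) <- NULL
  out
}

## Small made-up example: two intervals, 1960-1970 and 1970-1980.
dat <- data.frame(bdate = c(1900, 1950, 1960),
                  enter = c(0, 5, 5),
                  exit = c(80, 8, 30),
                  event = c(1, 1, 1))
com.dat <- data.frame(year = c(1960, 1970))
res <- calendarSplit(dat, com.dat, cuts = c(1960, 1970, 1980))
```

In this made-up example the second record ends before 1960 and is dropped,
while the other two records each contribute one row per interval. The
'left.open' argument of findInterval() requires a reasonably recent R.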

> 
> Andy
> 
> > -----Original Message-----
> > From: Göran Broström [mailto:gb at stat.umu.se]
> > Sent: Friday, December 07, 2001 7:25 AM
> > To: Prof Brian Ripley
> > Cc: r-help at stat.math.ethz.ch
> > Subject: Re: [R] rbind and data.frame
> > 
> > 
> > On Fri, 7 Dec 2001, Prof Brian Ripley wrote:
> > 
> > > On Fri, 7 Dec 2001, Göran Broström wrote:
> > > 
> > > > On Wed, 5 Dec 2001, Göran Broström wrote:
> > > >
> > > > [...]
> > > >
> > > > > My real problem is how to create a data frame in a sequentially growing
> > > > > manner, when I know the final size (number of cases). I want to avoid
> > > > > calling 'rbind' many times, and instead create an 'empty' data frame in
> > > > > one call, and then fill it. Are there better ways of doing this?
> > > >
> > > > Got no answer to this one, so I provide one myself:
> > > 
> > > The usual answer is to create a data frame of the desired size and
> > > populate it via indexing.  That's in some books I know!
> > 
> > I know that book too (thanks!). I did what you suggest, and that took 7
> > hours to run. Definitely.
> > 
> > Göran
> > 
> > > >
> > > > The answer is: Yes, definitely. I did this, with pure  R  code, and
> > > > created a new data frame with around 58000 records. It took 7 hours to
> > > > run. I then did it with compiled code (Fortran), and that made a slight
> > > > difference:  It took 4.8 seconds(!).
> > > >
> > > > Göran
> > > >
> > > > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > > > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> > > > Send "info", "help", or "[un]subscribe"
> > > > (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> > > > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> > > >
> > > 
> > > 
> > 
> > -- 
> >  Göran Broström                      tel: +46 90 786 5223
> >  professor                           fax: +46 90 786 6614
> >  Department of Statistics            http://www.stat.umu.se/egna/gb/
> >  Umeå University
> >  SE-90187 Umeå, Sweden             e-mail: gb at stat.umu.se
> > 
> 
> 
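
On the quoted question of filling a result of known size: assigning one row
at a time into a data frame goes through the data frame assignment method on
every iteration, which is likely a large part of the running time. A minimal
sketch of an alternative, assuming the columns are simple numeric vectors, is
to preallocate one plain vector per column, fill those by index, and build
the data frame once at the end. All names and values below are made up for
illustration.

```r
## Toy illustration (hypothetical names, not taken from the code above):
## fill preallocated atomic vectors in the loop, then create the data
## frame in a single call.
n.out <- 1000
enter <- numeric(n.out)
exit <- numeric(n.out)
event <- integer(n.out)

cur.row <- 0
for (j in seq_len(n.out)) {
  cur.row <- cur.row + 1
  enter[cur.row] <- j - 1       # plain vector indexing is cheap
  exit[cur.row] <- j
  event[cur.row] <- j %% 2L
}

## One call to data.frame() instead of n.out row assignments.
dat.out <- data.frame(enter = enter, exit = exit, event = event)
```

The loop structure stays the same as with a preallocated data frame; only
the object being indexed changes.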

-- 
 Göran Broström                      tel: +46 90 786 5223
 professor                           fax: +46 90 786 6614
 Department of Statistics            http://www.stat.umu.se/egna/gb/
 Umeå University
 SE-90187 Umeå, Sweden             e-mail: gb at stat.umu.se



