[R] Reading and coalescing many datafiles.

Roger D. Peng rpeng at jhsph.edu
Thu Apr 14 18:20:21 CEST 2005


In my experience, reading all the data files into a list first and then 
calling 'do.call("rbind", ...)' once is much faster than 'rbind'-ing on 
the fly inside a loop, since the loop re-copies the accumulated data on 
every iteration.
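
As a rough sketch of the list-then-bind idiom (not a benchmark): the 
stand-in 'my_read_function' below simply wraps 'read.csv', and the three 
small files it reads are hypothetical ones written to a temporary 
directory so the example is self-contained. Substitute your own reader 
and file names.

```r
# Stand-in for the poster's my_read_function; using read.csv is an assumption.
my_read_function <- function(path) read.csv(path, stringsAsFactors = FALSE)

# Write three small hypothetical data files into a temporary directory.
tmp <- tempfile("datafiles")
dir.create(tmp)
filenames <- file.path(tmp, c("a.csv", "b.csv", "c.csv"))
for (i in seq_along(filenames)) {
  write.csv(data.frame(id = i, value = c(10, 20) * i),
            filenames[i], row.names = FALSE)
}

# Read every file into a list first, then bind the pieces in one call.
pieces <- lapply(filenames, my_read_function)
dat <- do.call("rbind", pieces)

# Reset the row names to the automatic 1..n sequence.
row.names(dat) <- NULL
```

Keeping the pieces in a list also makes it easy to inspect or drop a 
problem file before binding.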

-roger

asr at ufl.edu wrote:
> Greetings.
> 
> 
> I've got some analysis problems I'm trying to solve, the raw data for which
> are accumulated in a bunch of time-and-date-based files.
> 
> /some/path/2005-01-02-00-00-02
> 
> etc.
> 
> 
> The best 'read all these files' method I've seen in the r-help archives comes
> down to 
> 
> dat <- NULL
> for (df in my_list_of_filenames )
>     {
>           dat <- rbind(dat,my_read_function(df))
>     } 
> 
> which, unpleasantly, is O(N^2) w.r.t. the number of files.
> 
> I'm fiddling with other idioms to accomplish the same goal.  Best I've come up
> with so far, after extensive reference to the mailing list archives, is
> 
> 
> my_read_function.many<-function(filenames)
>   {
>     filenames <- filenames[file.exists(filenames)]
>     rv <- do.call("rbind", lapply(filenames,my_read_function))
>     row.names(rv) <- NULL  # reset a data frame's row names to 1..n
>     rv
>   }
> 
> 
> I'd love to have some stupid omission pointed out.
> 
> 
> - Allen S. Rout
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 

-- 
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/



