[Rd] loading multiple CSV files into a single data frame

Gabor Grothendieck ggrothendieck at gmail.com
Thu May 3 20:54:58 CEST 2012


On Thu, May 3, 2012 at 2:07 PM, victor jimenez <betabandido at gmail.com> wrote:
> Sometimes I have hundreds of CSV files scattered in a directory tree,
> resulting from experiments' executions. For instance, giving an example
> from my field, I may want to collect the performance of a processor for
> several design parameters such as "cache size" (possible values: 2, 4, 8
> and 16) and "cache associativity" (possible values: direct-mapped, 4-way,
> fully-associative). The results of all these experiments will be stored in
> a directory tree like:
>
> results
>  |-- direct-mapped
>  |       |-- 2 -- data.csv
>  |       |-- 4 -- data.csv
>  |       |-- 8 -- data.csv
>  |       |-- 16 -- data.csv
>  |-- 4-way
>  |       |-- 2 -- data.csv
>  |       |-- 4 -- data.csv
> ...
>  |-- fully-associative
>  |       |-- 2 -- data.csv
>  |       |-- 4 -- data.csv
> ...
>
> I am developing a package that would allow me to gather all those CSV
> files into a single data frame. Currently, I just need to execute the
> following statement:
>
> dframe <- gather("results/@ASSOC@/@SIZE@/data.csv")
>
> and this command returns a data frame containing the columns ASSOC, SIZE
> and all the remaining columns inside the CSV files (in my case the
> processor performance), effectively loading all the CSV files into a single
> data frame. So, I would get something like:
>
> ASSOC,         SIZE, PERF
> direct-mapped,    2,  1.4
> direct-mapped,    4,  1.6
> direct-mapped,    8,  1.7
> direct-mapped,   16,  1.7
> 4-way,            2,  1.4
> 4-way,            4,  1.5
> ...
>
> I would like to ask whether there is any similar functionality already
> implemented in R. If so, there is no need to reinvent the wheel :)
> If it is not implemented and the R community believes that this feature
> would be useful, I would be glad to contribute my code.
>

If your CSV files all have the same columns and represent time series,
then read.zoo in the zoo package can read them all at once: a single
read.zoo call on a vector of file names produces a single zoo object.

library(zoo)
?read.zoo
vignette("zoo-read")
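For instance, a minimal sketch of reading several files in one call might
look like the following. The file names, the "date" and "perf" columns and
the values are made up for illustration, and the files are written to a
temporary directory only so that the example is self-contained:

```r
library(zoo)

# Write two small example files (hypothetical data) to a temporary directory.
demo_dir <- tempfile("zoo-demo")
dir.create(demo_dir)
write.csv(data.frame(date = c("2012-05-01", "2012-05-02"), perf = c(1.4, 1.6)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(date = c("2012-05-01", "2012-05-02"), perf = c(1.5, 1.7)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Passing a character vector of file names reads each file and merges the
# results on the time index, giving one column per file here.
z <- read.zoo(files, header = TRUE, sep = ",", format = "%Y-%m-%d")
```

Note that the files are merged side by side on their time index, not
stacked row-wise, so this fits time series rather than the ASSOC/SIZE
layout in your example.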

Also see the other zoo vignettes and help files.
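
If the files are not time series, something along these lines in base R
may already be enough. This is only a rough sketch, not your gather()
function: gather_csv is a hypothetical helper name, it assumes the fixed
layout <root>/<ASSOC>/<SIZE>/data.csv from your example, and the PERF
values written below are placeholders used to build a small test tree:

```r
# Hypothetical helper: read every data.csv under `root`, assuming the layout
# root/<ASSOC>/<SIZE>/data.csv, and tag each row with its directory names.
gather_csv <- function(root) {
  # Paths are returned relative to `root`, e.g. "4-way/2/data.csv".
  files <- list.files(root, pattern = "^data\\.csv$", recursive = TRUE)
  pieces <- lapply(files, function(f) {
    parts <- strsplit(f, "/", fixed = TRUE)[[1]]
    d <- read.csv(file.path(root, f))
    data.frame(ASSOC = parts[1], SIZE = as.integer(parts[2]), d,
               stringsAsFactors = FALSE)
  })
  # Stack the per-file data frames; all files must have the same columns.
  do.call(rbind, pieces)
}

# Build a small example tree with placeholder PERF values.
root <- tempfile("results")
for (assoc in c("direct-mapped", "4-way")) {
  for (size in c(2, 4)) {
    d <- file.path(root, assoc, size)
    dir.create(d, recursive = TRUE)
    write.csv(data.frame(PERF = 1.5), file.path(d, "data.csv"),
              row.names = FALSE)
  }
}

dframe <- gather_csv(root)
```

The directory names become the ASSOC and SIZE columns, followed by
whatever columns the CSV files contain.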

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


