[R] data input strategy - lots of csv files

Sean O'Riordain sean.oriordain at gmail.com
Thu May 11 10:03:58 CEST 2006


Good morning,
I currently have 63 .csv files, most of which have lines that look like
  01/06/05,23445
though some files have two numbers beside each date.  There are
missing values, and at present the longest file has 318 rows.

(merge() is losing the head and doing runaway memory allocation - but
that's another question - I'm still trying to pin that issue down and
make a small reproducible example.)

Currently I'm reading in these files with lines like
  a1 <- read.csv("daft_file_name_1.csv",header=F)
  ...
  a63 <- read.csv("another_silly_filename_63.csv",header=F)

and then I'm naming the columns in these like...
  names(a1)[2] <- "silly column name"
  ...
  names(a63)[2] <- "daft column name"
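
(As an aside, I've been wondering whether reading them into a list
would be tidier - a rough, untested sketch, with a made-up directory
name, using the file names as the value-column names:

  files <- list.files("csv_dir", pattern = "\\.csv$", full.names = TRUE)
  dats  <- lapply(files, read.csv, header = FALSE)
  for (i in seq(along = dats))
    names(dats[[i]])[2] <- sub("\\.csv$", "", basename(files[i]))

but I'm not sure that's the idiomatic way.)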

then trying to merge()...
  atot <- merge(a1, a2, all=T)
and then using computing on the language to loop over the rest:
  atot <- merge(atot, a3, all=T)
  ...
  atot <- merge(atot, a63, all=T)
etc...

followed by more computing on the language to remove the 63 temporary
objects as I go; in full, the loop looks like:
for (i in 2:63) {
    atot <- merge(atot, eval(parse(text=paste("a", i, sep=""))), all=T)
    #     eval(parse(text=paste("a",i,"[1] <- NULL",sep="")))

    cat("i is ", i, gc(), "\n")

    # now delete these 63 temporary objects...
    # e.g. should look like rm(a33)
    eval(parse(text=paste("rm(a",i,")", sep="")))
}
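
(I realise get() and rm(list = ...) would tidy away most of the
eval(parse()) calls, e.g.

  for (i in 2:63) {
    nm   <- paste("a", i, sep = "")      # "a2", ..., "a63"
    atot <- merge(atot, get(nm), all = TRUE)
    rm(list = nm)                        # drop the temporary object by name
  }

but that still leaves me juggling 63 separately named objects.)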

eventually getting a data frame with the first column being the date
and the subsequent 63 columns being the data, with missing values
coded as NA.

So my question is: is there a better strategy for reading in lots of
small files (only a few kilobytes each) like these - time series with
missing data - that avoids the above awkwardness (and the computing on
the language) but still ends up with a nice data.frame with NA values
correctly coded?
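
The sort of thing I'm imagining, assuming the list-based sketch above
(dats) and merging on the first (date) column, is roughly:

  atot <- dats[[1]]
  for (i in 2:length(dats))
    atot <- merge(atot, dats[[i]], by = 1, all = TRUE)

though I haven't convinced myself it does the right thing for the
files with two numbers per date.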

Many thanks,
Sean O'Riordain



