[R] data input strategy - lots of csv files
Sean O'Riordain
sean.oriordain at gmail.com
Thu May 11 16:11:11 CEST 2006
Thank you folks - most helpful as always!
Now I have a bit of studying to do :-) I've never really understood
how to use lapply() (or any other apply function) before, so this gives
me a real problem of my own to work with!
Thanks again,
Sean
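
For reference, the eval(parse(...)) loop described in the quoted message
below can be replaced with a list-based idiom: read every file into a list
with lapply(), then fold merge() over the list with Reduce(). This is a
hedged sketch -- the file names and column names here are invented for
illustration (the demo writes two tiny CSVs to a temp directory so it runs
stand-alone; in practice you would point list.files() at your own folder):

```r
# Demo data: two small CSVs in the format described below ("01/06/05,23445").
dir <- tempdir()
f1 <- file.path(dir, "a1.csv")
writeLines(c("01/06/05,100", "02/06/05,110"), f1)
f2 <- file.path(dir, "a2.csv")
writeLines(c("01/06/05,7", "03/06/05,9"), f2)
my.files <- c(f1, f2)   # in practice: list.files(pattern = "\\.csv$", full.names = TRUE)

# Read each file into one list; header = FALSE matches the original
# read.csv() calls, so columns come in as V1 (date) and V2 (value).
lst <- lapply(my.files, read.csv, header = FALSE)

# Give each value column a distinct name (hypothetical names for the sketch).
for (i in seq_along(lst)) {
    names(lst[[i]])[2] <- paste("series", i, sep = ".")
}

# Multiway merge on the shared date column; all = TRUE keeps every date
# from every file and fills the gaps with NA -- no eval(parse(...)), no
# rm() bookkeeping, because nothing was ever assigned to a1 ... a63.
atot <- Reduce(function(x, y) merge(x, y, by = "V1", all = TRUE), lst)
```

The same pattern scales to 63 files unchanged, since Reduce() walks the
whole list regardless of its length.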
On 11/05/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Assuming:
>
> my.files <- c("file1.csv", "file2.csv", ..., "filen.csv")
>
> use read.zoo in the zoo package and merge.zoo (which
> can do a multiway merge):
>
> library(zoo)
> do.call("merge", lapply(my.files, read.zoo, ...any.other.read.zoo.args...))
>
> After loading zoo see:
> vignette("zoo")
> ?read.zoo
> ?merge.zoo
>
> On 5/11/06, Sean O'Riordain <sean.oriordain at gmail.com> wrote:
> > Good morning,
> > I currently have 63 .csv files, most of which have lines that look like
> > 01/06/05,23445
> > Though some files have two numbers beside each date. There are
> > missing values, and currently the longest file has 318 rows.
> >
> > (merge() is losing the head and doing runaway memory allocation - but
> > that's another question - I'm still trying to pin that issue down and
> > make a small reproducible example.)
> >
> > Currently I'm reading in these files with lines like
> > a1 <- read.csv("daft_file_name_1.csv",header=F)
> > ...
> > a63 <- read.csv("another_silly_filename_63.csv",header=F)
> >
> > and then I'm naming the columns in these like...
> > names(a1)[2] <- "silly column name"
> > ...
> > names(a63)[2] <- "daft column name"
> >
> > then trying to merge()...
> > atot <- merge(a1, a2, all=T)
> > and then using language manipulation to loop
> > atot <- merge(atot, a3, all=T)
> > ...
> > atot <- merge(atot, a63, all=T)
> > etc...
> >
> > followed by more language manipulation
> > for() {
> > rm(a1)
> > } etc...
> >
> > i.e.
> > for (i in 2:63) {
> > atot <- merge(atot, eval(parse(text=paste("a", i, sep=""))), all=T)
> > # eval(parse(text=paste("a",i,"[1] <- NULL",sep="")))
> >
> > cat("i is ", i, gc(), "\n")
> >
> > # now delete these 63 temporary objects...
> > # e.g. should look like rm(a33)
> > eval(parse(text=paste("rm(a",i,")", sep="")))
> > }
> >
> > eventually getting a dataframe with the first column being the date,
> > and the subsequent 63 columns being the data... with missing values
> > coded as NA...
> >
> > so my question is... is there a better strategy for reading in lots of
> > small files (only a few kbytes each) like these, which are time series
> > with missing data... one which avoids the above awkwardness
> > (and language manipulation) but still ends up with a nice data.frame
> > with NA values correctly coded, etc.?
> >
> > Many thanks,
> > Sean O'Riordain
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
>
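
Gabor's zoo recipe above, spelled out as a self-contained sketch. This
assumes the zoo package is installed, that the date sits in column 1 of
each file, and that dates look like "01/06/05" (so format = "%d/%m/%y");
the variable names are hypothetical:

```r
library(zoo)

# Collect the file names; an explicit character vector works equally well.
files <- list.files(pattern = "\\.csv$")

# read.zoo() reads each file as a zoo series indexed by its date column;
# extra arguments (header, sep) are passed through to read.table().
z.list <- lapply(files, read.zoo, header = FALSE, sep = ",",
                 format = "%d/%m/%y")

# merge.zoo() performs the multiway merge, aligning all series on their
# dates and padding missing observations with NA.
z <- do.call("merge", z.list)

# If a plain data.frame is wanted afterwards, with the date as column 1:
atot <- data.frame(date = index(z), coredata(z))
```

The multiway merge in one do.call() is what removes the a1 ... a63
intermediate objects and the pairwise merge() loop entirely.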