[R] Reading multiple files into R

Roger Bivand Roger.Bivand at nhh.no
Sun Oct 3 14:17:02 CEST 2004


On Sun, 3 Oct 2004, Vikas Rawal wrote:

> Thanks Kevin and Roger. This gave me the clue and was a great help.
> I have been trying it out. There is some problem in the code that still 
> needs to be figured out.
> 
> For the first 9 files, paste("wb-0", i, "vc.dbf", sep="") works fine.
> But as you rightly guessed, I have more files.
> So when I use paste("wb-", formatC(i, width=2, flag="0"), "vc.dbf", 
> sep=""), dbf.read does not work.
> formatC works fine if I use it in cat(paste.................). It 
> displays the file names correctly.
> But when I use it in dbf.read, it gives the following error.
> 
> ***********************
> res[[i]] <- maptools:::dbf.read(paste("wb-", format(i, width=2,flag=0), 
> "vc.dbf", sep=""))
> Error in maptools:::dbf.read(paste("wb-", format(i, width=2,flag=0), 
> "vc.dbf", sep=""))
> Error in maptools:::dbf.read(paste("wb-", format(i, width = 2, flag = 
> 0),  :
>     unable to open DBF file
> ***********************
> 
> Of course, the data files are all right. I can read them individually.
> 
> What do you think could be the problem?

Look for the difference between:

> for(i in 1:12) cat(paste("wb-", format(i, width=2,flag=0),"vc.dbf", 
+ sep=""), "\n")

and 

> for(i in 1:12) cat(paste("wb-", formatC(i, width=2,flag="0"),"vc.dbf", 
+ sep=""), "\n")

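You need formatC(), not format(): format() has no zero-padding flag and 
pads to width 2 with a space, so the file name it builds for i < 10 does 
not exist. The sprintf() equivalent of the formatC() call is:

> for(i in 1:12) cat(paste("wb-", sprintf(fmt="%0.2d", i),"vc.dbf", 
+ sep=""), "\n")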
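
A minimal sketch of the comparison, in case it helps to see the three 
results side by side. The object names (with_format and so on) are only 
for illustration, and "%02d" is used here as the more common spelling of 
the zero-padded integer format:

```r
# Build the file names for i = 1, ..., 12 in three ways.
i <- 1:12

# format() has no zero-padding flag; width = 2 pads with a space.
with_format  <- paste("wb-", format(i, width = 2), "vc.dbf", sep = "")

# formatC() with flag = "0" pads with a leading zero.
with_formatC <- paste("wb-", formatC(i, width = 2, flag = "0"), "vc.dbf",
                      sep = "")

# sprintf() is vectorised, so the whole name can be built in one call.
with_sprintf <- sprintf("wb-%02dvc.dbf", i)

# Look at the first and tenth names from each approach.
print(with_format[c(1, 10)])
print(with_formatC[c(1, 10)])
print(all(with_formatC == with_sprintf))
```

Running file.exists() on the character vector of names before the read 
loop is a cheap way to catch this kind of mismatch.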

On the rbind question:

> Now I have a vector of lists res. I would like to append all these 
> components into one single dataframe.
> I tried the following:

> rbind(for (i in 1:17) res[[i]]) -> distvc

> But this will not work. It works if I individually specify all the res 
> components.


This works:

> xx <- list(df1=data.frame(x=rnorm(10), y=rnorm(10), f=rep("A", 10)), 
+ df2=data.frame(x=rnorm(10), y=rnorm(10), f=rep("B", 10)))
> xxx <- NULL
> for(i in 1:length(xx)) xxx <- rbind(xxx, xx[[i]])

for() loops are not a bad thing if you are not repeating the operation 
(like reading in data) very frequently, and seem to me easier to debug 
than more sophisticated constructions. This for() loop will run slower as 
xxx grows, because it needs to re-allocate memory each time round. For 
many and large xx[[i]], I would be tempted to pre-allocate the combined 
data frame and just slot in the rows for each list component, if I knew 
that the numbers and classes of the columns were identical. But rbind() is 
cleaner, even though it will be slower; again, if you only need this a 
few times, the time hit is compensated for by simplicity.
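
A rough sketch of the pre-allocation idea, assuming every component has 
the same column names and classes. The names n, starts, ends and out are 
made up for this illustration, and stringsAsFactors=FALSE is set here only 
so that the f column is plain character and no factor levels need to be 
reconciled:

```r
# Two small data frames with identical columns, as in the rbind() example.
set.seed(1)
xx <- list(df1 = data.frame(x = rnorm(10), y = rnorm(10), f = rep("A", 10),
                            stringsAsFactors = FALSE),
           df2 = data.frame(x = rnorm(10), y = rnorm(10), f = rep("B", 10),
                            stringsAsFactors = FALSE))

# Row counts of the components, and the first and last row of each block.
n      <- sapply(xx, nrow)
ends   <- cumsum(n)
starts <- ends - n + 1

# Pre-allocate by repeating the first row of the first component; every
# row is overwritten below, so only the column names and classes matter.
out <- xx[[1]][rep(1, sum(n)), ]
rownames(out) <- NULL

# Slot each component into its own block of rows.
for (k in seq_along(xx)) {
  out[starts[k]:ends[k], ] <- xx[[k]]
}

print(dim(out))
```

For a handful of files, do.call(rbind, xx) is another commonly used 
one-liner that gives the same stacking as the rbind() loop.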

> 
> Vikas
> 
> Kevin Bartz wrote:
> 
> > Roger Bivand wrote:
> >
> >> On Fri, 1 Oct 2004, Vikas Rawal wrote:
> >>
> >>
> >>> I want to read data from a number of files into R.
> >>> Reading individual files one by one requires writing an enormous 
> >>> amount of code that will look something like the following.
> >>>
> >>> ****************
> >>> maptools:::dbf.read("wb-01vc.dbf")->dist1
> >>> maptools:::dbf.read("wb-02vc.dbf")->dist2
> >>> maptools:::dbf.read("wb-03vc.dbf")->dist3
> >>> maptools:::dbf.read("wb-04vc.dbf")->dist4
> >>> maptools:::dbf.read("wb-05vc.dbf")->dist5
> >>> maptools:::dbf.read("wb-06vc.dbf")->dist6
> >>> maptools:::dbf.read("wb-07vc.dbf")->dist7
> >>> maptools:::dbf.read("wb-08vc.dbf")->dist8
> >>> maptools:::dbf.read("wb-09vc.dbf")->dist9
> >>> *****************
> >>>
> >>
> >>
> >> In this case, you could pre-allocate a list and:
> >>
> >> res <- vector(mode="list", length=9)
> >> for (i in 1:length(res))     res[[i]] <- 
> >> maptools:::dbf.read(paste("wb-0", i, "vc.dbf", sep=""))
> >>
> >>
> >>> res <- vector(mode="list", length=9)
> >>> for (i in 1:length(res)) cat(paste("wb-0", i, "vc.dbf", sep=""), "\n")
> >>
> >>
> >> wb-01vc.dbf wb-02vc.dbf wb-03vc.dbf ...
> >>
> >> gives a check on what file names are being used.
> >>
> >> For 10 to 99 preserving the 01-09, use paste("wb-", formatC(i, 
> >> width=2, flag="0"), "vc.dbf", sep="").
> >>
> >> If the token is a character (string) that varies, you can roll out a 
> >> character vector of tokens first and step along it.
> >>
> >>
> >>> res <- vector(mode="list", length=length(LETTERS))
> >>> for (i in 1:length(res)) cat(paste("wb-", LETTERS[i], "vc.dbf", 
> >>> sep=""), "\n")
> >>
> >> wb-Avc.dbf wb-Bvc.dbf wb-Cvc.dbf ...
> >>
> >>
> >>
> >>> Is there a better way of doing this?
> >>>
> >>> Vikas
> >>>
> >>> ______________________________________________
> >>> R-help at stat.math.ethz.ch mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide! 
> >>> http://www.R-project.org/posting-guide.html
> >>>
> >>
> >>
> >
> > Good call. Here's a somewhat more R-ified version:
> >
> > res <- lapply(paste("wb-", formatC(1:99, width=2, flag="0"), "vc.dbf",
> >                     sep=""), maptools:::dbf.read)
> >
> > Kevin
> >
> >
> >
> 
> 
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no



