[R] Index alternative to nasty FOR loop?

Dan Davison davison at stats.ox.ac.uk
Thu Aug 7 12:21:25 CEST 2008


On Wed, Aug 06, 2008 at 05:42:21PM +0000, zack holden wrote:
> 
> Dear R wizards,
>  
> I have a folder containing 1000 files. For each file, I need to extract the first row of each file, paste it to a new file, then write out that file. Then I need to repeat this operation for each additional row (row 2, then row 3, etc) for 23 rows in each file.
>  
> I can do this with a for loop (as below). 

Hi Zack,

There are a few problems with your sketched-out for loop (see below),
but if I've understood your problem, then here are a couple of
solutions that use for loops in the way you were intending. They both
take line i from file 1, line i from file 2, ..., and write them to a
file called lines_i (lines_i.csv in the first version), for i in 1:23.
The first one is for the case when you have tabular data, so it uses
read.table and write.table. You might want to mess about with the
arguments to read.table and write.table, specifying whether you have
a header, whether you want the row.names printed out, etc. The second
one is similar but just works line by line, regardless of what the
line looks like (i.e. it doesn't assume you have tabular data in the
files).

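collate.lines.1 <- function(folder, nrows=23) {
    files <- list.files(folder, full.names=TRUE)
    for(file in files) {
        ## read the whole file as a data frame (see ?read.table for header= etc.)
        file.as.data.frame <- read.table(file)
        for(row in 1:nrows) {
            ## row i of every file is appended to lines_i.csv in the working directory;
            ## delete old output before re-running, or rows will be duplicated
            outfile <- paste("lines_", row, ".csv", sep="")
            write.table(file.as.data.frame[row,], file=outfile, append=TRUE,
                        row.names=FALSE, col.names=FALSE, sep=",")
        }
    }
}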

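collate.lines.2 <- function(folder, nrows=23) {
    files <- list.files(folder, full.names=TRUE)
    for(file in files) {
        ## one element per line of the file; unlike scan(), readLines()
        ## keeps blank lines, so element i really is line i
        file.as.character.vector <- readLines(file)
        for(row in 1:nrows) {
            outfile <- paste("lines", row, sep="_")
            ## sep="" stops cat() putting a space before the newline
            cat(file.as.character.vector[row], "\n", sep="", file=outfile, append=TRUE)
        }
    }
}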
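To try the two functions above without touching real data, a throwaway folder of small files is handy. The helper below is only a sketch: make.test.folder, its arguments and the file names it writes are hypothetical, not part of the original question. It writes nfiles tab-separated files of nrows rows each into a folder under tempdir(). The names are zero-padded because list.files() sorts names alphabetically (file_10 would come before file_2), and that order is the order in which rows end up in each output file.

```r
make.test.folder <- function(nfiles=3, nrows=23) {
    ## hypothetical helper: a fresh folder under the session's temporary directory
    folder <- file.path(tempdir(), "collate-test")
    unlink(folder, recursive=TRUE)
    dir.create(folder)
    for(j in seq_len(nfiles)) {
        ## three columns: file number, row number, and a value encoding both
        df <- data.frame(file=j, row=seq_len(nrows), value=100 * j + seq_len(nrows))
        ## zero-padded names, so list.files() returns the files in numeric order
        write.table(df, file.path(folder, sprintf("file_%02d.txt", j)),
                    sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE)
    }
    folder
}
```

With that, something like collate.lines.2(make.test.folder()) would write lines_1, ..., lines_23 to the current working directory, each holding one line from each of the three test files.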

>  
> Is there a way to use some of the indexing power of R to get around this nasty loop?

If you really mean that you want a solution without explicit for loops
in R, then that is possible. But I would recommend that you stick to
a straightforward solution until you're completely comfortable with
programming in that style. It's conceivable that the no-for-loop
versions might be faster if you have lots of files / rows, but don't
worry about speed until it's a problem. Here's my effort at doing it
without for loops; it's a bit of a stretch and wasn't as easy to write
down as the first two. I've probably missed a cleaner solution.

collate.lines.1.fancy <- function(folder, nrows=23) {
    outfiles <- paste("lines_", 1:nrows, ".csv", sep="")
    files <- list.files(folder, full.names=TRUE)
    ## read at most nrows rows per file, so there is one result per outfile
    files.as.data.frames <- lapply(files, read.table, nrows=nrows)
    ## split all rows apart: each data frame becomes a list of one-row data frames
    x <- lapply(files.as.data.frames, function(df) split(df, f=factor(1:nrow(df))))
    ## collate rows from different data frames: element i rbinds row i of every file
    x <- do.call(mapply, c(x, list(FUN=function(...) rbind(...), SIMPLIFY=FALSE)))
    write.function <- function(dataframe, outfile)
        write.table(dataframe, file=outfile, row.names=FALSE, col.names=FALSE, sep=",")
    invisible(mapply(write.function, x, outfiles))
}
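A possibly cleaner loop-free variant, for the line-by-line case handled by collate.lines.2, is to read everything into a character matrix with one column per file and then write out its rows. This is only a sketch: collate.lines.vectorised and its outdir argument are hypothetical names, it assumes every file has at least nrows lines (vapply() stops with an error otherwise), and it overwrites rather than appends to existing output files.

```r
collate.lines.vectorised <- function(folder, nrows=23, outdir=".") {
    files <- list.files(folder, full.names=TRUE)
    ## nrows x length(files) character matrix: column j holds the first nrows
    ## lines of file j; matrix() keeps it a matrix even when nrows == 1
    lines <- matrix(vapply(files, readLines, FUN.VALUE=character(nrows), n=nrows),
                    nrow=nrows)
    outfiles <- file.path(outdir, paste0("lines_", seq_len(nrows)))
    ## row i of the matrix is line i of every file, in list.files() order
    lapply(seq_len(nrows), function(i) writeLines(lines[i, ], outfiles[i]))
    invisible(outfiles)
}
```

Because each output file is written in one go with writeLines(), re-running this does not duplicate lines, unlike the append=TRUE versions above.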

>  
> Thank you in advance for any suggestions
>  
> ###################
> newoutfile <- data.frame()
> list <- list.files("c:/data") ## 'list' not such a good name as it's a built-in function
>  
> file = 1 ## you don't need this
> for(file in list) {
>    row <- file[1, ] ## that's not going to work; 'list' is a character vector, you haven't got the files as data.frames yet
>    newoutfile <- rbind(row, newoutfile)
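>    file = file + 1 ## 'file' is a file name (a character string), so this is an error; the for loop moves on to the next file by itself
> write.csv(outfile, file = "output.csv") ## 'outfile' doesn't exist (newoutfile?), and the writing should happen once, after the loop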
> }
> ####################
>  
>  
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
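For comparison, here is one way the loop sketched in the question might be repaired for a single row. It is a sketch only: collect.row is a hypothetical name, and it assumes files that read.table() can read with its defaults. The main changes are reading each file before indexing it, letting for() step through the file names itself, and combining the rows once after the loop rather than growing a data frame inside it.

```r
collect.row <- function(folder, row=1) {
    files <- list.files(folder, full.names=TRUE)  ## avoid calling this 'list'
    rows <- vector("list", length(files))         ## one slot per file
    for(i in seq_along(files)) {
        df <- read.table(files[i])                ## read the file first...
        rows[[i]] <- df[row, , drop=FALSE]        ## ...then take the wanted row
    }
    do.call(rbind, rows)                          ## combine once, after the loop
}

## e.g. the first row of every file, written out once:
## write.csv(collect.row("c:/data", 1), file="output.csv", row.names=FALSE)
```

The drop=FALSE keeps a one-column file's row as a data frame rather than a bare value, so rbind() still produces a data frame.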


