[R] count how many rows I have in a txt file in a directory

Hans Ekbrand hans at sociologi.cjb.net
Sun Feb 26 16:55:20 CET 2012


On Sun, Feb 26, 2012 at 03:03:58PM +0100, gianni lavaredo wrote:
> Dear Researchers,
> 
> I have a large TXT file (X, Y, MyValue) in a directory and I wish to
> import the txt row by row in a loop, keeping only the data that fall
> inside a buffer (using inside.owin from spatstat) and discarding the
> rest. The first step, before creating the row-by-row loop, is to find
> out how many rows the txt file has without loading it into R, to avoid
> memory problems.
> 
> Does anyone know the specific function?

If the number of rows is so large that even just three variables per row
causes memory problems, then looping over the file row by row will take
a very long time.

Instead of looping row by row, I would split the text file into chunks
small enough that each chunk can be read into R, and operated on within
R, without memory problems.

I create a test file of 10,000,000 rows

## 10,000 random 10-letter strings (indices must stay within the 26 letters)
my.words <- replicate(10000, paste(LETTERS[sample.int(26, 10)], collapse = ""))
## 10,000,000 rows: two coordinate columns plus a value column recycling the strings
my.df <- data.frame(x = rnorm(10000000), y = rnorm(10000000), my.val = rep(my.words, 1000))
write.csv(my.df, file = "testmem.csv")

Split the file into smaller chunks of, say, 1,000,000 rows each. I use
the split command from GNU coreutils:

$ split -l 1000000 testmem.csv

Loop through the chunks.

for(file.name in c("xaa", "xab" ...){
  chunk <- read.csv(file = file.name)
  [ match and add all the interesting rows to an object ]
}
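
Fleshed out, that sketch could look like the following. This is only an
illustration: library(spatstat) is assumed to be installed, my.window is a
hypothetical stand-in for the real buffer window (an owin object), and the
chunk files produced by split are assumed to sit in the working directory.
Note that only the first chunk, xaa, carries the header line written by
write.csv, so the remaining chunks are read with header = FALSE.

library(spatstat)   # assumed installed; provides owin() and inside.owin()

my.window <- owin(xrange = c(-1, 1), yrange = c(-1, 1))   # placeholder buffer window

chunk.files <- sort(list.files(pattern = "^x[a-z][a-z]$"))   # xaa, xab, ... from split
kept <- vector("list", length(chunk.files))

for (i in seq_along(chunk.files)) {
  if (i == 1) {
    chunk <- read.csv(chunk.files[i])   # header line is only in the first chunk
    names(chunk) <- c("row", "x", "y", "my.val")
  } else {
    chunk <- read.csv(chunk.files[i], header = FALSE,
                      col.names = c("row", "x", "y", "my.val"))
  }
  keep <- inside.owin(chunk$x, chunk$y, my.window)   # TRUE for points inside the buffer
  kept[[i]] <- chunk[keep, ]
}

result <- do.call(rbind, kept)   # all rows that fall inside the buffer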

Here's an example that prints the third row of each chunk.

for(file.name in c("xaa", "xab")){
  chunk <- read.csv(file = file.name)
  print(chunk[3,])
}

With chunks of 1,000,000 rows, R needed about 250 MB of RAM to process this loop.
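
As for the question as originally asked, counting the rows without loading
the whole file into R, the same block-wise idea works through a file
connection. A minimal sketch (the block size of 100,000 lines is an
arbitrary choice, and the subtraction accounts for the header line written
by write.csv):

con <- file("testmem.csv", open = "r")
n.lines <- 0
repeat {
  block <- readLines(con, n = 100000)   # read at most 100,000 lines per pass
  if (length(block) == 0) break         # end of file reached
  n.lines <- n.lines + length(block)
}
close(con)
n.lines - 1                             # data rows, excluding the header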


