[R] count how many row i have in a txt file in a directory
Hans Ekbrand
hans at sociologi.cjb.net
Sun Feb 26 16:55:20 CET 2012
On Sun, Feb 26, 2012 at 03:03:58PM +0100, gianni lavaredo wrote:
> Dear Researchers,
>
> I have a large TXT file (X,Y,MyValue) in a directory and I wish to import
> it row by row in a loop, keeping only the rows that fall inside a buffer
> (using inside.owin of spatstat) and discarding the rest. The first step,
> before creating the row-by-row loop, is to find out how many rows the txt
> file has, without loading it into R, to avoid memory problems.
>
> Does anyone know of a function for this?
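To the literal question, counting the rows without loading the whole file
into R: you can read the file through a connection in blocks and just count
the lines. A minimal sketch (the helper name and block size are arbitrary):

count.rows <- function(file.name, block = 100000) {
  con <- file(file.name, open = "r")
  on.exit(close(con))
  n <- 0
  repeat {
    ## read the next block of lines; character(0) means end of file
    lines <- readLines(con, n = block)
    if (length(lines) == 0) break
    n <- n + length(lines)
  }
  n
}

(wc -l in the shell, or countLines() from the R.utils package, does much the
same job.)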
If the number of rows is so large that even three variables per row cause
memory problems, then looping through the file row by row will take a very
long time.
Instead of looping row by row, I would split the text file into chunks small
enough that each chunk can be read into R, and operated on within R, without
memory problems.
I create a test file of 10,000,000 rows:
## 10,000 random ten-letter "words" (LETTERS has 26 elements)
my.words <- replicate(10000, paste(LETTERS[sample.int(26, 10)], collapse = ""))
my.df <- data.frame(x = rnorm(10000000), y = rnorm(10000000),
                    my.val = rep(my.words, 1000))
write.csv(my.df, file = "testmem.csv")
Split the file into smaller chunks of, say, 1,000,000 rows each. I use the
split command from GNU coreutils:
$ split -l 1000000 testmem.csv
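One thing to watch out for: write.csv puts a header line (and, by default, a
row-name column) into testmem.csv, so only the first chunk, "xaa", carries
the column names. If that is a nuisance, strip the header before splitting,
e.g. with a POSIX shell:

$ tail -n +2 testmem.csv | split -l 1000000

so that every chunk is plain data and can be read with header = FALSE.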
Loop through the chunks; the skeleton looks like this (a fleshed-out sketch
of the filtering step is at the end of this message).
chunk.files <- list.files(pattern = "^x[a-z][a-z]$")  # the files created by split
for(file.name in chunk.files){
  chunk <- read.csv(file = file.name)
  ## match and add all the interesting rows to an object
}
Here's an example that for each chunk prints its third row.
for(file.name in c("xaa", "xab")){
  chunk <- read.csv(file = file.name)
  print(chunk[3,])
}
With a chunk size of 1,000,000 rows, R needed about 250 MB of RAM to run this loop.
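Finally, a rough sketch of the filtering step the original poster described,
using inside.owin() from spatstat. It assumes the chunks are comma separated
with three columns X, Y, MyValue (i.e. the header was stripped before
splitting, as above) and that buffer.win is an existing owin object
describing the buffer; adjust as needed.

library(spatstat)                                     # provides inside.owin()

kept <- list()
chunk.files <- list.files(pattern = "^x[a-z][a-z]$")  # the files created by split
for(file.name in chunk.files){
  chunk <- read.csv(file.name, header = FALSE,
                    col.names = c("X", "Y", "MyValue"))
  ## logical vector: TRUE for points that fall inside the buffer window
  ok <- inside.owin(chunk$X, chunk$Y, buffer.win)
  kept[[file.name]] <- chunk[ok, ]
}
result <- do.call(rbind, kept)                        # all retained rows

Since each chunk is overwritten as soon as the next one is read, memory use
stays at roughly one chunk plus the retained rows.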