[R] how to get how many lines there are in a file.

Richard A. O'Keefe ok at cs.otago.ac.nz
Tue Dec 7 03:00:55 CET 2004


Hu Chen asked
	> If I wanna get the total number of lines in a big file without reading
	> the file's content into R as matrix or data frame, any methods or
	> functions?
	
_Something_ must read it, but it doesn't have to be R.
On a UNIX system, you can simply do

    number.of.lines <- as.numeric(system(paste("wc -l <", file.name), TRUE))

Suppose file.name is "massive.csv".
Then paste("wc -l <", file.name) is "wc -l < massive.csv", which is a
UNIX command to write the number of lines in massive.csv to stdout,
and system(cmd, TRUE) executes the UNIX command and returns everything
it writes to stdout as an R character vector, one element per line of
output.  In this case, there's one line of output, so one element.
Don't forget the TRUE (the intern argument).  Without it, the command's
standard output is only displayed, not captured, and system() returns
just the exit status.
Finally, as.numeric turns that string into a number.

For example, on my machine,
    > as.numeric(system("wc -l <$HOME/.cshrc", TRUE))
    [1] 32
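
The one-liner above passes file.name to the shell unquoted, so a name
containing spaces or shell metacharacters will break it.  Here is a minimal
sketch of a safer wrapper.  It assumes a UNIX-like shell with 'wc' on the
PATH, and the function name count.lines.wc is only an illustration, not
part of R:

```r
# Hypothetical helper: count lines with the external 'wc' utility.
# Assumes a UNIX-like shell with 'wc' available on the PATH.
count.lines.wc <- function(file.name) {
    # shQuote() protects names containing spaces or shell metacharacters
    cmd <- paste("wc -l <", shQuote(file.name))
    # intern = TRUE captures wc's standard output as a character vector
    out <- system(cmd, intern = TRUE)
    # some wc versions pad the number with spaces; as.numeric ignores them
    as.numeric(out)
}
```

For example, count.lines.wc("my data.csv") works where the unquoted
version would hand the shell two separate words.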

This will work on Mac OS X, and you can get 'wc' for Windows, so it can be
made to work there too.

If the file is large, this is likely to be much faster than reading it into R.
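
If 'wc' is not available, R can still count lines without holding the whole
file in memory, as long as it reads the file a chunk at a time.  The sketch
below is slower than 'wc' but portable.  The function name and the default
chunk size are my own choices:

```r
# Hypothetical pure-R line counter: reads the file in fixed-size chunks,
# so at most 'chunk.size' lines are held in memory at any one time.
count.lines.r <- function(file.name, chunk.size = 10000L) {
    con <- file(file.name, open = "r")
    on.exit(close(con))           # close the connection even on error
    n <- 0                        # a double, so huge counts don't overflow
    repeat {
        chunk <- readLines(con, n = chunk.size, warn = FALSE)
        if (length(chunk) == 0L) break    # end of file reached
        n <- n + length(chunk)
    }
    n
}
```

One difference from 'wc -l': readLines also counts a final line that has no
trailing newline, whereas 'wc -l' counts newline characters.  For such files
the two results differ by one.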

But the obvious question is "what happens next?"  If you want to decide
whether the amount of data is too big, a bare line count can mislead you
in both directions:
    - false positives:  data files may contain comments, which will be
      counted by wc but don't affect the amount of memory you need
    - false negatives:  the amount of memory you need depends on the
      number (and type) of columns as well as the number of lines, so
      just counting the lines may leave you thinking there is room when
      there isn't.
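
Both pitfalls can be reduced somewhat.  You can skip comment lines while
counting, and you can combine the row count with the column count.  The
sketch below assumes that comments start with '#' in the first column and
that every column will be stored as a double (8 bytes per value).  Integer,
character and factor columns take different amounts of space.  These
assumptions and the function names are illustrative only:

```r
# Hypothetical: count lines that do not start with the comment character.
count.data.lines <- function(file.name, comment.char = "#", chunk.size = 10000L) {
    con <- file(file.name, open = "r")
    on.exit(close(con))
    n <- 0
    repeat {
        chunk <- readLines(con, n = chunk.size, warn = FALSE)
        if (length(chunk) == 0L) break
        # startsWith() is vectorised; keep only non-comment lines
        n <- n + sum(!startsWith(chunk, comment.char))
    }
    n
}

# Rough size in bytes of the data if every value is an 8-byte double.
estimate.bytes <- function(n.rows, n.cols) {
    n.rows * n.cols * 8
}
```

For example, a file with 1e6 data lines and 10 numeric columns comes to
about estimate.bytes(1e6, 10), i.e. 8e7 bytes (roughly 76 MB), before any
overhead.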



