[R] R tools for large files

Tue Aug 26 09:59:48 CEST 2003

This has been an interesting thread! My first reaction to Murray's
query was to think "use standard Unix tools, especially awk", 'awk'
being a compact, fast, efficient program with great powers for
processing lines of data files (and in particular extracting,
subsetting and transforming database-like files e.g. CSV-type).

Of course, that became a sub-thread in its own right.

But -- and here I know I'm missing a trick which is why I'm responding
now so that someone who knows the trick can tell me -- while I normally
use 'awk' "externally" (i.e. I filter a data file through an 'awk'
program outside of R and then read the resulting file into R), I began
to think about doing it from within R.

Something along the lines of

  X <- system("cat raw_data | awk '...' ", intern=TRUE)

would create an object X which is a character vector, each element of
which is one line from the output of the command "cat ...... ".

E.g. if "raw_data.csv" starts out as

  1,2,3,4,5
  1,3,4,2,5
  5,4,3,2,1
  5,3,4,1,2

then

  X<-system("cat raw_data.csv |
  awk 'BEGIN{FS=\",\"}{if($3>$2){print $1 \",\" $4 \",\" $5}}'",
  intern=TRUE)

gives

  > X
  [1] "1,4,5" "1,2,5" "5,1,2"

Now my question:
How do I convert X into the dataframe I would have got if I had read
this output from a file instead of into the character vector X?

In other words, how to convert a vector of character strings, each
of which is in comma-separated format as above, into the rows of
a data-frame (or matrix, come to that)?
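One possibility, offered only as a sketch and not necessarily the neatest
trick: textConnection() lets a character vector be read as if it were a
file, so read.table() can parse it directly. The vector X below is typed
in by hand as a stand-in for the output of system(..., intern=TRUE), and
the names con, D and M are just illustrative.

```r
# Stand-in for the character vector returned by system(..., intern=TRUE)
X <- c("1,4,5", "1,2,5", "5,1,2")

# Wrap the vector in a text connection so that read.table() can treat
# each element as one line of a comma-separated file
con <- textConnection(X)
D <- read.table(con, sep = ",", header = FALSE)
close(con)

# D is a data frame with default column names V1, V2, V3
print(D)

# If a matrix is wanted instead, the data frame can be converted
M <- as.matrix(D)
```

Since there is no header line, read.table() supplies the default column
names V1, V2, V3; a col.names argument could be passed if others are wanted.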

With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 26-Aug-03                                       Time: 08:59:48
------------------------------ XFMail ------------------------------