[R] R tools for large files
(Ted Harding)
Ted.Harding at nessie.mcc.ac.uk
Tue Aug 26 09:59:48 CEST 2003
This has been an interesting thread! My first reaction to Murray's
query was to think "use standard Unix tools, especially awk", 'awk'
being a compact, fast, efficient program with great powers for
processing lines of data files (and in particular extracting,
subsetting and transforming database-like files e.g. CSV-type).
Of course, that became a sub-thread in its own right.
But -- and here I know I'm missing a trick which is why I'm responding
now so that someone who knows the trick can tell me -- while I normally
use 'awk' "externally" (i.e. I filter a data file through an 'awk'
program outside of R and then read the resulting file into R), I began
to think about doing it from within R.
Something along the lines of
X <- system("cat raw_data | awk '...' ", intern=TRUE)
would create an object X which is a character vector, each element of
which is one line from the output of the command "cat ...... ".
E.g. if "raw_data.csv" starts out as
1,2,3,4,5
1,3,4,2,5
5,4,3,2,1
5,3,4,1,2
then
X <- system("cat raw_data.csv |
  awk 'BEGIN{FS=\",\"}{if($3>$2){print $1 \",\" $4 \",\" $5}}'",
  intern=TRUE)
gives
> X
[1] "1,4,5" "1,2,5" "5,1,2"
Now my Question:
How do I convert X into the dataframe I would have got if I had read
this output from a file instead of into the character vector X?
In other words, how to convert a vector of character strings, each
of which is in comma-separated format as above, into the rows of
a data-frame (or matrix, come to that)?
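One possible route, offered as a sketch rather than a definitive answer:
wrap X in a text connection so that read.table() can treat the strings
as the lines of a file. X is hard-coded below so the sketch runs without
awk or the data file; the names 'con', 'df' and 'm' are purely
illustrative.

```r
# X as it would come back from the system() call (hard-coded for the sketch).
X <- c("1,4,5", "1,2,5", "5,1,2")

# Present the character vector to read.table() as if it were a file.
con <- textConnection(X)
df <- read.table(con, sep = ",", header = FALSE)
close(con)  # the connection was opened by textConnection(), so close it here

# Matrix variant: split each string on commas and bind the rows together.
m <- do.call(rbind, lapply(strsplit(X, ","), as.numeric))
```

Here df should be a data frame with default column names V1, V2, V3, and
m a numeric matrix with one row per element of X. In recent versions of
R, read.table(text = X, sep = ",") is a shorter spelling of the same
idea, and reading directly from pipe("...") may avoid the intermediate
vector altogether, though neither variant has been checked on the large
files discussed in this thread.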
With thanks,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 26-Aug-03 Time: 08:59:48
------------------------------ XFMail ------------------------------