[R] R tools for large files
David Khabie-Zeitoune
dave at evocapital.com
Tue Aug 26 11:03:04 CEST 2003
A starting point might be the string-splitting function strsplit.
For example,
> X = c("1,4,5", "1,2,5", "5,1,2")
> strsplit(X, ",")
[[1]]
[1] "1" "4" "5"
[[2]]
[1] "1" "2" "5"
[[3]]
[1] "5" "1" "2"
This returns a list of the parsed vectors. Next you can do something
like:
> Z = data.frame(matrix(as.numeric(unlist(strsplit(X, ","))), nrow = 3, byrow = TRUE))
> Z
X1 X2 X3
1 1 4 5
2 1 2 5
3 5 1 2
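Another route, sketched here as one possibility rather than the only answer, is to let read.table parse the vector as if it were a file by wrapping it in a text connection. The names X and Z follow the example above; the column names V1, V2, V3 are simply read.table's defaults when there is no header line.

```r
# X is the character vector from the example above:
# one comma-separated record per element.
X <- c("1,4,5", "1,2,5", "5,1,2")

# textConnection() presents each element of X as one line of input,
# so read.table() can parse it exactly as it would parse a file.
con <- textConnection(X)
Z <- read.table(con, sep = ",")
close(con)  # a text connection is created open, so close it explicitly

# Z is a data frame with one row per element of X and numeric columns.
str(Z)
```

The columns come back as numbers rather than strings because read.table does its usual type conversion, so no separate as.numeric step should be needed.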
-----Original Message-----
From: Ted.Harding at nessie.mcc.ac.uk [mailto:Ted.Harding at nessie.mcc.ac.uk]
Sent: 26 August 2003 09:00
To: R-help
Subject: Re: [R] R tools for large files
This has been an interesting thread! My first reaction to Murray's query
was to think "use standard Unix tools, especially awk", 'awk' being a
compact, fast, efficient program with great powers for processing lines
of data files (and in particular extracting, subsetting and transforming
database-like files e.g. CSV-type).
Of course, that became a sub-thread in its own right.
But -- and here I know I'm missing a trick which is why I'm responding
now so that someone who knows the trick can tell me -- while I normally
use 'awk' "externally" (i.e. I filter a data file through an 'awk'
program outside of R and then read the resulting file into R), I began
to think about doing it from within R.
Something on the lines of
X <- system("cat raw_data | awk '...' ", intern=TRUE)
would create an object X which is a character vector, each element of
which is one line from the output of the command "cat ...... ".
E.g. if "raw_data.csv" starts out as
1,2,3,4,5
1,3,4,2,5
5,4,3,2,1
5,3,4,1,2
then
X<-system("cat raw_data.csv |
awk 'BEGIN{FS=\",\"}{if($3>$2){print $1 \",\" $4 \",\" $5}}'",
intern=TRUE)
gives
> X
[1] "1,4,5" "1,2,5" "5,1,2"
Now my Question:
How do I convert X into the dataframe I would have got if I had read
this output from a file instead of into the character vector X?
In other words, how to convert a vector of character strings, each of
which is in comma-separated format as above, into the rows of a
data-frame (or matrix, come to that)?
With thanks,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 26-Aug-03 Time: 08:59:48
------------------------------ XFMail ------------------------------
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help