[R] Tools for data preparation?
M.Mamin at intershop.de
Fri Nov 19 10:28:00 CET 2004
I had the same problem with log files containing many fields separated by the "|" character.
My task was to extract parts of some fields with regular expressions and normalize the results to compact them (using the R functions factor and table).
To reduce the data size, I first split the log file into "subfiles", each containing only one field from the original data.
That way I could process one field after the other instead of loading the complete log file.
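# index: fields to keep, as one space-separated string (e.g. "1 5 8")
# 'done' closes the shell loop; sep='' keeps the file paths free of spaces
system(paste('for n in ', index, '; \n',
             'do sudo gzip -dc ', afile, ' | cut -f$n -d"|" > ', tmpdir, '/', afile, '.$n \n',
             'done', sep=''))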
=> files mylog.1, mylog.5, mylog.8
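For reference, the same split can be tried directly in a shell, outside R. This is only a minimal sketch under stated assumptions: the sample records, the temporary directory and the choice of fields 1 and 3 are made up for illustration, the compressed file is named mylog.gz here, and sudo is left out because the sample file is owned by the current user.

```shell
#!/bin/sh
# Hypothetical sample data: three "|"-separated records, gzip-compressed.
workdir=$(mktemp -d)
afile=mylog
printf 'a|b|c\nd|e|f\ng|h|i\n' | gzip -c > "$workdir/$afile.gz"

# Fields to keep (illustrative choice, not from the original post).
index="1 3"

# One pass per field: decompress, cut out field n, write it to its own file.
for n in $index; do
    gzip -dc "$workdir/$afile.gz" | cut -d'|' -f"$n" > "$workdir/$afile.$n"
done

# List the per-field files that were written.
ls "$workdir"
```

Each resulting file holds a single column, so it can be read into R on its own (for instance with readLines) before applying factor and table. The log is decompressed once per field, which trades extra CPU time for a small memory footprint.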
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of David Mitchell
Sent: Friday, November 19, 2004 4:54 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Tools for data preparation?
I'm regularly in the position where I have to do a lot of data
manipulation, in order to get the data I have into a format R is happy
with. This manipulation would generally be in one of two forms:
- getting data from e.g. text log files into a tabular format
- extracting sensible sample data from a very large data set (i.e. too
large for R to handle)
In general, I use Perl or Python to do the task; I'm curious as to
what others use when they hit the same problem.
R-help at stat.math.ethz.ch mailing list
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html