[R] R usage for log analysis
Allen S. Rout
asr at ufl.edu
Mon Jun 12 06:44:51 CEST 2006
"Gabriel Diaz" <gabidiaz at gmail.com> writes:
> and what is the correct path to do it?
>
> I mean, put log files in a mysql or something like that, and then
> make R use that data, using the data from the files directly?
I haven't stuck anything in a DB yet. I'm not sure how much of the DB
clue is used under the covers.
> pre-parse the log files to accommodate them to R?
Probably not; a little familiarity with the reading functions will
obviate most needs to pre-parse.
> I need faqs, manuals, books, whatever to learn about this, can anyone
> give some advice?
[...]
Don't expect a warm welcome. This community is like all open-source
communities, sharply focused on its own concerns and expertise. And,
in an unusual experience for computer types, our core competencies
hold little or no sway here; they don't even give us much of a leg up.
Just wait until you want to do something nutso like produce a business
graphic. :)
I'm working on understanding enough of R packaging and documentation
to begin a 'task view' focused on systems administration, for humble
submission. That might end up being mostly "log analysis"; the term
can describe much of what we do, if it's stretched a bit. I'm hoping
the task view will attract the teeming masses of sysadmins trapped in
the mire of Gnuplot and friends.
For starters, become familiar with read.table(); with a few
variations it will take care of all the
while (<>) { @blah = split(/,/); etc. etc. etc. }
you've been accustomed to doing.
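As a rough sketch of that, assuming a comma-separated log with no
header line (the sample lines below are made up for illustration, and
text= stands in for a real file name):

```r
# Hypothetical sample data: three comma-separated log lines, no header.
log_lines <- c(
  "2006-06-12,06:44:51,backup,1532",
  "2006-06-12,07:02:10,restore,87",
  "2006-06-13,01:15:00,backup,2048"
)

# read.table() replaces the read-and-split loop with a single call.
# With a real log you would pass the file path as the first argument
# instead of using text=.
my_data <- read.table(
  text = log_lines,
  sep = ",",
  header = FALSE,
  stringsAsFactors = FALSE
)

# Columns arrive named V1, V2, ...; the last one is already numeric,
# so it can be summed without any conversion step.
total <- sum(my_data$V4)
str(my_data)
```

Other separators are a matter of changing sep; see ?read.table for the
variations.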
Name columns; this makes it easier to think about your data.
names(my_data)<-c("column","names","can","be","assigned","to")
Start thinking of your data in generic sets, as opposed to specific
rows. Situations which required iteration over specific rows in
Perl-land fall neatly to precise assignment in R. For example, if
you've got records with dates and times and you want to work with time
structures:
In Perl you'd
foreach (...)
{$foo->{pdate} = parsedate($foo->{date}." ".$foo->{time})}
or some such. In R-land, the iteration is implicit. Here's a snippet
from something I'm using:
a$pdate<-as.POSIXct(paste(format(a$dte,"%Y/%m/%d"),a$time))
You're really acting on whole columns at once here, not on individual
rows. This is fantastically more efficient in terms of your thought
processes.
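To make that concrete, here is a small self-contained sketch. The data
frame is invented; it assumes dte is a Date column and time is a
character column of HH:MM:SS strings, which is only a guess at the
shape of the data behind the snippet above. The explicit format= and
tz= arguments are additions to keep the result predictable.

```r
# Hypothetical data frame; column names dte and time follow the snippet above.
a <- data.frame(
  dte = as.Date(c("2006-06-12", "2006-06-12", "2006-06-13")),
  time = c("06:44:51", "07:02:10", "01:15:00"),
  stringsAsFactors = FALSE
)

# One assignment builds the timestamp for every row; no explicit loop.
# tz = "UTC" is an assumption so the result does not depend on the local zone.
a$pdate <- as.POSIXct(
  paste(format(a$dte, "%Y/%m/%d"), a$time),
  format = "%Y/%m/%d %H:%M:%S",
  tz = "UTC"
)

# Whole-column arithmetic works the same way:
# seconds elapsed since the earliest record.
a$elapsed <- as.numeric(difftime(a$pdate, min(a$pdate), units = "secs"))
```

Subsetting follows the same pattern: a[a$elapsed > 3600, ] picks out
every record more than an hour after the first one.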
- Allen S. Rout