[R] R usage for log analysis

Allen S. Rout asr at ufl.edu
Mon Jun 12 06:44:51 CEST 2006


"Gabriel Diaz" <gabidiaz at gmail.com> writes:

> and what is the correct path to do it?
> 
> I mean, put log files in a mysql or something like that, and then
> make R use that data, using the data from the files directly?

I haven't stuck anything in a DB yet.  I'm not sure how much of the DB
clue is used under the covers. 

> pre-parse the log files to accommodate them to R?
 
Probably not; a little familiarity with the reading functions will
obviate most needs to pre-parse.


> I need faqs, manuals, books, whatever to learn about this, can anyone
> give some advice?

[...]


Don't expect a warm welcome.  This community is like all open-source
communities, sharply focused on its own concerns and expertise.  And,
in an unusual experience for computer types, our core competencies
hold little or no sway here; they don't even give us much of a leg up.
Just wait 'til you want to do something nutso like produce a business
graphic. :)

I'm working on understanding enough of R packaging and documentation
to begin a 'task view' focused on systems administration, for humble
submission. That might end up being mostly "log analysis"; the term
can describe much of what we do, if it's stretched a bit.  I'm hoping
the task view will attract the teeming masses of sysadmins trapped in
the mire of Gnuplot and friends.


For starters, become familiar with read.table(); with a few
variations it will take care of all the 

while (<>) { @blah = split(/,/); etc. etc. etc. } 

you've been accustomed to doing.  
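
As a rough sketch of what that looks like (the log lines and their
layout below are made up for illustration; in practice you would pass
a file name rather than text=):

```r
# Hypothetical comma-separated log lines, invented for this example.
log_text <- "2006-06-12,06:44:51,host1,200
2006-06-12,06:45:02,host2,404"

# sep="," does the job of split(/,/); each field becomes a column.
# With header=FALSE the columns get default names V1, V2, ...
my_data <- read.table(text = log_text, sep = ",", header = FALSE,
                      stringsAsFactors = FALSE)

str(my_data)  # one row per log line, one column per field
```

Other arguments worth looking up in ?read.table include quote,
comment.char and colClasses, which cover a lot of messy log formats.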

Name columns;  this makes it easier to think about your data.  

names(my_data)<-c("column","names","can","be","assigned","to")
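
A small sketch of why the names pay off, assuming a made-up data
frame with a host and a status column (both names are hypothetical):

```r
# Toy data standing in for something read.table() returned.
my_data <- data.frame(V1 = c("host1", "host2", "host1"),
                      V2 = c(200L, 404L, 500L),
                      stringsAsFactors = FALSE)

# Replace the default V1, V2 with meaningful names.
names(my_data) <- c("host", "status")

# Columns can now be referred to by name.
errors <- my_data[my_data$status >= 400, ]  # rows with an error status
per_host <- table(my_data$host)             # request count per host
```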

Start thinking of your data in generic sets, as opposed to specific
rows.  Situations which required iteration over specific rows in
Perl-land fall neatly to precise assignment in R.  For example, if
you've got records with dates and times and you want to work with time
structures:

In Perl you'd 

foreach (...) 
{$foo->{pdate} = parsedate($foo->{date}." ".$foo->{time})}

or some such.  In R-land, the iteration is implicit.  Here's a snippet
from something I'm using 

a$pdate<-as.POSIXct(paste(format(a$dte,"%Y/%m/%d"),a$time)) 

You're really acting on whole columns at once here, not on individual
rows.  This is fantastically more efficient in terms of your thought
processes.  
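
Here is a self-contained version of the same idea, as a sketch only.
The data frame is invented, and I'm assuming dte is a Date column and
time is a character column; the explicit format= and tz= arguments
are additions to make the result predictable.

```r
# Made-up records: a Date column and a time-of-day string column.
a <- data.frame(dte  = as.Date(c("2006-06-12", "2006-06-13")),
                time = c("06:44:51", "23:15:00"),
                stringsAsFactors = FALSE)

# One assignment converts every row; no explicit loop is written.
a$pdate <- as.POSIXct(paste(format(a$dte, "%Y/%m/%d"), a$time),
                      format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# The new column supports time arithmetic, again on the whole column.
gaps <- diff(a$pdate)  # time elapsed between consecutive records
```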



- Allen S. Rout


