[R] Newbie: Using R to analyse Apache logs

Raj Mathur raju at linux-delhi.org
Thu Jan 31 14:31:05 CET 2008


I have a requirement to scan Apache logs and discover ``exceptions''.  
Exceptions can be of two types:

1. A single IP generating a large amount of traffic within a given time frame 
(for definable values of ``large'' and ``time frame'').

2. A single IP hitting a wide set of URLs on the server (indicates a crawler), 
again for definable values of ``wide''.
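
To make the two cases concrete, here is a rough sketch in base R of the kind 
of check meant above. It assumes the log has already been parsed into a data 
frame; the column names (ip, time, url), the sample rows and all three 
thresholds are hypothetical placeholders, not values from a real server.

```r
# Hypothetical parsed log: one row per request (column names are assumptions).
logs <- data.frame(
  ip   = c("10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2",
           "10.0.0.3", "10.0.0.3", "10.0.0.3"),
  time = as.POSIXct(c("2008-01-31 14:00:05", "2008-01-31 14:01:10",
                      "2008-01-31 14:03:30", "2008-01-31 14:02:00",
                      "2008-01-31 14:00:00", "2008-01-31 14:20:00",
                      "2008-01-31 14:40:00"), tz = "UTC"),
  url  = c("/a", "/a", "/a", "/a", "/a", "/b", "/c"),
  stringsAsFactors = FALSE
)

window_secs <- 300  # "time frame": fixed 5-minute buckets (placeholder)
large       <- 3    # "large": hits per IP within one bucket (placeholder)
wide        <- 3    # "wide": distinct URLs per IP (placeholder)

# Type 1: count hits per IP within each fixed time bucket.
logs$bucket <- floor(as.numeric(logs$time) / window_secs)
hits <- aggregate(url ~ ip + bucket, data = logs, FUN = length)
names(hits)[names(hits) == "url"] <- "hits"
heavy_ips <- as.character(unique(hits$ip[hits$hits >= large]))

# Type 2: count distinct URLs requested by each IP.
n_urls <- tapply(logs$url, logs$ip, function(u) length(unique(u)))
crawler_ips <- names(n_urls)[n_urls >= wide]

print(heavy_ips)
print(crawler_ips)
```

Fixed buckets are only one possible reading of ``time frame''; a sliding 
window would catch bursts that straddle a bucket boundary.  For a first look 
at the data, something like barplot(sort(n_urls, decreasing = TRUE)) should 
show which IPs stand out.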

I'm a complete newbie to R (and to statistics), so the questions are:

- Can R help me generate graphs which would help me identify these activities?

- Has someone already done something like this?  If so, where could I find it?

- If not, could someone help me with the stats (and R) part needed to achieve 
these objectives?  Any software that gets created as a result would be 
released under a FOSS license.

Data massaging, tuning, etc. are not an issue.  We'd be dealing with a few 
hundred thousand to a million records a day.


-- Raju
Raj Mathur                raju at kandalaya.org      http://kandalaya.org/
 Freedom in Technology & Software || February 2008 || http://freed.in/
       GPG: 78D4 FC67 367F 40E2 0DD5  0FEF C968 D0EF CC68 D17F
PsyTrance & Chill: http://schizoid.in/   ||   It is the mind that moves
