[R] Newbie: Using R to analyse Apache logs
Raj Mathur
raju at linux-delhi.org
Thu Jan 31 14:31:05 CET 2008
Hi,
I have a requirement to scan Apache logs and discover ``exceptions''.
Exceptions can be of two types:
1. A single IP generating a large amount of traffic within a given time frame
(for definable values of ``large'' and ``time frame'').
2. A single IP hitting a wide set of URLs on the server (indicates a crawler),
again for definable values of ``wide''.
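To make the two definitions concrete, here is a rough sketch in Python rather than R (R being the part I need help with). It assumes the logs have already been parsed into (IP, timestamp, URL) tuples. The function names, thresholds and sample records are made up for illustration, and fixed window buckets are only one possible reading of ``time frame''.

```python
from collections import Counter, defaultdict
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

# Hypothetical, already-parsed log records: (client IP, request time, URL).
records = [
    ("10.0.0.1", datetime(2008, 1, 31, 14, 0, 5), "/index.html"),
    ("10.0.0.1", datetime(2008, 1, 31, 14, 0, 10), "/index.html"),
    ("10.0.0.1", datetime(2008, 1, 31, 14, 0, 20), "/about.html"),
    ("10.0.0.1", datetime(2008, 1, 31, 14, 0, 50), "/index.html"),
    ("10.0.0.1", datetime(2008, 1, 31, 14, 1, 10), "/index.html"),
    ("10.0.0.2", datetime(2008, 1, 31, 14, 0, 7), "/a"),
    ("10.0.0.2", datetime(2008, 1, 31, 14, 2, 0), "/b"),
    ("10.0.0.2", datetime(2008, 1, 31, 14, 5, 30), "/c"),
]


def heavy_hitters(records, window_seconds, max_hits):
    """Type 1: IPs with more than max_hits requests in one fixed window.

    Returns {(ip, window_start_seconds): hits}.
    """
    counts = Counter()
    for ip, ts, _url in records:
        seconds = int((ts - EPOCH).total_seconds())
        # Round the timestamp down to the start of its window.
        bucket = seconds // window_seconds * window_seconds
        counts[(ip, bucket)] += 1
    return {key: n for key, n in counts.items() if n > max_hits}


def wide_crawlers(records, max_urls):
    """Type 2: IPs that requested more than max_urls distinct URLs.

    Returns {ip: number_of_distinct_urls}.
    """
    urls = defaultdict(set)
    for ip, _ts, url in records:
        urls[ip].add(url)
    return {ip: len(seen) for ip, seen in urls.items() if len(seen) > max_urls}


heavy = heavy_hitters(records, window_seconds=60, max_hits=3)
wide = wide_crawlers(records, max_urls=2)
```

With these sample records, only 10.0.0.1 exceeds 3 hits in a single minute, and only 10.0.0.2 touches more than 2 distinct URLs. What I don't know is how to choose sensible thresholds, or how to see them in a graph.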
I'm a complete newbie to R (and to statistics), so the questions are:
- Can R help me generate graphs which would help me identify these activities?
- Has someone already done something like this? If so, where could I find it?
- If not, can someone help me with the stats (and R) part to help me achieve
these objectives? Any software that gets created as a result would be
released under a FOSS license.
Data massaging, tuning, etc. are not an issue. We'd be dealing with anywhere
from a few hundred thousand to a million records a day.
Regards,
-- Raju
--
Raj Mathur raju at kandalaya.org http://kandalaya.org/
Freedom in Technology & Software || February 2008 || http://freed.in/
GPG: 78D4 FC67 367F 40E2 0DD5 0FEF C968 D0EF CC68 D17F
PsyTrance & Chill: http://schizoid.in/ || It is the mind that moves