[R] reading web log file into R

jim holtman jholtman at gmail.com
Wed Sep 23 14:22:22 CEST 2009


Here is a way to do it.  I assume that your data has each record on a
single line; it came through the email wrapped across multiple lines.


> x <- readLines("/tempxx.txt")
> # remove "#Fields: " so the line can be used as a header
> x <- sub("^#Fields: ", "", x)
> # remove comment lines
> x <- x[!grepl("^#", x)]
> # remove quotes
> x <- gsub('"', '', x)
> # now read in the data
> input <- read.table(textConnection(x), header=TRUE)
>
> str(input)
'data.frame':   2 obs. of  16 variables:
 $ date          : Factor w/ 1 level "2007-12-03": 1 1
 $ time          : Factor w/ 1 level "13:50:17": 1 1
 $ c.ip          : Factor w/ 1 level "200.40.203.197": 1 1
 $ cs.username   : Factor w/ 1 level "-": 1 1
 $ s.ip          : Factor w/ 1 level "200.40.51.20": 1 1
 $ s.port        : int  80 80
 $ cs.method     : Factor w/ 1 level "GET": 1 1
 $ cs.uri.stem   : Factor w/ 2 levels "/localidades/img/cargando.gif",..: 2 1
 $ cs.uri.query  : Factor w/ 1 level "-": 1 1
 $ sc.status     : int  200 200
 $ sc.bytes      : int  328 1150
 $ cs.bytes      : int  447 451
 $ time.taken    : int  0 0
 $ cs.User.Agent.: Factor w/ 1 level "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)": 1 1
 $ cs.Cookie.    : Factor w/ 1 level "ASPSESSIONIDSQCBSQAB=JOLECDCCBFCKPOFLGDLHMENA": 1 1
 $ cs.Referer.   : Factor w/ 1 level "http://www.teatro.com/localidades/localidades.asp": 1 1
>
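
The same steps can be wrapped in a small function.  What follows is only a
sketch: the name read_w3c_log and its keep_names argument are made up for
illustration, and it assumes each record is already on a single line.  It
filters with grepl() so that input without any comment lines still works,
and it passes check.names = FALSE on request so the columns keep names such
as c-ip and cs(User-Agent) instead of the syntactic c.ip and cs.User.Agent.

```r
# Hypothetical helper (not from the original post): turn the lines of a
# W3C/IIS-style log into a data frame.
read_w3c_log <- function(lines, keep_names = FALSE) {
  # the "#Fields: " line carries the column names
  is_fields <- grepl("^#Fields: ", lines)
  if (!any(is_fields)) stop("no '#Fields: ' line found")
  header <- sub("^#Fields: ", "", lines[is_fields][1])
  # drop all '#' lines; a logical index is safe when nothing matches
  records <- lines[!grepl("^#", lines)]
  # strip the double quotes that wrapped each record in the posted sample
  records <- gsub('"', "", records, fixed = TRUE)
  read.table(text = c(header, records), header = TRUE,
             quote = "", comment.char = "",
             check.names = !keep_names, stringsAsFactors = FALSE)
}

# Two records taken from the posted sample, one record per line
ua <- "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)"
tail_fields <- paste(ua, "ASPSESSIONIDSQCBSQAB=JOLECDCCBFCKPOFLGDLHMENA",
                     "http://www.teatro.com/localidades/localidades.asp")
sample_log <- c(
  "#Software: Microsoft Internet Information Services 5.0",
  "#Version: 1.0",
  "#Date: 2007-12-03 13:50:17",
  paste("#Fields: date time c-ip cs-username s-ip s-port cs-method",
        "cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken",
        "cs(User-Agent) cs(Cookie) cs(Referer)"),
  paste("\"2007-12-03 13:50:17 200.40.203.197 - 200.40.51.20 80 GET",
        "/localidades/img/nada.gif - 200 328 447 0",
        paste0(tail_fields, "\"")),
  paste("\"2007-12-03 13:50:17 200.40.203.197 - 200.40.51.20 80 GET",
        "/localidades/img/cargando.gif - 200 1150 451 0",
        paste0(tail_fields, "\""))
)

log_df  <- read_w3c_log(sample_log)                     # syntactic names
log_raw <- read_w3c_log(sample_log, keep_names = TRUE)  # names as logged
str(log_df)
```

With keep_names = TRUE the columns have to be addressed with backticks or
[[ ]], for example log_raw[["cs(User-Agent)"]].  Setting
stringsAsFactors = FALSE keeps the text columns as character rather than
factor; that is a choice made for this sketch, not something the transcript
above does.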


On Tue, Sep 22, 2009 at 9:51 PM, Sebastian Kruk <residuo.solow at gmail.com> wrote:
> If I have a web log file as follows:
>
> #Software: Microsoft Internet Information Services 5.0
> #Version: 1.0
> #Date: 2007-12-03 13:50:17
> #Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem
> cs-uri-query sc-status sc-bytes cs-bytes time-taken cs(User-Agent)
> cs(Cookie) cs(Referer)
> "2007-12-03 13:50:17 200.40.203.197 - 200.40.51.20 80 GET
> /localidades/img/nada.gif - 200 328 447 0
> Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)
> ASPSESSIONIDSQCBSQAB=JOLECDCCBFCKPOFLGDLHMENA
> http://www.teatro.com/localidades/localidades.asp"
> "2007-12-03 13:50:17 200.40.203.197 - 200.40.51.20 80 GET
> /localidades/img/cargando.gif - 200 1150 451 0
> Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)
> ASPSESSIONIDSQCBSQAB=JOLECDCCBFCKPOFLGDLHMENA
> http://www.teatro.com/localidades/localidades.asp"
> "2007-12-03 13:50:18 200.40.203.197 - 200.40.51.20 80 GET
> /localidades/img/cerrar.png - 200 450 449 0
> Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)
>
> how can I turn it into a dataframe with 3 rows, and 16 columns named
> date time c-ip cs-username s-ip s-port cs-method cs-uri-stem
> cs-uri-query sc-status sc-bytes cs-bytes time-taken cs(User-Agent)
> cs(Cookie) cs(Referer) skipping lines beginning with #?
>
> Thanks,
>
> Sebastián.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



