[R] reading web log file into R
jim holtman
jholtman at gmail.com
Wed Sep 23 14:22:22 CEST 2009
Here is a way to do it. I assume that your data has each record on a
single line; it came through the email wrapped across multiple lines.
> x <- readLines("/tempxx.txt")
> # remove '#Fields: ' so the line can be used as a header
> x <- sub("^#Fields: ", "", x)
> # remove comment lines (logical indexing also works when no line matches)
> x <- x[!grepl("^#", x)]
> # remove quotes
> x <- gsub('"', '', x)
> # now read in the data
> input <- read.table(textConnection(x), header=TRUE)
>
> str(input)
'data.frame': 2 obs. of 16 variables:
$ date : Factor w/ 1 level "2007-12-03": 1 1
$ time : Factor w/ 1 level "13:50:17": 1 1
$ c.ip : Factor w/ 1 level "200.40.203.197": 1 1
$ cs.username : Factor w/ 1 level "-": 1 1
$ s.ip : Factor w/ 1 level "200.40.51.20": 1 1
$ s.port : int 80 80
$ cs.method : Factor w/ 1 level "GET": 1 1
$ cs.uri.stem : Factor w/ 2 levels "/localidades/img/cargando.gif",..: 2 1
$ cs.uri.query : Factor w/ 1 level "-": 1 1
$ sc.status : int 200 200
$ sc.bytes : int 328 1150
$ cs.bytes : int 447 451
$ time.taken : int 0 0
$ cs.User.Agent.: Factor w/ 1 level "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)": 1 1
$ cs.Cookie. : Factor w/ 1 level "ASPSESSIONIDSQCBSQAB=JOLECDCCBFCKPOFLGDLHMENA": 1 1
$ cs.Referer. : Factor w/ 1 level "http://www.teatro.com/localidades/localidades.asp": 1 1
>
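For anyone who wants to try the steps above without a file on disk, here is a minimal, self-contained sketch. It is only an illustration under some assumptions: the sample lines are shortened stand-ins for the real log (fewer fields), the names log_lines and weblog are made up for this example, and it relies on the text= argument of read.table(), which current versions of R provide. It also sets check.names=FALSE so that the column names keep their hyphens (c-ip rather than c.ip), and stringsAsFactors=FALSE so that text columns stay character.

```r
# Shortened sample lines standing in for the log file (hypothetical data,
# fewer fields than the real log; one record per line, wrapped in quotes)
log_lines <- c(
  "#Software: Microsoft Internet Information Services 5.0",
  "#Version: 1.0",
  "#Date: 2007-12-03 13:50:17",
  "#Fields: date time c-ip cs-method cs-uri-stem sc-status sc-bytes",
  "\"2007-12-03 13:50:17 200.40.203.197 GET /localidades/img/nada.gif 200 328\"",
  "\"2007-12-03 13:50:17 200.40.203.197 GET /localidades/img/cargando.gif 200 1150\""
)

# Turn the '#Fields: ' line into a plain header line
log_lines <- sub("^#Fields: ", "", log_lines)

# Drop the remaining comment lines; logical indexing with grepl()
# keeps all lines when nothing matches
log_lines <- log_lines[!grepl("^#", log_lines)]

# Remove the double quotes that wrap each record
log_lines <- gsub('"', "", log_lines, fixed = TRUE)

# Parse the cleaned lines; check.names = FALSE keeps names such as 'c-ip'
weblog <- read.table(text = log_lines, header = TRUE,
                     check.names = FALSE, stringsAsFactors = FALSE)

str(weblog)
```

Columns whose names contain hyphens have to be accessed with backticks or double brackets, for example weblog[["sc-bytes"]] or weblog$`sc-bytes`.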
On Tue, Sep 22, 2009 at 9:51 PM, Sebastian Kruk <residuo.solow at gmail.com> wrote:
> If I have a web log file as follows:
>
> #Software: Microsoft Internet Information Services 5.0
> #Version: 1.0
> #Date: 2007-12-03 13:50:17
> #Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem
> cs-uri-query sc-status sc-bytes cs-bytes time-taken cs(User-Agent)
> cs(Cookie) cs(Referer)
> "2007-12-03 13:50:17 200.40.203.197 - 200.40.51.20 80 GET
> /localidades/img/nada.gif - 200 328 447 0
> Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)
> ASPSESSIONIDSQCBSQAB=JOLECDCCBFCKPOFLGDLHMENA
> http://www.teatro.com/localidades/localidades.asp"
> "2007-12-03 13:50:17 200.40.203.197 - 200.40.51.20 80 GET
> /localidades/img/cargando.gif - 200 1150 451 0
> Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)
> ASPSESSIONIDSQCBSQAB=JOLECDCCBFCKPOFLGDLHMENA
> http://www.teatro.com/localidades/localidades.asp"
> "2007-12-03 13:50:18 200.40.203.197 - 200.40.51.20 80 GET
> /localidades/img/cerrar.png - 200 450 449 0
> Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)
>
> How can I turn it into a data frame with 3 rows and 16 columns named
> date time c-ip cs-username s-ip s-port cs-method cs-uri-stem
> cs-uri-query sc-status sc-bytes cs-bytes time-taken cs(User-Agent)
> cs(Cookie) cs(Referer), skipping lines beginning with #?
>
> Thanks,
>
> Sebastián.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?