[R] How to process a log file in R ?

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Mon Jun 23 08:35:53 CEST 2014


Please always "reply-all" so the list stays in the conversation. With the 
details presented below, this question is a good example of why: I don't 
know how to solve your problem, but someone else on the list may.

Also please read and follow the Posting Guide, which warns you not to post 
in HTML since what you see is not necessarily what we see in that format.

I do know that it is quite common for structured input routines that 
process data such as XML and JSON to look for a single "object" and 
convert it. Normally this is an "outermost layer" of structure that 
encompasses the entire contents of the file.  In your case, the log file 
is an indefinite stream of objects. From what I can see this is not a 
standard use case for most JSON (or XML) libraries.

One solution discussed on StackOverflow [1] is to encode each object 
without internal newlines, and to separate the objects with newlines.  This 
would require that you have control over the software that generates 
this stream of data.

[1] 
http://stackoverflow.com/questions/9829811/how-can-i-parse-the-first-json-object-on-a-stream-in-js
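
If the log is in (or can be converted to) that one-object-per-line form, a 
minimal sketch of the idea might look like the following. The sample lines 
and the get_field() helper are my own assumptions for illustration, not 
taken from your actual file; the readLines()/lapply() attempt quoted below 
suggests your file may already be laid out this way, but I have not 
verified that.

```r
library(jsonlite)

# Hypothetical sample: one JSON object per line, using field names seen in
# the quoted log. For a real file use: log_lines <- readLines("tracking.log")
log_lines <- c(
  '{"username": "lavita", "event_type": "save_user_state", "time": "2014-06-20T05:49:10+00:00"}',
  '{"username": "raeha", "event_type": "problem_check", "time": "2014-06-20T06:43:52+00:00"}',
  '{"username": "lavita", "event_type": "problem_check", "time": "2014-06-20T06:50:00+00:00"}'
)

# Drop blank lines, then parse each remaining line as its own JSON object
log_lines <- log_lines[nzchar(trimws(log_lines))]
records <- lapply(log_lines, fromJSON)

# Hypothetical helper: pull one top-level field, using NA when it is absent
get_field <- function(rec, field) {
  value <- rec[[field]]
  if (is.null(value)) NA_character_ else as.character(value)
}

# Flatten the fields of interest into a data frame, one row per log record
events <- data.frame(
  username   = vapply(records, get_field, character(1), field = "username"),
  event_type = vapply(records, get_field, character(1), field = "event_type"),
  time       = vapply(records, get_field, character(1), field = "time"),
  stringsAsFactors = FALSE
)

# All records for a single student, and a count of records per student
lavita_events <- events[events$username == "lavita", ]
events_per_user <- table(events$username)
```

Once the records are in a data frame, the usual tools (subsetting, table(), 
aggregate()) apply. Nested fields such as "context" or "event" would need 
their own extraction step, which this sketch does not attempt.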

On Mon, 23 Jun 2014, praveen pal wrote:

> Hi Jeff, sorry for my late response.
> 
> I am trying to parse the log file that I attached to my previous
> mail.
> 
>  This log contains the activity of students, such as
>     how many times they log in,
>     which pages they visit,
>     which videos they watched and how much of each,
>     which quizzes they attempted,
>     whether they gave correct or wrong answers,
> and similar other data and clicks.
> 
>  I want to extract this data from the log file and store
> it in a table (a table is not necessary if the work can be done in R alone).
>  Then I want to compute some statistics on the data extracted from the log file.
> 
> I am using jsonlite to parse it, and below is the code I used:
> 
> library(jsonlite)
> tracking_json  <-
> fromJSON("/home/praveen/praveen/Cleaning_Data/tracking.log")
> names(tracking_json)
> tracking_json$username
> 
> This code prints only the first record and ignores everything else.
> 
> Then I tried:
> 
> path <- "/home/praveen/praveen/Cleaning_Data/tracking.log"
> c <- file(path,"r")
> l <- readLines(c,-1L)
> json  <- lapply(X=l,fromJSON)
> json[[1]]
> 
> It parses the log file and stores the records in a list.
> I am able to print a record with json[[1]],
> but I don't know how to collect all the data related to one factor,
> for example all the records related to a single student.
> 
> Is there any other way to do this, or do you have a document or link on
> parsing a log file?
> 
> 
> After that I want to compute some statistics.
> I am new to R, so there may be some mistakes in my explanation.
> 
> 
> On Fri, Jun 20, 2014 at 7:23 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
> wrote:
>       You have not said what your problem is, and you have not shown
>       us the code you have tried. How are we supposed to know what you
>       want?
>
>       On June 20, 2014 12:26:03 AM PDT, praveen pal
>       <pal.praveen11 at gmail.com> wrote:
>       >Hello,
>       >
>       >   I am new to R and need help.
>       >   I want to parse a log file named tracking.log using R.
>       >   I tried to read it using the "jsonlite" package but was not able to
>       >   get the data in a proper format.
>       >
>       >
>       >   A sample of the file is below.
>       >   I have also attached the tracking.log file to this mail.
>       >
>       >
>       > {
>       >    "username": "lavita",
>       >    "host": "10.105.22.32",
>       >    "event_source": "server",
>       >    "event_type":
>       >"/courses/IITB/CS101/2014_T1/xblock/i4x:;_;_IITB;_CS101;_video;_d333fa637a074b41996dc2fd5e675818/handler/xmodule_handler/save_user_state",
>       >    "context": {
>       >        "course_id": "IITB/CS101/2014_T1",
>       >        "course_user_tags": {},
>       >        "user_id": 42,
>       >        "org_id": "IITB"
>       >    },
>       >    "time": "2014-06-20T05:49:10.468638+00:00",
>       >    "ip": "127.0.0.1",
>       >    "event": "{\"POST\": {\"saved_video_position\": [\"00:02:10\"]}, \"GET\": {}}",
>       >    "agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0",
>       >    "page": null
>       >}
>       >
>       >{
>       >    "username": "raeha",
>       >    "host": "10.105.22.32",
>       >    "event_source": "server",
>       >    "event_type": "problem_check",
>       >    "context": {
>       >        "course_id": "IITB/CS101/2014_T1",
>       >        "course_user_tags": {},
>       >        "user_id": 40,
>       >        "org_id": "IITB",
>       >        "module": {
>       >            "display_name": ""
>       >        }
>       >    },
>       >    "time": "2014-06-20T06:43:52.716455+00:00",
>       >    "ip": "127.0.0.1",
>       >    "event": {
>       >        "submission": {
>       >            "i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
>       >                "input_type": "choicegroup",
>       >                "question": "",
>       >                "response_type": "multiplechoiceresponse",
>       >                "answer": "MenuInflater.inflate()",
>       >                "variant": "",
>       >                "correct": true
>       >            }
>       >        },
>       >        "success": "correct",
>       >        "grade": 1,
>       >        "correct_map": {
>       >            "i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
>       >                "hint": "",
>       >                "hintmode": null,
>       >                "correctness": "correct",
>       >                "npoints": null,
>       >                "msg": "",
>       >                "queuestate": null
>       >            }
>       >        },
>       >        "state": {
>       >            "student_answers": {},
>       >            "seed": 1,
>       >            "done": null,
>       >            "correct_map": {},
>       >            "input_state": {
>       >                "i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {}
>       >            }
>       >        },
>       >        "answers": {
>       >            "i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": "choice_0"
>       >        },
>       >        "attempts": 1,
>       >        "max_grade": 1,
>       >        "problem_id": "i4x://IITB/CS101/problem/33e4aac93dc84f368c93b1d08fa984fc"
>       >    },
>       >    "agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0",
>       >    "page": "x_module"
>       >}
>       >
>       >
>       >{
>       >    "username": "tushars",
>       >    "host": "localhost",
>       >    "event_source": "server",
>       >    "event_type":
>       >"/courses/IITB/CS101/2014_T1/instructor_dashboard/api/list_instructor_tasks",
>       >    "context": {
>       >        "course_id": "IITB/CS101/2014_T1",
>       >        "course_user_tags": {},
>       >        "user_id": 6,
>       >        "org_id": "IITB"
>       >    },
>       >    "time": "2014-06-20T05:49:26.780244+00:00",
>       >    "ip": "127.0.0.1",
>       >    "event": "{\"POST\": {}, \"GET\": {}}",
>       >    "agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0",
>       >    "page": null
>       >}
> 
> 
> 
> 
> --
> Thank you
> 
> Praveen Pal
> 
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------

