[R] Reading JSON data
Mark Sharp
msharp at txbiomed.org
Mon Aug 10 17:46:19 CEST 2015
Mayukh,
I apologize for taking so long to get back to your problem. I expect you may have found a solution already; if so, I would be interested in seeing it. I have developed a hack that solves the problem, but someone who knows JSON handling or text parsing better could no doubt develop a more elegant solution.
As I understand it, your text file contains more than one JSON object in text form. There are three: the first two are very similar, and the last is a trailer indicating what was done, when it was done, and how many JSON objects were sent. The problem is that fromJSON() only reads the first of the JSON objects.
I have defined three helper functions to separate the JSON objects, read them in, and store them in a list.
library(RJSONIO)
library(stringi, quietly = TRUE)
#library(jsonlite) # also works
#' Returns a data frame with the ordered locations of the matching braces.
#'
#' There is almost certainly a better function to do this.
#' @param txt character vector of length one having 0 or more matching braces.
#' @import stringi
#' @examples
#' library(rmsutilityr)
#' match_braces("{123{456{78}9}10}")
#' @export
match_braces <- function(txt) {
  txt <- txt[1] # just in the case of having more than one element
  left <- stri_locate_all_regex(txt, "\\{")[[1]][ , 1]
  right <- stri_locate_all_regex(txt, "\\}")[[1]][ , 2]
  len <- length(left)
  braces <- data.frame(left = rep(0, len), right = rep(0, len))
  for (i in seq_along(right)) {
    for (j in rev(seq_along(left))) {
      if (left[j] < right[i] && left[j] != 0) {
        braces$left[i] <- left[j]
        braces$right[i] <- right[i]
        left[j] <- 0
        break
      }
    }
  }
  braces[order(braces$left), ]
}
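For example, running it on the string from the roxygen example should give the pairs ordered by the position of the opening brace (the row names reflect the order in which the pairs were matched, innermost first):
match_braces("{123{456{78}9}10}")
#   left right
# 3    1    17
# 2    5    14
# 1    9    12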
#' Returns a list of two elements extracted from a character vector
#' of length one: (1) object = the first JSON object found and (2) remainder =
#' the remaining text.
#'
#' Properly formed messages are assumed. Error checking is non-existent.
#' @param json_txt character vector of length one having one or more JSON
#' objects in character form.
#' @import stringi
#' @export
get_first_json_message <- function(json_txt) {
  len <- stri_length(json_txt)
  braces <- match_braces(json_txt)
  if (braces$right[1] + 1 > len) {
    remainder <- ""
  } else {
    remainder <- stri_trim_both(stri_sub(json_txt, braces$right[1] + 1))
  }
  list(object = stri_sub(json_txt, braces$left[1], to = braces$right[1]),
       remainder = remainder)
}
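A quick sanity check with a made-up two-object string (not from your file) shows how the split works:
get_first_json_message('{"a": 1} {"b": 2}')
# $object
# [1] "{\"a\": 1}"
#
# $remainder
# [1] "{\"b\": 2}"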
#' Returns a list of lists, one per JSON object, made by calls to fromJSON()
#' @param json_txt character vector of length 1 having one or more
#' JSON objects in text form.
#' @import stringi
#' @export
get_json_list <- function(json_txt) {
  t_json_txt <- json_txt
  i <- 0
  json_list <- list()
  repeat {
    i <- i + 1
    message_remainder <- get_first_json_message(t_json_txt)
    json_list[[i]] <- fromJSON(message_remainder$object)
    if (message_remainder$remainder == "")
      break
    t_json_txt <- message_remainder$remainder
  }
  json_list
}
json_file <- "../data/json_file.txt"
json_txt <- stri_trim_both(stri_c(readLines(json_file), collapse = " "))
json_list <- get_json_list(json_txt)
length(json_list)
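As an aside, if each JSON object in your files happens to sit on its own line (newline-delimited JSON), jsonlite's stream_in() should read all of the records directly and make the brace matching unnecessary. I have not tried it on your data, so treat this as a sketch that assumes that layout:
library(jsonlite)
# assumes one JSON object per line; will not work if an object spans lines
tweets <- stream_in(file(json_file))
nrow(tweets)  # number of records read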
R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msharp at TxBiomed.org
> On Jul 27, 2015, at 5:16 PM, Mark Sharp <msharp at TxBiomed.org> wrote:
>
> Mayukh,
>
> I think you are missing an argument to paste() and a right parenthesis character.
>
> Try
> json_data <- fromJSON(paste(readLines(json_file), collapse = " "))
>
> Mark
> R. Mark Sharp, Ph.D.
> msharp at TxBiomed.org
>
>
>
>
>
>> On Jul 27, 2015, at 3:41 PM, Mayukh Dass <mayukh.dass at gmail.com> wrote:
>>
>> Hello,
>>
>> I am trying to read a set of json files containing tweets using the
>> following code:
>>
>> json_data <- fromJSON(paste(readLines(json_file))
>>
>> Unfortunately, it only reads the first record on the file. For example, in
>> the file below, it only reads the first record starting with "id":"tag:
>> search.twitter.com,2005:3318539389". What is the best way to retrieve these
>> records? I have 20 such json files with varying number of tweets in it.
>> Thank you in advance.
>>
>> Best,
>> Mayukh
>>
>> {"id":"tag:search.twitter.com
>> ,2005:3318539389","objectType":"activity","actor":{"objectType":"person","id":"id:
>> twitter.com:2859421","link":"http://www.twitter.com/meetjenn","displayName":"Jenn","postedTime":"2007-01-29T17:06:00.000Z","image":"06-19-07_2010.jpg","summary":"I
>> say 'like' a lot. I fall down a lot. I walk into everything. Love Pgh Pens,
>> NE Pats, Fundraising, Dogs & History. Craft Beer & Running
>> Novice.","links":[{"href":"http://meetjenn.tumblr.com","rel":"me"}],"friendsCount":0,"followersCount":0,"listedCount":0,"statusesCount":0,"twitterTimeZone":"Eastern
>> Time (US &
>> Canada)","verified":false,"utcOffset":"0","preferredUsername":"meetjenn","languages":["en"],"location":{"objectType":"place","displayName":"Pgh/Philajersey"},"favoritesCount":0},"verb":"post","postedTime":"2009-08-15T00:00:12.000Z","generator":{"displayName":"tweetdeck","link":"
>> http://twitter.com
>> "},"provider":{"objectType":"service","displayName":"Twitter","link":"
>> http://www.twitter.com"},"link":"
>> http://twitter.com/meetjenn/statuses/3318539389","body":"Cool story about
>> the man who created the @Starbucks logo. Additional link at the bottom on
>> how it came to be: http://bit.ly/16bOJk
>> ","object":{"objectType":"note","id":"object:search.twitter.com,2005:3318539389","summary":"Cool
>> story about the man who created the @Starbucks logo. Additional link at the
>> bottom on how it came to be: http://bit.ly/16bOJk","link":"
>> http://twitter.com/meetjenn/statuses/3318539389
>> ","postedTime":"2009-08-15T00:00:12.000Z"},"twitter_entities":{"urls":[{"expanded_url":null,"indices":[111,131],"url":"
>> http://bit.ly/16bOJk
>> "}],"hashtags":[],"user_mentions":[{"id":null,"name":null,"indices":[41,51],"screen_name":"@Starbucks","id_str":null}]},"retweetCount":0,"gnip":{"matching_rules":[{"value":"Starbucks","tag":null}]}}
>> {"id":"tag:search.twitter.com
>> ,2005:3318543260","objectType":"activity","actor":{"objectType":"person","id":"id:
>> twitter.com:61595468","link":"http://www.twitter.com/FastestFood","displayName":"FastFood
>> Bob","postedTime":"2009-01-30T20:51:10.000Z","image":"","summary":"Just A
>> little food for
>> thought","links":[{"href":"http://www.TeamSantilli.com","rel":"me"}],"friendsCount":0,"followersCount":0,"listedCount":0,"statusesCount":0,"twitterTimeZone":"Pacific
>> Time (US &
>> Canada)","verified":false,"utcOffset":"0","preferredUsername":"FastestFood","languages":["en"],"location":{"objectType":"place","displayName":"eating
>> some
>> thoughts"},"favoritesCount":0},"verb":"post","postedTime":"2009-08-15T00:00:23.000Z","generator":{"displayName":"oauth:17","link":"
>> http://twitter.com
>> "},"provider":{"objectType":"service","displayName":"Twitter","link":"
>> http://www.twitter.com"},"link":"
>> http://twitter.com/FastestFood/statuses/3318543260","body":"Oregon Biz
>> Report » How Starbucks saved millions. Oregon closures ...
>> http://u.mavrev.com/02bdj","object":{"objectType":"note","id":"object:
>> search.twitter.com,2005:3318543260","summary":"Oregon Biz Report » How
>> Starbucks saved millions. Oregon closures ... http://u.mavrev.com/02bdj
>> ","link":"http://twitter.com/FastestFood/statuses/3318543260
>> ","postedTime":"2009-08-15T00:00:23.000Z"},"twitter_entities":{"urls":[{"expanded_url":null,"indices":[70,95],"url":"
>> http://u.mavrev.com/02bdj
>> "}],"hashtags":[],"user_mentions":[]},"retweetCount":0,"gnip":{"matching_rules":[{"value":"Starbucks","tag":null}]}}
>> {"info":{"message":"Replay Request
>> Completed","sent":"2015-02-18T00:05:15+00:00","activity_count":2}}
>>
>