[R] on ``unfolding'' a json into data frame columns
Daniel Bastos
dbastos at toledo.com
Tue Nov 29 15:06:17 CET 2016
Greetings!
In an SQL table, I have a column that contains a JSON. I'd like easy
access to all (in an ideal world) of these JSON fields. I started out
trying to get all fields from the JSON and so I wrote this function.
unfold.json <- function (df, column)
{
library(jsonlite)
ret <- data.frame()
for (i in 1:nrow(df)) {
js <- fromJSON(df[i, ][[column]])
ret <- rbind(ret, cbind(df[i, ], js))
}
ret
}
It takes a data frame and a column-string where the JSON is to be
found. It produces a new RET data frame with all the rows of DF but
with new columns --- extracted from every field in the JSON.
(The performance is horrible.)
fromJSON sometimes produces a list that sometimes contains a data frame.
As a result, I end up getting a RET data frame with duplicated rows.
Here's what happens.
> nrow(df)
[1] 1
> nrow(unfold.json(df, "response"))
[1] 3
Warning messages:
1: In data.frame(CreateUTC = "2016-11-29 02:00:43", Payload = list( :
row names were found from a short variable and have been discarded
2: In data.frame(..., check.names = FALSE) :
row names were found from a short variable and have been discarded
>
I expected a data frame with 1 row. The reason 3 rows is produced is
because in the JSON there's an array with 3 rows.
> fromJSON(df$response)$RawPayload
[1] 200 1 128
I have also cases where fromJSON(df$response)$Payload$Fields is a data
frame containing various rows. So unfold.json produces a data frame
with these various rows.
So I gave up on this general approach.
(*) My humble approach
For the moment I'm not interested in RawPayload nor Payload$Fields, so I
nullified them in this new approach. To improve performance, I guessed
perhaps merge() would help and I think it did, but this was not at all a
decision thought out.
unfold.json.fast <- function (df, column)
{
library(jsonlite)
ret <- data.frame()
if (nrow(df) > 0) {
for (i in 1:nrow(df)) {
ls <- fromJSON(df[i, ][[column]])
ls$RawPayload <- NULL
ls$Payload$Fields <- NULL
js <- data.frame(ls)
ret <- rbind(ret, merge(df[i, ], js))
}
}
ret
}
I'm looking for advice. How would you approach this problem?
Thank you!
More information about the R-help
mailing list