[R] Restructuring Star Wars data from rwars package
Ulrik Stervbo
ulrik.stervbo at gmail.com
Fri Aug 4 09:07:28 CEST 2017
Hi Matt,
The usual way would be to use do.call():
.lst <- list(x = list(a = 1, b = 2), y = list(a = 5, b = 8))
do.call(rbind, lapply(.lst, data.frame, stringsAsFactors = FALSE))
However, your list has vectors of unequal lengths, which makes the above fail.
You somehow need to get everything to have the same length. The dplyr data
set has nested columns, but I believe a more transparent way is simply to
concatenate the elements of each vector longer than 1.
library("rwars")
library("tidyverse")
people <- get_all_people(parse_result = TRUE)
people <- get_all_people(getElement(people, "next"), parse_result = TRUE)
list_to_df_collapse <- function(.list){
  .list %>%
    # Collapse every variable to a single string, e.g. "a|b|c"
    lapply(paste, collapse = "|") %>%
    bind_rows()
}
people$results %>%
  lapply(list_to_df_collapse) %>%
  bind_rows()
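To see the idea without calling the API, here is a minimal base R sketch on a made-up list. The records and their values are invented for illustration, and collapse_record is just a hypothetical helper name. It uses rbind() rather than bind_rows(), so unlike the code above it needs every record to carry the same variables (bind_rows() fills missing ones with NA).

```r
# Made-up records shaped like people$results (values are illustrative only)
toy <- list(
  list(name = "A", films = c("f1", "f2", "f3"), vehicles = c("v1", "v2")),
  list(name = "B", films = "f1", vehicles = character(0))
)

# The do.call() approach fails: 1, 3 and 2 values do not fit in one set of rows
direct <- tryCatch(
  do.call(rbind, lapply(toy, data.frame, stringsAsFactors = FALSE)),
  error = function(e) conditionMessage(e)
)

# Collapsing each variable to one string gives every variable length 1
collapse_record <- function(rec) {
  as.data.frame(lapply(rec, paste, collapse = "|"), stringsAsFactors = FALSE)
}
collapsed <- do.call(rbind, lapply(toy, collapse_record))

# The individual values can be recovered later by splitting on the separator
films_split <- strsplit(collapsed$films, "|", fixed = TRUE)
```

An empty vector collapses to an empty string, so the separator should be a character that cannot occur in the data.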
This does not re-create the dplyr data set though. To do this you need to
nest the variables that are longer than 1. It turns out that some variables
are not found in all members of results, and some variables might have a
length of 1 in one case but more than 1 in another. This means we are
probably better off knowing in advance which columns must be nested.
# Find the variables that must be nested
vars_to_nest <- people$results %>%
  # Get the length of each variable at each entry
  map_df(function(.list){
    .names <- names(.list)
    .lengths <- sapply(.list, length)
    data.frame(col = .names, len = .lengths, stringsAsFactors = FALSE)
  }) %>%
  # Get those that have a length of 2 or more in any entry
  filter(len > 1) %>%
  distinct(col) %>% flatten_chr()
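The same bookkeeping can be checked on a small made-up list. This is only a base R sketch of the step above; the records and the names lens and vars_to_nest_toy are invented for illustration.

```r
# Made-up records: "species" is missing in one entry and has length 1 in
# another, while "films" has length 2 in one entry and 1 in the others
toy <- list(
  list(name = "A", films = c("f1", "f2"), species = "s1"),
  list(name = "B", films = "f1"),
  list(name = "C", films = "f3", species = c("s1", "s2"))
)

# One row per (entry, variable) holding the number of values found there
lens <- do.call(rbind, lapply(toy, function(rec) {
  data.frame(col = names(rec), len = unname(lengths(rec)),
             stringsAsFactors = FALSE)
}))

# A variable must be nested if it is longer than 1 in at least one entry
vars_to_nest_toy <- unique(lens$col[lens$len > 1])
```

Here both films and species end up in vars_to_nest_toy, even though each of them has length 1 in some entries.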
list_to_df_nest <- function(.list, .vars_to_nest){
  # Create a list of one-column data.frames, one per variable
  tmp_lst <- .list %>%
    map2(names(.), function(.value, .id){
      data_frame(.value) %>%
        set_names(.id)})
  # Nest those that must be nested
  nested_vars <- tmp_lst[.vars_to_nest] %>%
    # We might have selected something that does not exist, so clear it away
    compact() %>%
    # Do the nesting
    map2(names(.), function(.df, .id){
      nest(.df, one_of(.id)) %>%
        set_names(.id)
    })
  # Overwrite the list elements with the nested data.frames
  tmp_lst[names(nested_vars)] <- nested_vars
  tmp_lst %>% bind_cols()
}
people$results %>%
  lapply(list_to_df_nest, .vars_to_nest = vars_to_nest) %>%
  bind_rows()
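For comparison, here is a base R sketch of the target shape on a made-up list. Note that it is not the same technique as above: it stores each record's vector directly in a list column (the way films, vehicles and starships appear in dplyr's starwars) instead of building nested data.frames with nest(). The records and the names nest_vars, scalar_vars and out are assumptions for illustration.

```r
# Made-up records; "height" is missing in the second one
toy <- list(
  list(name = "A", height = "172", films = c("f1", "f2")),
  list(name = "B", films = "f1")
)
nest_vars <- "films"  # assumed to be known in advance, like vars_to_nest

# Every other variable is treated as holding at most one value per record
scalar_vars <- setdiff(unique(unlist(lapply(toy, names))), nest_vars)

# Scalar variables: one value per record, NA where the record lacks it
out <- as.data.frame(
  lapply(setNames(scalar_vars, scalar_vars), function(v) {
    vapply(toy, function(rec) {
      if (is.null(rec[[v]])) NA_character_ else as.character(rec[[v]])
    }, character(1))
  }),
  stringsAsFactors = FALSE
)

# Nested variables: keep each record's full vector in a list column
for (v in nest_vars) {
  out[[v]] <- lapply(toy, function(rec) rec[[v]])
}
```

Each row of out then holds one record, and out$films[[1]] returns the full vector of films for the first record.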
The first solution is considerably faster than my second, though everything
might be done in a more clever way...
HTH
Ulrik
On Fri, 4 Aug 2017 at 05:57 Matt Van Scoyoc <scoyoc at gmail.com> wrote:
> I'm having trouble restructuring data from the rwars package into a
> dataframe. Can someone help me?
>
> Here's what I have...
>
> library("rwars")
> library("tidyverse")
>
> # These data are json, so they load into R as a list
> people <- get_all_people(parse_result = T)
> people <- get_all_people(getElement(people, "next"), parse_result = T)
>
> # Look at Anakin Skywalker's data
> people$results[[1]]
> people$results[[1]][1] # print his name
>
> # To use them in R, I need to restructure them to a dataframe like they are in dplyr
> data("starwars")
> glimpse(starwars)
>
> Thanks for the help.
>
> Cheers,
> MVS
> =====
> Matthew Van Scoyoc
> =====
> Think SNOW!