[R] Unlisting a nested dataset

Nathan Parsons
Tue Oct 16 04:55:59 CEST 2018


I’m attempting to do some content analysis on a few million tweets, but I can’t seem to get them cleaned correctly.

I’m trying to replicate the process outlined here: https://stackoverflow.com/questions/46734501/opposite-of-unnest-tokens
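
To make the goal concrete: I want to split each tweet into words, drop the stop words, and paste the remaining words back into one string per tweet. Here is a minimal base R sketch of that intent on toy data. The data frame `toy`, the short stop-word vector `toy_stop_words`, and the helper `drop_stop_words` are all made up for illustration, and the sketch splits on whitespace only, so it is not a substitute for the tweet tokenizer.

```r
# Toy data standing in for the real tweets (made up for illustration).
toy <- data.frame(
  status_id = c("x1", "x2"),
  text = c("Technique is everything with olympic lifts",
           "The cat sat on the mat"),
  stringsAsFactors = FALSE
)

# A tiny, hypothetical stop-word list; the real one would be stop_words$word.
toy_stop_words <- c("is", "with", "the", "on")

# Hypothetical helper: lower-case, split on whitespace, drop stop words,
# then paste the surviving words back into one string per input element.
drop_stop_words <- function(x, stops) {
  tokens <- strsplit(tolower(x), "\\s+")
  vapply(
    tokens,
    function(words) paste(words[!words %in% stops], collapse = " "),
    character(1)
  )
}

toy$text <- drop_stop_words(toy$text, toy_stop_words)
print(toy$text)
```

The result should still have one row per tweet, with `text` as a plain character column.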

My code:

tweets %>%
  unnest_tokens(word, text, token = 'tweets') %>%
  filter(!word %in% stop_words$word) %>%
  nest(word) %>%
  mutate(text = map(data, unlist),
         text = map_chr(text, paste, collapse = " ")) -> tweets

Unfortunately, I keep getting:

 Error in mutate_impl(.data, dots) :
 Evaluation error: cannot coerce type 'closure' to vector of type 'character'.

What am I doing wrong?

Here’s what the dataset looks like:

> glimpse(tweets)
Observations: 389,253
Variables: 12
$ status_id   "x1047841705729306624", "x1046966595610927105", "x104709...
$ created_at  "2018-10-04T13:31:45Z", "2018-10-02T03:34:22Z", "2018-10...
$ text        "Technique is everything with olympic lifts ! @ Body By ...
$ lat         43.68359, 40.28412, 37.77066, 40.43139, 31.16889, 33.937...
$ lng         -70.32841, -83.07859, -122.43598, -79.98069, -100.07689,...
$ county_name "Cumberland County", "Delaware County", "San Francisco C...
$ fips        23005, 39041, 6075, 42003, 48095, 6037, 6037, 55073, 482...
$ state_name  "Maine", "Ohio", "California", "Pennsylvania", "Texas", ...
$ state_abb   "ME", "OH", "CA", "PA", "TX", "CA", "CA", "WI", "TX", "A...
$ urban_level "Medium Metro", "Large Fringe Metro", "Large Central Met...
$ urban_code  3, 2, 1, 1, 6, 1, 1, 4, 1, 3, 2, 2, 1, 3, 6, 1, 1, 2, 3,...
$ population  277308, 184029, 830781, 1160433, 4160, 9509611, 9509611,...

--

Nate Parsons
Pronouns: He, Him, His
Graduate Teaching Assistant
Department of Sociology
Portland State University
Portland, Oregon

503-725-9025
503-725-3957 FAX
