[R] Data Structure to Unnest_tokens in tidytext package

Eric Berger
Wed Dec 11 16:22:56 CET 2019


Hi Sarah,
I looked at the documentation that you linked to. It contains the step

text_df <- tibble(line = 1:4, text = text)

before it does the step

text_df %>%
  unnest_tokens(word, text)

So you may be missing the step that puts your lines of text into a
tibble with a column named text.
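
A minimal sketch of that step, for illustration only. The str() output quoted below shows a single column named `FOR E. S. I.` (the first line of the file was apparently taken as the header), so there may be no column called text for unnest_tokens() to find. The sketch uses a few lines taken from that output as a stand-in for the novel; the readLines() call in the comment is an assumption about how the file could be read, not something from the tutorial.

```r
library(tibble)
library(dplyr)
library(tidytext)

# Stand-in for the novel. With the real file, something like
#   text <- readLines("quicksandr.txt")
# (an assumed approach) would give a character vector, one element per line.
text <- c("FOR E. S. I.",
          "My old man died in a fine big house.",
          "My ma died in a shack.",
          "LANGSTON HUGHES")

# One row per line, with the text held in a column that is named `text`
text_df <- tibble(line = seq_along(text), text = text)

# One row per word; the `line` column is kept alongside each word
tidy_novel <- text_df %>%
  unnest_tokens(word, text)

print(tidy_novel)
```

With this structure the second argument of unnest_tokens() refers to a column that exists, which is the difference from a data frame whose only column is `FOR E. S. I.`.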

Best,
Eric

On Tue, Dec 10, 2019 at 9:05 PM Sarah Payne wrote:
>
> Hi--I'm fairly new to R and trying to do a text mining project on a novel
> using the tidytext package. The novel is saved as a plain text document and
> I can import it into RStudio just fine. For reference I'm trying to do
> something similar to section 1.3 of this tidy text tutorial
> <https://www.tidytextmining.com/tidytext.html>, except I'm working with one
> novel instead of many. So I import the novel and then run:
>
> tidy_novel <- quicksandr %>%
>   unnest_tokens(word, text)
>
> I get the following error:
>
> Error in check_input(x) :
>   Input must be a character vector of any length or a list of character
>   vectors, each of which has a length of 1.
>
> typeof(novel) returns "list" and str(novel) returns
>
> Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 955 obs. of  1 variable:
>  $ FOR E. S. I.: chr  "FOR E. S. I." "My old man died in a fine big house. My ma died in a shack. I wonder where I'm gonna die, Being neither white nor black?'" "LANGSTON HUGHES" "ONE" ...
>  - attr(*, "problems")=Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 8 obs. of  5 variables:
>   ..$ row     : int  530 726 733 836 853 886 889 942
>   ..$ col     : chr  NA NA NA NA ...
>   ..$ expected: chr  "1 columns" "1 columns" "1 columns" "1 columns" ...
>   ..$ actual  : chr  "2 columns" "2 columns" "2 columns" "2 columns" ...
>   ..$ file    : chr  "'quicksandr.txt'" "'quicksandr.txt'" "'quicksandr.txt'" "'quicksandr.txt'" ...
>  - attr(*, "spec")=
>   .. cols(
>   ..   `FOR E. S. I.` = col_character()
>   .. )
>
> I'm just importing the text file and then trying to run the unnest_tokens
> function, so maybe I'm missing a step in between? I seem to need my text
> file in a different format, so would appreciate answers on how to do that.
> Thanks, and let me know if I need to provide more info!