[R] Help request: Parsing docx files for key words and appending to a spreadsheet

Andy ph@edru@v @end|ng |rom gm@||@com
Sat Dec 30 12:44:50 CET 2023

Thanks Ivan and Calum

I continue to appreciate your support.

Calum, I entered the code snippet you provided, and it returns 'File 
missing'. The object 'full_filename' does exist, but the path from 
getwd() is being joined directly to the title of the article, without a 
'/' between the end of the path (here 'TEST') and the name of the 
article. In other words, full_filename reads "~/TESTNow they want us to 
charge our electric cars from litter bins.docx", so naturally this file 
doesn't exist. To work, a '/' needs to be inserted between the end of 
the path and the start of the article name. I've tried both paste0, as 
you suggested, and paste, but neither does the trick.
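
To illustrate the separator problem, here is a minimal sketch. It 
assumes the folder is "~/TEST", which is only a placeholder for whatever 
getwd() or the folder picker returns. paste0() joins its arguments with 
nothing in between, and paste() joins them with a space by default, 
whereas base R's file.path() inserts the '/' itself:

```r
# Placeholder values for illustration only; substitute the real folder.
filepath <- "~/TEST"
filename <- "Now they want us to charge our electric cars from litter bins.docx"

# paste0() concatenates with no separator, so the '/' is missing.
broken <- paste0(filepath, filename)

# file.path() joins path components with "/" between them.
full_filename <- file.path(filepath, filename)

print(broken)
print(full_filename)
```

If that is right, swapping paste0() for file.path() in the snippet 
should be enough; paste(filepath, filename, sep = "/") would be an 
equivalent alternative.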

Is this a result of my using the tcl/tk folder selection that you 
remarked on? I wanted to keep that so that the selection is interactive, 
but if there are better ways of doing this I am open to suggestions.
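
One possible way to keep the interactive selection, sketched under the 
assumption that the folder is picked with tcltk::tk_choose.dir() (the 
actual picker code is not shown in this thread). The helper name 
list_docx is hypothetical. list.files() with full.names = TRUE returns 
complete paths, so no manual pasting of the '/' is needed:

```r
# Hypothetical helper: full paths of all .docx files in a folder.
list_docx <- function(folder) {
  list.files(folder, pattern = "\\.docx$", full.names = TRUE)
}

# Only open the folder dialog in an interactive session.
# Assumption: the picker is tcltk::tk_choose.dir(), which returns NA
# if the dialog is cancelled.
if (interactive()) {
  folder <- tcltk::tk_choose.dir(caption = "Select the folder of articles")
  if (!is.na(folder)) {
    print(list_docx(folder))
  }
}
```

Each element of the result could then be passed to read_docx() in a for 
loop or lapply(), as suggested earlier in the thread.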

Thanks again, both.

Best wishes

On 29/12/2023 22:25, CALUM POLWART wrote:
>     help(read_docx) says that the function only imports one docx file. In
>     order to read multiple files, use a for loop or the lapply function.
> I told you people will suggest better ways to loop!!
>     docx_summary(read_docx("Now they want us to charge our electric cars
>     from litter bins.docx")) should work.
> Ivan thanks for spotting my fail! Since the OP is new to all this I'm 
> going to suggest a little tweak to this code which we can then build 
> into a for loop:
> filepath <- getwd()
> # You will want to change this later. You are doing something with tcl
> # to pick a directory, which seems rather fancy! But keep doing it for
> # now, or set the directory here ending in a /
> filename <- "Now they want us to charge our electric cars from litter bins.docx"
> full_filename <- paste0(filepath, filename)
> #lets double check the file does exist!
> if (!file.exists(full_filename)) {
>   message("File missing")
> } else {
>   content <- read_docx(full_filename) |>
>     docx_summary()
>   # This reads the docx at full_filename and passes it (via the |>
>   # pipe) to the next line, which summarises it. The result is saved
>   # in a data frame object called content, and head() shows its
>   # first few rows.
>   head(content)
> }
> Let's get this bit working before we try and loop
