[R] reading in MS Word files

Ingo Feinerer feinerer at logic.at
Tue Aug 18 16:56:32 CEST 2009


On Tue, Aug 18, 2009 at 12:00:07PM +0200, Mark Kimpel wrote:
> I am familiar with packages that read and write Excel files on both Windows
> and Linux platforms.
> 
> Do any packages provide similar functionality for MS Word files? I have a
> lot of text processing to do and the text is embedded in ~200 different Word
> files (.doc format Office 2003). All I need to do is read, not write.

See readDOC in package tm, e.g., something like:

library(tm)
Corpus(DirSource("aDirectoryContainingTheWordFiles"), readerControl = list(reader = readDOC))

Note that antiword (http://www.winfield.demon.nl/) must be installed and
on your PATH so that readDOC can call it.
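
Since a missing antiword is the likely failure here, it may help to check for it before building the corpus. The following is a minimal sketch, not part of tm: has_antiword() and read_word_dir() are hypothetical helper names, and it assumes readDOC calls the external antiword executable as described above. Other versions of tm may locate antiword differently, in which case the PATH check may be unnecessary.

```r
# Sketch only: has_antiword() and read_word_dir() are hypothetical helpers.

# TRUE if an "antiword" executable is found on the PATH that R sees.
# Sys.which() returns "" for programs it cannot find.
has_antiword <- function() {
  nzchar(unname(Sys.which("antiword")))
}

# Build a corpus from a directory of .doc files, stopping early with a
# readable message if a prerequisite is missing.
read_word_dir <- function(dir) {
  if (!has_antiword()) {
    stop("antiword was not found on the PATH; install it or adjust PATH")
  }
  if (!requireNamespace("tm", quietly = TRUE)) {
    stop("package 'tm' is not installed")
  }
  # Same call as in the example above, with the namespace spelled out.
  tm::Corpus(tm::DirSource(dir),
             readerControl = list(reader = tm::readDOC))
}
```

With both prerequisites in place, read_word_dir("aDirectoryContainingTheWordFiles") should return the same corpus as the one-line call above.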

Best regards, Ingo

-- 
Ingo Feinerer
Vienna University of Technology
http://www.dbai.tuwien.ac.at/staff/feinerer



