[R] reading and frequency analysis of Spanish text
Michael Friendly
friendly at yorku.ca
Wed Aug 5 20:19:06 CEST 2009
For an historical paper I'm working on, I have some Spanish plaintext,
presently in the form of a Word .doc file,
http://euclid.psych.yorku.ca/SCS/Gallery/images/Private/Langren/Verdadera-spanish-stripped.doc
and also some ciphered text from the same original source. The ultimate
goal is to use some frequency analysis of letters and word lengths in the
plaintext to help decode the ciphered text.
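As a rough sketch of the letter and word-length counts mentioned above, assuming the plaintext is already in R as a single UTF-8 string, base R's `strsplit()`, `table()` and `nchar()` are enough. `sample_text` is an invented stand-in, not the Langren text. Counting every `\p{L}` character as a letter and splitting words on runs of non-letters are simplifying assumptions; a real analysis might also fold accents or treat abbreviations specially.

```r
# A minimal sketch: `sample_text` is a made-up stand-in for the real
# Spanish plaintext, assumed to be already read into R as one string.
sample_text <- "En el a\u00f1o de la villa"

# Fold case so "E" and "e" are counted together
txt <- tolower(sample_text)

# Letter frequencies: split into single characters, keep only letters.
# \p{L} (with perl = TRUE) also matches accented letters such as "ñ".
chars <- strsplit(txt, "")[[1]]
letter_chars <- chars[grepl("\\p{L}", chars, perl = TRUE)]
letter_freq <- sort(table(letter_chars), decreasing = TRUE)

# Word-length frequencies: split on runs of non-letters, drop empty tokens
words <- strsplit(txt, "[^\\p{L}]+", perl = TRUE)[[1]]
words <- words[nzchar(words)]
word_len_freq <- table(nchar(words))

print(letter_freq)
print(word_len_freq)
```

With this sample, `word_len_freq` shows four 2-letter words and one word each of lengths 3 and 5; the same two tables computed on the ciphered text could then be compared side by side.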
For now, I'm stuck on how to read the Spanish plaintext into R as a text
string, given that it is in a Word .doc file using some form of Latin-1
encoding. From Word, I can Save As... plain text (.txt), but I'm worried
about losing character-encoding information, and nothing in the list of
Other encodings that Word presents seems helpful.
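One possible route, assuming the exported .txt file uses Latin-1 (Word's Western default, Windows-1252, uses the same bytes as Latin-1 for Spanish letters such as á, é, ñ, ¿ and ¡), is to read it with `readLines(..., encoding = "latin1")`, which marks the strings as Latin-1 without altering their bytes, and then convert them with `enc2utf8()`. The tiny file below is written with known bytes only so the sketch runs on its own; `tmp` stands in for the real exported file.

```r
# A minimal sketch, assuming the Word export used Latin-1 (or Windows-1252,
# which uses the same bytes for Spanish letters). `tmp` is an invented
# stand-in for the real exported .txt file.
tmp <- tempfile(fileext = ".txt")
# Latin-1 bytes for "año" plus a newline (0xF1 is "ñ")
writeBin(as.raw(c(0x61, 0xF1, 0x6F, 0x0A)), tmp)

# encoding = "latin1" declares (marks) the strings as Latin-1 without
# changing the bytes; enc2utf8() then converts them to UTF-8
txt_lines <- readLines(tmp, encoding = "latin1")
txt_utf8 <- enc2utf8(txt_lines)

# One string for the whole text, ready for frequency analysis
langren_txt <- paste(txt_utf8, collapse = "\n")
print(Encoding(txt_lines))   # "latin1"
print(utf8ToInt(langren_txt))
```

If Word's encoding list offers Unicode (UTF-8), exporting with that and reading with `encoding = "UTF-8"` would avoid the conversion step altogether.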
A naive attempt to read the .doc file directly gives:

> langren.sp.file <- "http://euclid.psych.yorku.ca/SCS/Gallery/images/Private/Langren/Verdadera-spanish-stripped.doc"
> langren.txt <- scan(langren.sp.file, encoding="latin1")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'a real', got 'ÐÏࡱá'
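The error is informative: `scan()` defaults to `what = double()`, so it tries to parse a number, and the first token it meets, `ÐÏࡱá`, is how the file's opening bytes D0 CF 11 E0 A1 B1 1A E1 look when shown as Latin-1 (0x11 and 0x1A are invisible control characters). Those bytes are the OLE2 compound-document signature that begins legacy binary Word .doc files, so the file is a binary container rather than text, and no `encoding` setting alone will make `scan()` read it as plain text. The sketch below uses a hypothetical helper, `is_ole2_file()`, to check for that signature with `readBin()`, and shows `scan(..., what = "character")` on a genuine text file; both temporary files are invented for the demo.

```r
# Hypothetical helper: TRUE if the file begins with the 8-byte OLE2
# compound-document signature used by legacy binary Word .doc files
is_ole2_file <- function(path) {
  ole2_magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  header <- readBin(path, what = "raw", n = 8L)
  length(header) == 8L && all(header == ole2_magic)
}

# Demo files (invented): a fake .doc holding only the signature plus padding,
# and a genuine plain-text file
doc_like <- tempfile(fileext = ".doc")
writeBin(c(as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)),
           as.raw(rep(0, 8))), doc_like)
txt_file <- tempfile(fileext = ".txt")
writeLines("en el nombre de Dios", txt_file)

print(is_ole2_file(doc_like))   # TRUE: the binary signature is present
print(is_ole2_file(txt_file))   # FALSE: ordinary text has no such header

# On real text, ask scan() for character tokens instead of its default numbers
tokens <- scan(txt_file, what = "character", quiet = TRUE)
print(tokens)
```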
Can someone help?
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA
More information about the R-help mailing list