[R] The KJV
(Ted Harding)
Ted.Harding at manchester.ac.uk
Sun Feb 7 09:28:34 CET 2010
On 07-Feb-10 01:06:40, Ben Bolker wrote:
> Jim Lemon <jim <at> bitwrit.com.au> writes:
>
>>
>> On 02/06/2010 06:57 PM, Charlotte Maia wrote:
>> > Hey all,
>> >
>> > Does anyone know if there are any R packages with a copy of the KJV?
>> > I'm guessing the answer is no...
>> >
>> > So the next question, and the more important one is:
>> > Does anyone think it would be useful (e.g. for text-mining purposes)?
>> > I know almost nothing about theology, so I'm not sure what kind of
>> > questions theologists might have (that R could answer).
>> >
>> > An alternative, that would achieve a similar result (I think),
>> > would be an R interface to another open source system, such as
>> > Sword.
>> >
>> Hi Charlotte,
>> Try
>>
>> http://www.gutenberg.org/etext/10
>>
>> Jim
>>
>
> I couldn't help it:
>
> x <- url("http://www.gutenberg.org/dirs/etext90/kjv10.txt",open="r")
> X <- readLines(x,n=20000)
> z <- grep("First Book of Moses",X)
> X <- X[-(1:z)]
> X <- X[nchar(X)>0]
> length(X) ## 15058
> words <- tolower(unlist(strsplit(X,"[ .,:;()]")))
> words2 <- grep("[^0-9]",words,value=TRUE)
> tt <- rev(sort(table(words2)))
> barplot(rev(tt[1:100]),horiz=TRUE,las=1,cex.names=0.4,log="x")
Delightful! And fascinating in the detail too.
length(tt)
# [1] 5078
Try it with slight changes, like:
barplot(rev(tt[1:50]),horiz=TRUE,las=1,cex.names=0.6,log="x")
# ...
barplot(rev(tt[101:150]),horiz=TRUE,las=1,cex.names=0.6,log="x")
# ...
and see the likes of:
tt["lord"]
# lord
# 1939
tt["god"]
# god
# 822
tt["men"]
# men
# 204
tt["women"]
# women
# 26
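One caveat on counts like these: the split pattern "[ .,:;()]" does not include "?", "!" or the apostrophe, so a token such as "god?" would be tallied separately from "god". A minimal sketch of a stricter tokenizer follows. Note that count_words() is a hypothetical helper and the "[^a-z]+" pattern is an assumption of mine, not what was used in the thread; it also splits possessives, so "lord's" becomes "lord" plus "s". The two toy lines merely stand in for the full text.

```r
# count_words() is a hypothetical helper, not part of any package.
# Assumption: split on any run of non-letters (this also breaks "lord's"
# into "lord" and "s"), then drop the empty strings.
count_words <- function(lines) {
  w <- unlist(strsplit(tolower(lines), "[^a-z]+"))
  w <- w[nzchar(w)]
  sort(table(w), decreasing = TRUE)
}

# Two toy lines standing in for the downloaded text.
toy <- c("1:1 In the beginning God created the heaven and the earth.",
         "Where is the LORD God of Elijah?")
tt2 <- count_words(toy)
tt2["god"]
tt2["the"]
```

With the real text one would call count_words(X) on the lines read earlier and compare the result against tt.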
I'm now wondering how it matches up with Zipf's Law (or perhaps
Fisher's logarithmic ... )
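
A rough way to look at the Zipf question, as a sketch only: Zipf's Law says frequency falls off roughly as a power of rank, so log(frequency) against log(rank) should be close to a straight line, with slope near -1 in the classic form. The helper zipf_fit() below is hypothetical (my own name, not from any package), and the toy data are synthetic, chosen so that the expected slope is known.

```r
# zipf_fit() is a hypothetical helper, not part of any package.
# It regresses log(frequency) on log(rank); Zipf: log f = a - s * log r.
zipf_fit <- function(freq) {
  f <- sort(as.numeric(freq), decreasing = TRUE)  # largest count first
  f <- f[f > 0]
  r <- seq_along(f)                               # rank 1 = most frequent
  fit <- lm(log(f) ~ log(r))
  list(rank = r, freq = f, slope = unname(coef(fit)[2]), fit = fit)
}

# Synthetic check: frequencies exactly proportional to 1/rank,
# so the fitted slope should be very close to -1.
toy <- 1000 / (1:200)
z <- zipf_fit(toy)
z$slope

# Rank-frequency plot on log-log axes, with the fitted line overlaid.
plot(z$rank, z$freq, log = "xy", xlab = "rank", ylab = "frequency")
lines(z$rank, exp(fitted(z$fit)), col = "red")
```

For the table built above, the call would be zipf_fit(tt). A single straight-line fit over all ranks is a crude diagnostic, since the rarest words (the many counts of 1) tend to pull the tail away from the line.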
Thanks, Ben!
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Feb-10 Time: 08:28:30
------------------------------ XFMail ------------------------------