[R] The KJV

(Ted Harding) Ted.Harding at manchester.ac.uk
Sun Feb 7 09:28:34 CET 2010


On 07-Feb-10 01:06:40, Ben Bolker wrote:
> Jim Lemon <jim <at> bitwrit.com.au> writes:
> 
>> 
>> On 02/06/2010 06:57 PM, Charlotte Maia wrote:
>> > Hey all,
>> >
>> > Does anyone know if there are any R packages with a copy of the KJV?
>> > I'm guessing the answer is no...
>> >
>> > So the next question, and the more important one is:
>> > Does anyone think it would be useful (e.g. for text-mining purposes)?
>> > I know almost nothing about theology,
>> > so I'm not sure what kind of questions theologians might have (that R
>> > could answer).
>> >
>> > An alternative, that would achieve a similar result (I think),
>> > would be an R interface to another open source system, such as
>> > Sword.
>> >
>> Hi Charlotte,
>> Try
>> 
>> http://www.gutenberg.org/etext/10
>> 
>> Jim
>> 
> 
>  I couldn't help it:
> 
> x <- url("http://www.gutenberg.org/dirs/etext90/kjv10.txt",open="r")
> X <- readLines(x,n=20000)
> z <- grep("First Book of Moses",X)
> X <- X[-(1:z)]
> X <- X[nchar(X)>0]
> length(X) ## 15058
> words <- tolower(unlist(strsplit(X,"[ .,:;()]")))
> words2 <- grep("[^0-9]",words,value=TRUE)
> tt <- rev(sort(table(words2)))
> barplot(rev(tt[1:100]),horiz=TRUE,las=1,cex.names=0.4,log="x")

Delightful! And fascinating in the detail too.

  length(tt)
  # [1] 5078

Try it with slight changes, such as:

  barplot(rev(tt[1:50]),horiz=TRUE,las=1,cex.names=0.6,log="x")
  # ...
  barplot(rev(tt[101:150]),horiz=TRUE,las=1,cex.names=0.6,log="x")
  # ...

and see the likes of

  tt["lord"]
  # lord 
  # 1939 

  tt["god"]
  # god 
  # 822 

  tt["men"]
  # men 
  # 204 

  tt["women"]
  # women 
  #    26 

I'm now wondering how it matches up with Zipf's Law (or perhaps
Fisher's logarithmic ... )
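One rough way to look at both questions is sketched below. It is a sketch only, not a standard package method. Zipf's law predicts that frequency falls off roughly as 1/rank^s with s near 1, so log(frequency) plotted against log(rank) should be close to a straight line with slope about -s. Fisher's logarithmic series links the number of distinct words S to the total number of word tokens N through S = alpha * log(1 + N/alpha), which can be solved numerically for alpha. The helper names zipf_fit() and fisher_alpha() and the min_freq argument are hypothetical, and an ordinary least-squares fit on the log-log scale is only a crude estimator of s.

```r
## Hypothetical helper (not from any package): crude check of Zipf's law.
## Zipf: freq ~ C / rank^s, so log(freq) ~ log(C) - s * log(rank).
zipf_fit <- function(freqs, min_freq = 1) {
  f <- sort(as.numeric(freqs), decreasing = TRUE)  # frequencies in rank order
  f <- f[f >= min_freq]       # optionally drop rare words; their tied counts form flat steps
  r <- seq_along(f)                                # ranks 1, 2, 3, ...
  fit <- lm(log(f) ~ log(r))                       # least squares on the log-log scale
  list(slope = unname(coef(fit)[2]), fit = fit)    # slope is an estimate of -s
}

## Hypothetical helper: Fisher's log-series alpha from
##   S = alpha * log(1 + N / alpha),
## where S = number of distinct words and N = total number of word tokens.
fisher_alpha <- function(S, N) {
  stopifnot(S > 0, N > S)
  g <- function(a) a * log1p(N / a) - S            # increasing in a, so the root is unique
  uniroot(g, lower = 1e-8, upper = S,
          extendInt = "upX", tol = 1e-10)$root     # widen the upper bound if needed
}
```

With Ben's table this would be `zipf_fit(tt)$slope` and `fisher_alpha(length(tt), sum(tt))`, and `plot(as.numeric(tt), log = "xy")` shows the rank/frequency curve directly (tt is already in decreasing order). No results are claimed here; the point is only how one might check.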

Thanks, Ben!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Feb-10                                       Time: 08:28:30
------------------------------ XFMail ------------------------------



More information about the R-help mailing list