[R] Hashing and environments

Levy, Roger rlevy at ucsd.edu
Sat Nov 6 21:38:33 CET 2010


Hi,

I'm trying to write a general-purpose "lexicon" class and associated methods for storing and accessing information about large numbers of specific words (e.g., their frequencies in different genres).  Crucial to making such a class practically useful is to get hashing working correctly so that information about specific words can be accessed quickly.  But I've never really understood very well how hashing works, so I'm having trouble.

Here is an example of what I've done so far:

***

setClass("Lexicon",representation(e="environment"))
setMethod("initialize","Lexicon",function(.Object,wfreqs) {
	## store the whole frequency vector in a fresh hashed environment
	.Object@e <- new.env(hash=TRUE,parent=emptyenv())
	assign("wfreqs",wfreqs,envir=.Object@e)
	return(.Object)
	})

## function to access word frequencies
wfreq <- function(lexicon,word) {
	## fetch the vector from the environment, then index it by name
	return(get("wfreqs",envir=lexicon@e)[word])
}

## example of use
my.lexicon <- new("Lexicon",wfreqs=c("the"=2,"person"=1))
wfreq(my.lexicon,"the")

***
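To see what the hash table in such an environment is actually keyed on, here is a minimal sketch that rebuilds the same single-binding environment by hand (the variable names `e`, `keys`, `has_the` and `freq_the` are just for illustration). An environment's hash table indexes variable *names*, so with this setup it contains one key, `wfreqs`, and the words themselves only live in the vector's names attribute.

```r
# A single-binding environment, mirroring what initialize() builds above.
e <- new.env(hash = TRUE, parent = emptyenv())
assign("wfreqs", c(the = 2, person = 1), envir = e)

# The hash table indexes binding names, so it holds exactly one key ...
keys <- ls(e)
# ... and the individual words are not keys in it.
has_the <- exists("the", envir = e, inherits = FALSE)
# A word lookup is still a by-name subset of the stored vector.
freq_the <- get("wfreqs", envir = e)[["the"]]
print(keys)
print(has_the)
print(freq_the)
```

Under this reading, the hashed environment only speeds up finding `wfreqs` itself; each word lookup still has to work through the vector's names, at a cost that grows with the number of words.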

However, testing indicates that the way I have set this up does not achieve the intended benefits of having the environment hashed: looking words up through wfreq() takes about as long as indexing the named vector directly.

***

sample.wfreqs <- trunc(runif(1e5,max=100))
names(sample.wfreqs) <- as.character(seq_along(sample.wfreqs))
lex <- new("Lexicon",wfreqs=sample.wfreqs)
words.to.lookup <- trunc(runif(100,min=1,max=1e5))
## look up the words directly from the sample.wfreqs vector
system.time({
	for(i in words.to.lookup)
		sample.wfreqs[as.character(i)]
	},gcFirst=TRUE)
## look up the words through the wfreq() function; time approx the same
system.time({
	for(i in words.to.lookup)
		wfreq(lex,as.character(i))
	},gcFirst=TRUE)

***
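One way to let the hash table do the work, sketched here under the assumption that each word can be stored as its own variable, is to make every word a separate binding in the environment, so a lookup becomes a single hash probe on the word. The class `HashedLexicon` and the accessors `wfreq2()` and `wfreqs2()` are hypothetical names for this sketch, not part of any package.

```r
library(methods)

# Hypothetical variant of the Lexicon class: each word becomes its own
# binding in a hashed environment, so the hash table is keyed on words.
setClass("HashedLexicon", representation(e = "environment"))
setMethod("initialize", "HashedLexicon", function(.Object, wfreqs) {
  e <- new.env(hash = TRUE, parent = emptyenv(),
               size = max(29L, length(wfreqs)))
  # list2env() assigns each element of the named list as a separate variable.
  list2env(as.list(wfreqs), envir = e)
  .Object@e <- e
  .Object
})

# Hypothetical single-word accessor; inherits = FALSE keeps the search
# inside the lexicon's own environment.
wfreq2 <- function(lexicon, word) {
  get(word, envir = lexicon@e, inherits = FALSE)
}

# Hypothetical vectorised accessor: mget() returns a named list,
# unlist() turns it back into a named numeric vector.
wfreqs2 <- function(lexicon, words) {
  unlist(mget(words, envir = lexicon@e, inherits = FALSE))
}

lex2 <- new("HashedLexicon", wfreqs = c(the = 2, person = 1))
print(wfreq2(lex2, "the"))
print(wfreqs2(lex2, c("person", "the")))
```

The timing test above can be re-run with wfreq2() substituted for wfreq() to check whether per-word bindings pay off on 1e5 words. A design cost of this layout is that a word that is not in the lexicon makes get() signal an error rather than return NA.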

I'm guessing that the problem is that the indexing of the wfreqs vector in my wfreq() function is not happening inside the actual lexicon's environment.  However, I have not been able to figure out the proper call to get the lookup to happen inside the lexicon's environment.  I've tried

wfreq1 <- function(lexicon,word) {
	return(eval(wfreqs[word],envir=lexicon@e))
}

which I'd thought should work, but this gives me an error:

> wfreq1(my.lexicon,'the')
Error in eval(wfreqs[word], envir = lexicon@e) : 
  object 'wfreqs' not found

Any advice would be much appreciated!

Best & many thanks in advance,

Roger

--

Roger Levy                      Email: rlevy at ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy


