[R] Converting english words to numeric equivalents
Hans-Joerg Bibiko
bibiko at eva.mpg.de
Mon Jul 28 12:37:36 CEST 2008
On 28 Jul 2008, at 12:23, Hans-Joerg Bibiko wrote:
> How about this?
>
> unletter <- function(word) {
> gsub('-64',' ',paste(sprintf("%02d",utf8ToInt(tolower(word)) -
> 96),collapse=''))
> }
>
> unletter("abc")
> [1] "010203"
>
> unletter("Aw")
> [1] "0123"
>
> unletter("I walk to school")
> [1] "09 23011211 2015 190308151512"
I do not know precisely what you want to do.
With:
as.double(unlist(strsplit(unletter("I walk to school")," ")))
you will get a numeric vector out of the string.
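
As a small illustration of that step, here is a sketch that starts from the string printed in the quoted example, so it does not depend on unletter() being defined. The variable names coded and nums are only placeholders.

```r
# The string returned by unletter("I walk to school") in the quoted example
coded <- "09 23011211 2015 190308151512"

# Split on blanks and convert each chunk to a double
nums <- as.double(unlist(strsplit(coded, " ")))

length(nums)   # one number per word
nums[1]        # the leading zero of "09" is dropped on conversion
```

Each word is short enough here that its code still fits exactly into a double.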
But this leads to a problem with long words, because a double holds only about 15-16 significant digits:
as.double(unlist(strsplit(unletter("schoolschool")," ")))
[1] 1.903082e+23
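
To make the loss of precision visible, the following sketch repeats unletter() from the quoted message and checks the result against 2^53, the point beyond which a double can no longer represent every integer. The names code and x are only placeholders.

```r
# unletter() as in the quoted message: each letter becomes its
# two-digit position in the alphabet, blanks stay blanks
unletter <- function(word) {
  gsub("-64", " ",
       paste(sprintf("%02d", utf8ToInt(tolower(word)) - 96), collapse = ""))
}

code <- unletter("schoolschool")   # a string of 24 digits
x <- as.double(code)

nchar(code)   # two digits per letter
x > 2^53      # TRUE: outside the range of exactly representable integers
x + 1 == x    # TRUE: adding 1 no longer changes the stored value
```

At two digits per letter, words of more than about seven letters can already run into this limit.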
Thus, if you need to map words to numeric values and the values
themselves carry no meaning, I would suggest parsing your text
beforehand to build a hash (a list) of all distinct words in it
and assigning a number to each word.
This would end up in a list à la:
words <- list("abc" = 1, "I" = 2, "go" = 3)  # etc.
After that you can access these numeric values via:
words['go']
$go
[1] 3
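
One possible way to build such a list from a text is sketched below. It assumes the words are separated by single blanks and numbers them in order of first appearance. The sample sentence and the names text, tokens, vocab and ids are made up for illustration.

```r
text <- "I walk to school and I walk home"

# Split into words and keep each distinct word once
tokens <- unlist(strsplit(text, " ", fixed = TRUE))
vocab  <- unique(tokens)

# Named list: word -> running number
words <- as.list(seq_along(vocab))
names(words) <- vocab

words[["walk"]]   # double brackets return the bare number

# Translate the whole text into its numeric codes
ids <- unlist(words[tokens], use.names = FALSE)
ids
```

Note that words['go'] returns a list of length one, whereas words[['go']] returns the number itself.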
--Hans