[R] matching similar character strings
A M Lavezzi
mario.lavezzi at unipa.it
Fri Jun 21 11:56:27 CEST 2013
Hello everybody,
I have this problem: I need to match an address database F1 against the
information contained in a toponymic database F2.
F1 has 800 rows and three columns:
A1. Street/Road/Avenue
A2. Name
A3. Number
Consider for instance Avenue J. Kennedy, 3011. In F1 this is:
A1. Avenue
A2. J. Kennedy
A3. 3011
F2 instead has 20000 rows and five columns:
B1. Street/Road/Avenue
B2. Name
B3. Starting Street Number
B4. Ending Street Number
B5. Census section
So my problem is to assign the B5 census section to every
observation of F1 for which A1=B1, A2=B2, and A3 lies between B3 and
B4.
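To make the rule concrete, here is a minimal base R sketch of it for the
easy case where A2 and B2 already agree exactly. The two data frames are
toy stand-ins with made-up rows and made-up census codes, the helper
column id is hypothetical, and treating B3 and B4 as inclusive bounds is
an assumption.

```r
# Toy stand-ins for F1 and F2; column names follow the description above,
# the rows and census codes are invented for illustration.
F1 <- data.frame(A1 = c("Avenue", "Road"),
                 A2 = c("Kennedy John", "Garibaldi Giuseppe"),
                 A3 = c(3011, 12),
                 stringsAsFactors = FALSE)
F2 <- data.frame(B1 = c("Avenue", "Avenue", "Road"),
                 B2 = c("Kennedy John", "Kennedy John", "Garibaldi Giuseppe"),
                 B3 = c(1, 3001, 1),
                 B4 = c(3000, 4000, 50),
                 B5 = c("S01", "S02", "S03"),
                 stringsAsFactors = FALSE)

F1$id <- seq_len(nrow(F1))   # hypothetical row id, to map results back

# Join on street type and name, then keep the rows whose number falls
# inside [B3, B4] (bounds assumed inclusive).
m <- merge(F1, F2, by.x = c("A1", "A2"), by.y = c("B1", "B2"))
m <- m[m$A3 >= m$B3 & m$A3 <= m$B4, ]

# Attach the census section to F1; rows without a match get NA.
F1$B5 <- m$B5[match(F1$id, m$id)]
```

With 800 by 20000 rows the intermediate merge stays small as long as each
street name has only a few number ranges.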
The difficulty is that the information in A2 is irregularly
recorded, whereas B2 has a fixed format: Family name (space) Given
name.
So while in B2 the information is:
Kennedy John
In A2 it could be:
John Kennedy
JF Kennedy
J. Kennedy
and so on.
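One possible way to compare such variants, sketched in base R under
strong assumptions: the family name is spelled identically in both
files, and any remaining word in A2 is a given name or its initials.
The helpers tokens and name_agrees are hypothetical names introduced
here, not functions from any package.

```r
# Lower-case, drop punctuation, and split a name into words.
tokens <- function(x) {
  strsplit(trimws(gsub("[[:punct:]]", "", tolower(x))), "[[:space:]]+")
}

# Hypothetical helper: does the irregular name `a` (A2 style) agree with
# `b` (B2 style, "Family Given")? The family name must appear in `a`;
# if `a` has another word, its first letter must match the given name.
name_agrees <- function(a, b) {
  ta <- tokens(a)[[1]]
  tb <- tokens(b)[[1]]
  if (length(ta) == 0 || length(tb) == 0) return(FALSE)
  family <- tb[1]
  if (!(family %in% ta)) return(FALSE)
  rest  <- ta[ta != family]   # what is left of A2 besides the family name
  given <- tb[-1]             # given name(s) from B2
  if (length(rest) == 0 || length(given) == 0) return(TRUE)
  substr(rest[1], 1, 1) == substr(given[1], 1, 1)
}

variants <- c("John Kennedy", "JF Kennedy", "J. Kennedy")
agree <- vapply(variants, name_agrees, logical(1), b = "Kennedy John")
```

This only handles reordering and abbreviation. Misspelled family names
would need approximate matching instead, for example with base R's
agrep or adist.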
Thanks,
Mario
--
Andrea Mario Lavezzi
Dipartimento di Scienze Giuridiche, della Società e dello Sport
Sezione Diritto e Società
Università di Palermo
Piazza Bologni 8
90134 Palermo, Italy
tel. ++39 091 23892208
fax ++39 091 6111268
skype: lavezzimario
email: mario.lavezzi (at) unipa.it
web: http://www.unipa.it/~mario.lavezzi