[R] sqldf not joining all the fields
Newbie19_02
nvanzuydam at gmail.com
Fri Mar 12 17:56:47 CET 2010
Dear R users,
I have two data frames that were read from text files as follows:
x_data <- read.table("x.txt", header = TRUE, sep = "|", quote = "\"'",
dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
nrows = 3864284,
skip = 0, check.names = TRUE,fill=TRUE,
strip.white = TRUE, blank.lines.skip = TRUE,
comment.char = "#", allowEscapes = FALSE, flush = FALSE,
fileEncoding = "", encoding = "unknown")
x_data
prochi prescribed_date dataMonth item_code res_seqno quantity directions
CAO0000713 22/06/2001 NULL 842752 NULL 60 1/D
CAO0000713 28/04/2000 NULL 7800 NULL 100G A/TD
CAO0000713 10/04/2000 NULL 842652 NULL 60 1/D
CAO0000713 03/07/2000 NULL 842652 NULL 60 1/D
CAO0000713 09/01/2001 NULL 842752 NULL 60 1/D
CAO0000713 16/10/2001 NULL 842752 NULL 60 1/D
CAO0000713 16/08/2001 NULL 842752 NULL 60 1/D
CAO0000713 17/09/1993 NULL 39620 NULL 5ML NIL
CAO0000713 01/05/2001 NULL 842752 NULL 60 1/D
CAO0000713 05/03/2001 NULL 842752 NULL 60 1/D
y_data
item_code name formulation_code strength
bnf_code
100 NEONACLEX K TABS NULL 2.2.8
110 NEONACLEX TABS 5MG 2.2.1
50 MESORB DRESS 10CMX10CM 20.3.1
160 ABSORBENT CELLULOSE MESO DRESS 10CMX10CM 20.3.1
161 ABSORBENT CELLULOSE MESO DRESS 10CMX15CM 20.3.1
164 ABSORBENT CELLULOSE MESO DRESS 20CMX25CM 20.3.1
200 SEPTRIN TABS 480MG 5.1.8
210 SEPTRIN PAED SF SUSP 240MG/5ML 5.1.8
212 SEPTRIN ADULT SUSP 480MG/5ML 5.1.8
220 SEPTRIN FORTE TABS 960MG 5.1.8
etc....
contains all the information for the item codes
y was read in in the same way.
I then used the following code:
z <- sqldf("select * from x left join y using (code)")
when I use this on my real data I get an output:
prochi prescribed_date dataMonth item_code res_seqno quantity directions
1 CAO0000713 22/06/2001 NULL 842752 NULL 60
1/D
2 CAO0000713 28/04/2000 NULL 7800 NULL 100G
A/TD
3 CAO0000713 10/04/2000 NULL 842652 NULL 60
1/D
4 CAO0000713 03/07/2000 NULL 842652 NULL 60
1/D
5 CAO0000713 09/01/2001 NULL 842752 NULL 60
1/D
6 CAO0000713 16/10/2001 NULL 842752 NULL 60
1/D
7 CAO0000713 16/08/2001 NULL 842752 NULL 60
1/D
8 CAO0000713 17/09/1993 NULL 39620 NULL 5ML
NIL
9 CAO0000713 01/05/2001 NULL 842752 NULL 60
1/D
10 CAO0000713 05/03/2001 NULL 842752 NULL 60
1/D
no_of_packs datasource scan_ref_no name formulation_code strength
1 NULL TSF NULL <NA> <NA> <NA>
2 NULL TSF NULL BETNOVATE RD OINT 0.025%
3 NULL TSF NULL <NA> <NA> <NA>
4 NULL TSF NULL <NA> <NA> <NA>
5 NULL TSF NULL <NA> <NA> <NA>
6 NULL TSF NULL <NA> <NA> <NA>
7 NULL TSF NULL <NA> <NA> <NA>
8 NULL TSF NULL GAMMABULIN INJ 320MG
9 NULL TSF NULL <NA> <NA> <NA>
10 NULL TSF NULL <NA> <NA> <NA>
bnf_code
1 <NA>
2 13.4.1.2
3 <NA>
4 <NA>
5 <NA>
6 <NA>
7 <NA>
8 14.5
9 <NA>
10 <NA>
There is absolutely no reason for there to be <NA> anywhere as the
information for both the tables is complete.
Not sure what the problem is?
Thanks,
Natalie
--
View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590786.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list