[R] Fwd: sqldf not joining all the fields
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Mar 12 19:05:05 CET 2010
That is not so. 842752 does not exist in y$item_code and ASPIRIN has
a code of 22730.
> 842752 %in% y$item_code
[1] FALSE
> subset(y, name == "ASPIRIN")
item_code name formulation_code strength bnf_code
850 22730 ASPIRIN TABS 300MG 4.7.1
855 22780 ASPIRIN PDR NULL 4.7.1
856 22790 ASPIRIN MIXT $ 4.7.1
On Fri, Mar 12, 2010 at 12:51 PM, Natalie Van Zuydam
<nvanzuydam at gmail.com> wrote:
> ---------- Forwarded message ----------
> From: Natalie Van Zuydam <nvanzuydam at gmail.com>
> Date: Fri, Mar 12, 2010 at 5:49 PM
> Subject: Re: [R] sqldf not joining all the fields
> To: David Winsemius <dwinsemius at comcast.net>
>
>
> Dear David
>
> I'm not sure what the problem is as for every item code there is a
> corresponding information in the y_data. For example 842752 from the x_data
> corresponds to Aspirin in the y_data? Yet when I use sqldf to join the two
> df's I get NA values in the columns from the y_data in z for 842752 item
> code....is there something wrong with my sqldf code or something wrong with
> the way I have inputed the data frames?
>
> Thanks for taking the time to help me,
> Natalie
>
>
>
> On Fri, Mar 12, 2010 at 5:42 PM, David Winsemius <dwinsemius at comcast.net>wrote:
>
>> If I assign the file input to y_data and change you sqldf to
>>
>> > z <- sqldf("select * from x_data left join y_data using (item_code)"); z
>>
>> I can replicate your result. Even after changing the types of the two
>> item_code fields to match I still get the same result and when I see to what
>> degree they share values I get:
>>
>> > sum(x_data$item_code %in% y_data$item_code)
>> [1] 2
>> > sum(y_data$item_code %in% x_data$item_code)
>> [1] 2
>>
>>
>> So why are you so sure they are "complete" as you claimed in your first
>> email.
>>
>> --
>> David.
>>
>>
>>
>> On Mar 12, 2010, at 12:29 PM, David Winsemius wrote:
>>
>> You have now given two different assignments to x_data and none to y_data:
>>>
>>> The str( from the file access offering:
>>>
>>> > str(x_data)
>>> 'data.frame': 2848 obs. of 5 variables:
>>> $ item_code : int 100 110 150 160 161 164 200 210 212 220 ...
>>> $ name : chr "NEONACLEX K" "NEONACLEX" "MESORB" "ABSORBENT
>>> CELLULOSE MESO" ...
>>> $ formulation_code: chr "TABS" "TABS" "DRESS" "DRESS" ...
>>> $ strength : chr "NULL" "5MG" "10CMX10CM" "10CMX10CM" ...
>>> $ bnf_code : chr "2.2.8" "2.2.1" "20.3.1" "20.3.1" ...
>>>
>>> The str from assignment from the dput offering
>>> > str(x_data)
>>> 'data.frame': 10 obs. of 10 variables:
>>> $ prochi : chr "CAO0000713" "CAO0000713" "CAO0000713"
>>> "CAO0000713" ...
>>> $ prescribed_date: chr "22/06/2001" "28/04/2000" "10/04/2000"
>>> "03/07/2000" ...
>>> $ dataMonth : chr "NULL" "NULL" "NULL" "NULL" ...
>>> $ item_code : chr "842752" "7800" "842652" "842652" ...
>>> $ res_seqno : chr "NULL" "NULL" "NULL" "NULL" ...
>>> $ quantity : chr "60" "100G" "60" "60" ...
>>> $ directions : chr "1/D" "A/TD" "1/D" "1/D" ...
>>> $ no_of_packs : chr "NULL" "NULL" "NULL" "NULL" ...
>>> $ datasource : chr "TSF" "TSF" "TSF" "TSF" ...
>>> $ scan_ref_no : chr "NULL" "NULL" "NULL" "NULL" ...
>>>
>>> This code "worked", but it is not clear that the x-y assignments were
>>> correct:
>>>
>>> x_data <- read.table(file="
>>> http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt", header
>>> = TRUE, sep = "|", quote = "\"'",
>>> dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
>>> nrows = 3864284,
>>> skip = 0, check.names = TRUE,fill=TRUE,
>>> strip.white = TRUE, blank.lines.skip = TRUE,
>>> comment.char = "#", allowEscapes = FALSE, flush = FALSE,
>>> fileEncoding = "", encoding = "unknown")
>>>
>>> --
>>> David.
>>>
>>> On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:
>>>
>>>
>>>> The y_data file has over 9000 rows in it so I thought it would be more
>>>> practical to give you the file to download....
>>>> --
>>>> View this message in context:
>>>> http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list