[R] Creating binary variable depending on strings of two dataframes
noxyport at gmail.com
Tue May 10 09:18:47 CEST 2011
On Fri, May 6, 2011 at 7:41 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On May 6, 2011, at 11:35 AM, Pete Pete wrote:
>
>>
>> Gabor Grothendieck wrote:
>>>
>>> On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete <noxyport at gmail.com> wrote:
>>>>
>>>> Hi,
>>>> consider the following two dataframes:
>>>> x1=c("232","3454","3455","342","13")
>>>> x2=c("1","1","1","0","0")
>>>> data1=data.frame(x1,x2)
>>>>
>>>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>>>> y2=c("E1","F3","F5","E1","E2","H4","F8","G3","E1","H2")
>>>> data2=data.frame(y1,y2)
>>>>
>>>> I need a new column in dataframe data1 (x3), which is either 0 or 1
>>>> depending on whether data2 has a row with y2 equal to "E1" whose y1
>>>> matches x1. The result of data1 should look like this:
>>>>     x1 x2 x3
>>>> 1  232  1  1
>>>> 2 3454  1  1
>>>> 3 3455  1  0
>>>> 4  342  0  0
>>>> 5   13  0  1
>>>>
>>>> I think a SQL command could help me but I am too inexperienced with it
>>>> to get there.
>>>>
>>>
>>> Try this:
>>>
>>> library(sqldf)
>>> sqldf("select x1, x2, max(y2 = 'E1') x3 from data1 d1 left join data2 d2
>>>        on (x1 = y1) group by x1, x2 order by d1.rowid")
>>>
>>>     x1 x2 x3
>>> 1  232  1  1
>>> 2 3454  1  1
>>> 3 3455  1  0
>>> 4  342  0  0
>>> 5   13  0  1
>>>
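For readers without sqldf, the same 0/1 flag can be sketched in base R. This is only an illustration of the idea, not part of Gabor's answer: the helper name `ids_with_e1` is made up here. One difference to be aware of: an ID in data1 that never occurs in data2 gets 0 here, whereas the left join in the SQL version would give a missing value.

```r
# Data from the original post
x1 <- c("232", "3454", "3455", "342", "13")
x2 <- c("1", "1", "1", "0", "0")
data1 <- data.frame(x1, x2)

y1 <- c("232", "232", "3454", "3454", "3455", "342", "13", "13", "13", "13")
y2 <- c("E1", "F3", "F5", "E1", "E2", "H4", "F8", "G3", "E1", "H2")
data2 <- data.frame(y1, y2)

# IDs in data2 that have at least one row with y2 == "E1"
# (ids_with_e1 is a hypothetical helper name)
ids_with_e1 <- data2$y1[data2$y2 == "E1"]

# 1 if the ID in data1 is among them, 0 otherwise
data1$x3 <- as.integer(data1$x1 %in% ids_with_e1)
data1
```

`%in%` compares factors by their labels, so this should behave the same whether or not `data.frame()` converts the strings to factors.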
>>>
> snipped Gabor's sig
>>
>> That works really well, but I need to automate this a bit more. Consider
>> the following example:
>>
>> list1=c("A01","B04","A64","G84","F19")
>>
>> x1=c("232","3454","3455","342","13")
>> x2=c("1","1","1","0","0")
>> data1=data.frame(x1,x2)
>>
>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>> y2=c("E13","B04","F19","A64","E22","H44","F68","G84","F19","A01")
>> data2=data.frame(y1,y2)
>>
>> I now want to create a loop that adds a new binary variable to data1 for
>> every value in list1. The result should look like this:
>>   x1 x2 A01 B04 A64 G84 F19
>>  232  1   0   1   0   0   0
>> 3454  1   0   0   1   0   1
>> 3455  1   0   0   0   0   0
>>  342  0   0   0   0   0   0
>>   13  0   1   0   0   1   1
>
> Loops!?! We don't need no steenking loops!
>
> xtb <- with(data2, table(y1,y2))
> cbind(data1, xtb[match(data1$x1, rownames(xtb)), ] )
>        x1 x2 A01 A64 B04 E13 E22 F19 F68 G84 H44
> 232   232  1   0   0   1   1   0   0   0   0   0
> 3454 3454  1   0   1   0   0   0   1   0   0   0
> 3455 3455  1   0   0   0   0   1   0   0   0   0
> 342   342  0   0   0   0   0   0   0   0   0   1
> 13     13  0   1   0   0   0   0   1   1   1   0
>
> I am guessing that you were too ... er, busy? ... to complete the table?
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>
>
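To get only the list1 codes, in list1 order, one possible variation on the table() idea is to turn both columns into factors with explicit levels before cross-tabulating. This is a sketch under assumptions, not David's code: the names `flags` and `result` are made up here, and the IDs in data1$x1 are assumed to be unique (duplicated factor levels would raise an error).

```r
list1 <- c("A01", "B04", "A64", "G84", "F19")

x1 <- c("232", "3454", "3455", "342", "13")
x2 <- c("1", "1", "1", "0", "0")
data1 <- data.frame(x1, x2)

y1 <- c("232", "232", "3454", "3454", "3455", "342", "13", "13", "13", "13")
y2 <- c("E13", "B04", "F19", "A64", "E22", "H44", "F68", "G84", "F19", "A01")
data2 <- data.frame(y1, y2)

# Rows follow the order of data1$x1, columns follow list1;
# codes that are not in list1 become NA and are dropped by table()
xtb <- table(factor(data2$y1, levels = as.character(data1$x1)),
             factor(data2$y2, levels = list1))

# Turn counts into 0/1 flags (flags and result are hypothetical names)
flags <- unclass(xtb) > 0
storage.mode(flags) <- "integer"

result <- cbind(data1, flags)
result
```

Because the row levels are taken from data1$x1, no match() step is needed, and IDs that never occur in data2 simply get a row of zeros.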
Thanks a lot! Pretty simple. I have just gotten so used to sqldf by now.
So how would you handle more complicated strings like these:
y1=c("232","232", "232", "3454","3454","3455","342","13","13","13","13")
y2=c("E13","B04 A01 F19","B04","F19","A64 G84 A05","E22","H44 C35",
"F68","G84","F19","A01")
data2=data.frame(y1,y2)
where you want to extract, for instance, all occurrences of "A01" from the strings?
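One possible way to approach this, sketched here under the assumption that the codes inside a string are always separated by whitespace, is to split the strings and reshape to one row per (ID, code) pair. After that, the earlier exact-match approaches apply unchanged. The names `codes`, `long` and `ids_with_a01` are made up for this illustration.

```r
x1 <- c("232", "3454", "3455", "342", "13")
x2 <- c("1", "1", "1", "0", "0")
data1 <- data.frame(x1, x2, stringsAsFactors = FALSE)

y1 <- c("232", "232", "232", "3454", "3454", "3455", "342", "13", "13", "13", "13")
y2 <- c("E13", "B04 A01 F19", "B04", "F19", "A64 G84 A05", "E22", "H44 C35",
        "F68", "G84", "F19", "A01")
data2 <- data.frame(y1, y2, stringsAsFactors = FALSE)

# Split each string on whitespace: a list with one character vector per row
codes <- strsplit(data2$y2, "[[:space:]]+")

# Long format: repeat each ID once per code found in its string
long <- data.frame(y1 = rep(data2$y1, lengths(codes)),
                   y2 = unlist(codes),
                   stringsAsFactors = FALSE)

# IDs with at least one exact "A01" code
ids_with_a01 <- unique(long$y1[long$y2 == "A01"])

# Binary flag in data1, as in the earlier examples
data1$A01 <- as.integer(data1$x1 %in% ids_with_a01)
data1
```

Splitting first and comparing whole codes avoids the partial matches that a plain substring search could produce, for example a search for "A01" also hitting a longer code that merely contains it.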