[R] Creating binary variable depending on strings of two dataframes

David Winsemius dwinsemius at comcast.net
Tue May 10 15:09:55 CEST 2011


On May 10, 2011, at 3:18 AM, noxyport at gmail.com wrote:

> On Fri, May 6, 2011 at 7:41 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On May 6, 2011, at 11:35 AM, Pete Pete wrote:
>>
>>>
>>> Gabor Grothendieck wrote:
>>>>
>>>> On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete <noxyport at gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>> consider the following two dataframes:
>>>>> x1=c("232","3454","3455","342","13")
>>>>> x2=c("1","1","1","0","0")
>>>>> data1=data.frame(x1,x2)
>>>>>
>>>>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>>>>> y2=c("E1","F3","F5","E1","E2","H4","F8","G3","E1","H2")
>>>>> data2=data.frame(y1,y2)
>>>>>
>>>>> I need a new column x3 in dataframe data1 that is either 0 or 1,
>>>>> depending on whether any row of data2 with y1 equal to x1 has
>>>>> y2 equal to "E1". The result for data1 should look like this:
>>>>>     x1 x2 x3
>>>>> 1  232  1  1
>>>>> 2 3454  1  1
>>>>> 3 3455  1  0
>>>>> 4  342  0  0
>>>>> 5   13  0  1
>>>>>
>>>>> I think a SQL command could help me, but I am too inexperienced
>>>>> with it to get there.
>>>>>
>>>>
>>>> Try this:
>>>>
>>>>> library(sqldf)
>>>>> sqldf("select x1, x2, max(y2 = 'E1') x3 from data1 d1 left join  
>>>>> data2 d2
>>>>> on (x1 = y1) group by x1, x2 order by d1.rowid")
>>>>
>>>>   x1 x2 x3
>>>> 1  232  1  1
>>>> 2 3454  1  1
>>>> 3 3455  1  0
>>>> 4  342  0  0
>>>> 5   13  0  1
>>>>
>>>>
>> snipped Gabor's sig
>>>
>>> That works very well, but I need to automate this a bit more.
>>> Consider the following example:
>>>
>>> list1=c("A01","B04","A64","G84","F19")
>>>
>>> x1=c("232","3454","3455","342","13")
>>> x2=c("1","1","1","0","0")
>>> data1=data.frame(x1,x2)
>>>
>>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>>> y2=c("E13","B04","F19","A64","E22","H44","F68","G84","F19","A01")
>>> data2=data.frame(y1,y2)
>>>
>>> I now want to create a loop that adds a new binary variable to
>>> data1 for every value in list1. The result should look like:
>>> x1      x2      A01     B04     A64     G84     F19
>>> 232     1       0       1       0       0       0
>>> 3454    1       0       0       1       0       1
>>> 3455    1       0       0       0       0       0
>>> 342     0       0       0       0       0       0
>>> 13      0       1       0       0       1       1
>>
>> Loops!?! We don't need no steenking loops!
>>
>>> xtb <-  with(data2, table(y1,y2))
>>> cbind(data1, xtb[match(data1$x1, rownames(xtb)), ] )
>>       x1 x2 A01 A64 B04 E13 E22 F19 F68 G84 H44
>> 232   232  1   0   0   1   1   0   0   0   0   0
>> 3454 3454  1   0   1   0   0   0   1   0   0   0
>> 3455 3455  1   0   0   0   0   1   0   0   0   0
>> 342   342  0   0   0   0   0   0   0   0   0   1
>> 13     13  0   1   0   0   0   0   1   1   1   0
>>
>> I am guessing that you were too ... er, busy? ... to complete the
>> table?
>>
>> --
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>
> Thanks a lot! Pretty simple. I am so much used to SQLDF right now.
>
> So how would you handle more complicated strings like these:
> y1=c("232","232","232","3454","3454","3455","342","13","13","13","13")
> y2=c("E13","B04 A01 F19","B04","F19","A64 G84 A05","E22","H44 C35",
>      "F68","G84","F19","A01")
> data2=data.frame(y1,y2)
>
> For instance, how would you extract all "A01" entries from the strings?

I think you need either to explain what you want in more words of the  
English language or to offer an example of the desired output. I  
suspect you did not want something as simple as this:

 > A01.instances <- grep("A01" , data2$y2)
 > A01.instances
[1]  2 11
 > data2[A01.instances, ]
     y1          y2
2  232 B04 A01 F19
11  13         A01

Or maybe you did?
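
If what you are really after is the same kind of 0/1 indicator table as
before, here is a minimal sketch. It assumes the codes inside y2 are
always separated by whitespace and that only the codes in list1 matter.
The names codes, long, xtb, ind and result are mine, purely for
illustration. Splitting first also avoids a weakness of the grep()
call above, which matches substrings and so would also hit a longer
code such as a hypothetical "A011".

```r
# Data as posted in this thread.
x1 <- c("232", "3454", "3455", "342", "13")
x2 <- c("1", "1", "1", "0", "0")
data1 <- data.frame(x1, x2, stringsAsFactors = FALSE)

y1 <- c("232", "232", "232", "3454", "3454", "3455", "342",
        "13", "13", "13", "13")
y2 <- c("E13", "B04 A01 F19", "B04", "F19", "A64 G84 A05", "E22",
        "H44 C35", "F68", "G84", "F19", "A01")
data2 <- data.frame(y1, y2, stringsAsFactors = FALSE)

list1 <- c("A01", "B04", "A64", "G84", "F19")

# Assumption: codes within one string are separated by whitespace.
codes <- strsplit(data2$y2, "[[:space:]]+")

# Long format: one row per (y1, single code) pair.
long <- data.frame(y1 = rep(data2$y1, lengths(codes)),
                   y2 = unlist(codes),
                   stringsAsFactors = FALSE)

# Cross-tabulate; the factor levels restrict the columns to list1
# (in that order). unclass() leaves a plain integer matrix of counts.
xtb <- unclass(table(y1 = long$y1,
                     y2 = factor(long$y2, levels = list1)))

# Put the rows in data1 order and turn counts into 0/1 indicators.
ind <- (xtb[match(data1$x1, rownames(xtb)), , drop = FALSE] > 0) + 0L
rownames(ind) <- NULL

result <- cbind(data1, ind)
result
```

The comparison with 0 is there because a code can now occur more than
once per y1 (B04 appears twice for 232), so the raw table holds counts
rather than indicators. An x1 value that never appears in y1 would get
a row of NA under this sketch.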

-- 
David Winsemius, MD
West Hartford, CT


