[R] create a factor variable from two numeric variables when order is irrelevant

David Winsemius dwinsemius at comcast.net
Tue Jun 28 22:58:05 CEST 2011


On Jun 28, 2011, at 4:53 PM, David Winsemius wrote:

>
> On Jun 28, 2011, at 3:59 PM, Daniel Malter wrote:
>
>> Hi all,
>>
>> I have two numeric variables that form combinations in a matched  
>> sample.
>> Let's say I have five levels of x and y. What I am seeking to  
>> create is a
>> factor variable that ignores the order of x and y, i.e., the factor  
>> should
>> indicate x=1, y=5, as the same factor as x=5, y=1. Obviously, this  
>> becomes
>> increasingly cumbersome to do by hand as the number of levels  
>> increases.
>>
>> f<-1:5
>> x<-sample(f,100,replace=T)
>> y<-sample(f,100,replace=T)
>> d<-matrix(cbind(x,y),ncol=2)
>>
>> #A working solution is to remove the order, multiply one column by  
>> a scaling
>> constant, add the second column, and create the factor for this  
>> numeric
>> value. However, I was wondering whether there is a less awkward, more  
>> direct
>> way to do this.
>>
>> i<-apply(t(apply(d,1,function(x) sort(x))),1,function(y)  
>> 10*y[1]+y[2])
>> i<-factor(i)
>> i
>
> I came up with the same solution, but implemented it a bit  
> differently:
>
> > d <- pmin(x,y)+10*pmax(x,y)
>
> > sort(unique(d))
> [1] 11 21 22 31 32 33 41 42 43 44 51 52 53 54 55
>
> > d <- factor(pmin(x,y)+10*pmax(x,y))
> > unique(d)
> [1] 41 42 32 54 51 21 22 33 53 11 31 44 43 52 55
> Levels: 11 21 22 31 32 33 41 42 43 44 51 52 53 54 55
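
The multiplier of 10 in the quoted code is enough for five levels, but codes for different pairs can coincide once the values get large relative to the multiplier. A minimal sketch of one way to guard against that, assuming x and y are positive integers; the helper name pair_code and the choice of max + 1 as the multiplier are illustrative assumptions, not something from the thread:

```r
# Hypothetical helper (not from the thread): encode an unordered pair of
# positive integers as a single number, then turn it into a factor.
pair_code <- function(x, y) {
  lo <- pmin(x, y)        # smaller member of each pair
  hi <- pmax(x, y)        # larger member of each pair
  base <- max(hi) + 1     # assumption: a multiplier above every value keeps codes distinct
  factor(lo + base * hi)
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- sample(1:12, 100, replace = TRUE)
y <- sample(1:12, 100, replace = TRUE)

# Swapping the arguments does not change the result
identical(pair_code(x, y), pair_code(y, x))  # TRUE
```

The codes are less readable than the base-10 ones above, since the multiplier now depends on the data.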

If you want a less decimalized version, you can paste the two values together:

d <- factor(paste(pmax(x,y),pmin(x,y),sep="."))
 > unique(d)
  [1] 4.1 4.2 3.2 5.4 5.1 2.1 2.2 3.3 5.3 1.1 3.1 4.4 4.3 5.2 5.5
Levels: 1.1 2.1 2.2 3.1 3.2 3.3 4.1 4.2 4.3 4.4 5.1 5.2 5.3 5.4 5.5
 > d[1]
[1] 4.1
Levels: 1.1 2.1 2.2 3.1 3.2 3.3 4.1 4.2 4.3 4.4 5.1 5.2 5.3 5.4 5.5
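
The paste approach can also be wrapped in a small function so it is easy to reuse and to tabulate. This is only a sketch: the name pair_label and the sep argument are assumptions made for illustration. Note that the levels are sorted as character strings, so with values of 10 or more they will typically not come out in numeric order.

```r
# Hypothetical helper (not from the thread): label each unordered pair
# as "larger.smaller", following the paste() call above.
pair_label <- function(x, y, sep = ".") {
  factor(paste(pmax(x, y), pmin(x, y), sep = sep))
}

a <- c(1, 5, 12, 3)
b <- c(5, 1, 3, 12)

p <- pair_label(a, b)
as.character(p)   # "5.1"  "5.1"  "12.3" "12.3"
table(p)          # counts per unordered pair
```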

>
>
> Seems that you might find the BioC people doing something  
> isomorphic to this with gene allele pairs using their fancy S4  
> methods.
>
> --


David Winsemius, MD
West Hartford, CT


