[R] Label rows of table by factor level for groups of factors

Sarah Goslee sarah.goslee at gmail.com
Tue Mar 6 19:44:59 CET 2012


On Tue, Mar 6, 2012 at 1:32 PM, O'Hanlon, Simon J
<simon.ohanlon at imperial.ac.uk> wrote:
> Ah!
>
> Thanks.
>
> I had already made vector x2 previously and then went and changed it for some reason, which was why I didn't notice the error (because the subsequent code was able to run regardless). Sorry about that.
>
> so x2 should have read x2=c(rep(1,12)) which is what I originally had and what I was basing my plea for help on.

That would explain the difference in results. Regardless, the method I
suggested should work.


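# Sample data, with x2 corrected to a constant column
x1=c(rep(0:1,6))
x2=c(rep(1,12))
x3=c(rep(1,6),rep(0,6))
df=data.frame(x1,x2,x3)
# One row per combination of x1, x2 and x3, with its count in Freq
tabledf=as.data.frame(with(df, table(x1,x2,x3)))

# Number the combinations, then join on the shared columns x1, x2, x3;
# newdf gains both the Freq and res columns
tabledf <- cbind(tabledf, res=1:nrow(tabledf))
newdf <- merge(df, tabledf)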
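
The unique() route mentioned earlier in the thread can be sketched the same
way. This is only an illustration under a few assumptions: the names combos
and newdf2 are made up for the example, and the sort order is chosen so that
the numbering matches the table() version above. One difference to be aware
of: table() lists every possible combination of levels, including any with
Freq 0, whereas unique() only numbers combinations that occur in the data.

```r
# Same sample data as above.
x1 <- rep(0:1, 6)
x2 <- rep(1, 12)
x3 <- c(rep(1, 6), rep(0, 6))
df <- data.frame(x1, x2, x3)

# Distinct combinations that actually occur in df.
# 'combos' is an illustrative name, not something from the thread.
combos <- unique(df)

# Order by x3, then x2, then x1 so the numbering matches table(),
# where the first variable varies fastest.
combos <- combos[order(combos$x3, combos$x2, combos$x1), ]
combos$res <- seq_len(nrow(combos))

# Join the labels back onto the full data on the shared columns.
newdf2 <- merge(df, combos)
```

As with the table() version, merge() joins on the columns the two data
frames have in common, here x1, x2 and x3.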

Note that row order is not preserved; if you need that you can
add an id column to df before merging and sort on it after.
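
A minimal sketch of that id-column step, assuming the corrected sample data
from this message. The column name id is an arbitrary choice for the example;
any name not already used in df would do.

```r
# Same sample data and lookup table as above.
x1 <- rep(0:1, 6)
x2 <- rep(1, 12)
x3 <- c(rep(1, 6), rep(0, 6))
df <- data.frame(x1, x2, x3)
tabledf <- as.data.frame(with(df, table(x1, x2, x3)))
tabledf$res <- seq_len(nrow(tabledf))

# Record the original row order before merging.
# 'id' is a hypothetical helper column, not part of the original data.
df$id <- seq_len(nrow(df))

# Merge on x1, x2, x3; only the res column is taken from tabledf.
newdf <- merge(df, tabledf[, c("x1", "x2", "x3", "res")])

# Restore the original order, then drop the helper column.
newdf <- newdf[order(newdf$id), ]
rownames(newdf) <- NULL
newdf$id <- NULL
```

After this, newdf has the rows in the same order as df, with res as a
fourth column.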

Please notice also that I've been copying the R-help list on my
replies, so that other people who either have similar questions or
might be moved to help can see what we've been discussing.

Sarah

> ________________________________________
> From: Sarah Goslee [sarah.goslee at gmail.com]
> Sent: 06 March 2012 18:27
> To: O'Hanlon, Simon J; r-help
> Subject: Re: [R] Label rows of table by factor level for groups of factors
>
> Well, if you can get this to run your version of R is markedly
> different than mine.
>
>> #Start of code
>>
>> x1=c(rep(0:1,6))
>> x2=c(rep(c(1,1,0,0)6))
> Error: unexpected numeric constant in "x2=c(rep(c(1,1,0,0)6"
>> x3=c(rep(1,6),rep(0,6))
>
>
>
> On Tue, Mar 6, 2012 at 1:23 PM, O'Hanlon, Simon J
> <simon.ohanlon at imperial.ac.uk> wrote:
>> Hi Sarah,
>> Thanks a lot for your suggestion. I'll give it a go if I can (I just spent the last 3 hours using unique record filtering and vlookups in Excel to achieve what I'm sure can be accomplished in 3 or 4 lines of R code!).
>>
>> I think you might want to run the sample code again though. I just tried it (and there was no missing comma) and I get:
>>
>>   x1 x2 x3
>> 1   0  1  1
>> 2   1  1  1
>> 3   0  1  1
>> 4   1  1  1
>> 5   0  1  1
>> 6   1  1  1
>> 7   0  1  0
>> 8   1  1  0
>> 9   0  1  0
>> 10  1  1  0
>> 11  0  1  0
>> 12  1  1  0
>>> tabledf
>>  x1 x2 x3 Freq
>> 1  0  1  0    3
>> 2  1  1  0    3
>> 3  0  1  1    3
>> 4  1  1  1    3
>>> desired
>>   x1 x2 x3 res
>> 1   0  1  1   3
>> 2   1  1  1   4
>> 3   0  1  1   3
>> 4   1  1  1   4
>> 5   0  1  1   3
>> 6   1  1  1   4
>> 7   0  1  0   1
>> 8   1  1  0   2
>> 9   0  1  0   1
>> 10  1  1  0   2
>> 11  0  1  0   1
>> 12  1  1  0   2
>>> nrow(tabledf)
>> [1] 4
>>> dim(tabledf)
>> [1] 4 4
>>
>> #Start of code
>>
>> x1=c(rep(0:1,6))
>> x2=c(rep(c(1,1,0,0)6))
>> x3=c(rep(1,6),rep(0,6))
>> df=data.frame(x1,x2,x3)
>> tabledf=as.data.frame(with(df, table(x1,x2,x3)))
>> res=c(3,4,3,4,3,4,1,2,1,2,1,2)
>> desired=data.frame(x1,x2,x3,res)
>> df
>> tabledf
>> desired
>>
>> #End of code
>>
>> Cheers!
>>
>> Simon
>>
>> --------------------------------
>> Simon O'Hanlon, BSc MSc
>> Department of Infectious Disease Epidemiology
>> Imperial College London
>> St. Mary's Hospital
>> London
>> W2 1PG
>> ________________________________________
>> From: Sarah Goslee [sarah.goslee at gmail.com]
>> Sent: 06 March 2012 18:16
>> To: O'Hanlon, Simon J
>> Cc: r-help at R-project.org
>> Subject: Re: [R] Label rows of table by factor level for groups of factors
>>
>> One possible approach is to use unique() to get the list of distinct
>> combinations, cbind() an identifying variable to that list, then use
>> merge() to join it to your existing data frame.
>>
>> But I'm not seeing how you are getting four unique combinations.
>> Given your sample data (with the missing comma replaced):
>>> dim(tabledf)
>> [1] 8 4
>>> head(desired)
>>  x1 x2 x3 res
>> 1  0  1  1   3
>> 2  1  1  1   4
>> 3  0  0  1   3
>> 4  1  0  1   4
>> 5  0  1  1   3
>> 6  1  1  1   4
>>
>> tabledf has 8 rows, not 4, and I don't see how rows 1 and 3
>> or rows 2 and 4 of your desired df should get the same
>> classification.
>>
>> Regardless, if you can make a data frame like tabledf with
>> an additional column for your desired res variable, you can
>> merge() it with your original data frame.
>>
>> Sarah
>>
>> On Tue, Mar 6, 2012 at 11:06 AM, O'Hanlon, Simon J
>> <simon.ohanlon at imperial.ac.uk> wrote:
>>> Dear useRs,
>>> I am sure this is a fairly simple problem, but I just cannot get my head around it.
>>>
>>>
>>> I have a dataframe which contains several factor variables. I can use table() to tell me how many different combinations there are of these variables. What I should like to do is to add a column to my original dataframe which labels each row according to the unique combination of factors.
>>>
>>>
>>> E.g. in the simple example below I create a dataframe 'df' with 3 columns, the values of which take 0 or 1. I can then classify each row in the table and I find that I have 4 unique combinations of factors. I would now like to add a fourth column to df which labels each row according to whether it was unique combination 1,2,3 or 4:
>>>
>>> x1=c(rep(0:1,6))
>>> x2=c(rep(c(1,1,0,0)6))
>>> x3=c(rep(1,6),rep(0,6))
>>> df=data.frame(x1,x2,x3)
>>> tabledf=as.data.frame(with(df, table(x1,x2,x3)))
>>> res=c(3,4,3,4,3,4,1,2,1,2,1,2)
>>> desired=data.frame(x1,x2,x3,res)
>>> df
>>> tabledf
>>> desired
>>>
>>>
>>> I realise that this is probably quite simple to do, I am just struggling to get my head around it! Help much appreciated in advance.
>>>


