[R] Expand a contingency table based on the value in one column

Thu Jun 11 19:33:34 CEST 2009

On Jun 11, 2009, at 12:13 PM, Mark Na wrote:

> Hi R-helpers,
>
> I have the following (dummy) dataframe:
>
>> test
>  DATE LOCATION  KIND CLASS COUNT
> 1    1        1   CAR     A     2
> 2    1        1 TRUCK     D     3
> 3    1        1   BUS     E     4
> 4    1        2   CAR     E     2
> 5    1        2 TRUCK     A     7
> 6    1        2   BUS     F     1
>
> That I would like to turn into this:
>
>> test2
>   DATE LOCATION  KIND CLASS
> 1     1        1   CAR     A
> 2     1        1   CAR     A
> 3     1        1 TRUCK     D
> 4     1        1 TRUCK     D
> 5     1        1 TRUCK     D
> 6     1        1   BUS     E
> 7     1        1   BUS     E
> 8     1        1   BUS     E
> 9     1        1   BUS     E
> 10    1        2   CAR     E
> 11    1        2   CAR     E
> 12    1        2 TRUCK     A
> 13    1        2 TRUCK     A
> 14    1        2 TRUCK     A
> 15    1        2 TRUCK     A
> 16    1        2 TRUCK     A
> 17    1        2 TRUCK     A
> 18    1        2 TRUCK     A
> 19    1        2   BUS     F
>
> So, basically it's a case of expanding (adding rows to) the first
> dataframe by the value in the COUNT column.
>
> I have solved this problem with the following code:
>
> test2 <- with(test, data.frame(DATE     = rep(DATE, COUNT),
>                                LOCATION = rep(LOCATION, COUNT),
>                                KIND     = rep(KIND, COUNT),
>                                CLASS    = rep(CLASS, COUNT)))
>
> but I'm unsatisfied with that solution because it's verbose, and I think
> there must be a more elegant way. If I had more than 4 variables (which I do
> in my real data), it would be a nuisance to repeat each column within the
> rep function.
>
> I would prefer to do this in base R or with the reshape package rather than
> relying on another package.
>
> Any ideas? Thanks!
>
> Mark Na

Mark,

A quick and dirty solution:

 > test[rep(1:nrow(test), test$COUNT), -ncol(test)]
     DATE LOCATION  KIND CLASS
1      1        1   CAR     A
1.1    1        1   CAR     A
2      1        1 TRUCK     D
2.1    1        1 TRUCK     D
2.2    1        1 TRUCK     D
3      1        1   BUS     E
3.1    1        1   BUS     E
3.2    1        1   BUS     E
3.3    1        1   BUS     E
4      1        2   CAR     E
4.1    1        2   CAR     E
5      1        2 TRUCK     A
5.1    1        2 TRUCK     A
5.2    1        2 TRUCK     A
5.3    1        2 TRUCK     A
5.4    1        2 TRUCK     A
5.5    1        2 TRUCK     A
5.6    1        2 TRUCK     A
6      1        2   BUS     F
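The indexing trick above can be wrapped in a small reusable function. The sketch below is one possible version, assuming the count column is named COUNT; expand_by_count() is a hypothetical helper, not part of base R or of expand.dft(). Unlike -ncol(test), it drops the count column by name, so it does not rely on COUNT being the last column, and it resets the row names, which otherwise come out as 1, 1.1, 2, ... For comparison, it also shows the rep() approach from the question generalized to every column with lapply(), so no column has to be named individually.

```r
# Rebuild the dummy data frame from the question.
test <- data.frame(
  DATE     = rep(1, 6),
  LOCATION = rep(1:2, each = 3),
  KIND     = rep(c("CAR", "TRUCK", "BUS"), 2),
  CLASS    = c("A", "D", "E", "E", "A", "F"),
  COUNT    = c(2, 3, 4, 2, 7, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from base R or expand.dft): repeat each row
# df[[freq]] times, drop the count column by name rather than by
# position, and reset the row names so they run 1..n.
expand_by_count <- function(df, freq = "COUNT") {
  idx <- rep(seq_len(nrow(df)), times = df[[freq]])
  out <- df[idx, setdiff(names(df), freq), drop = FALSE]
  rownames(out) <- NULL
  out
}

test2 <- expand_by_count(test)

# The rep() idea from the question, applied to all non-count columns
# at once with lapply() instead of writing out each column.
test3 <- as.data.frame(
  lapply(test[setdiff(names(test), "COUNT")], rep, times = test$COUNT),
  stringsAsFactors = FALSE
)
```

Because rep() with times = 0 returns nothing, any row whose COUNT is 0 is silently dropped from the result in both versions.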

For a more general way to take a tabulated data frame and convert it
back to the raw data, see my expand.dft() function:

   https://stat.ethz.ch/pipermail/r-help/2009-January/185561.html

For example:

 > expand.dft(test, freq = "COUNT")
    DATE LOCATION  KIND CLASS
1     1        1   CAR     A
2     1        1   CAR     A
3     1        1 TRUCK     D
4     1        1 TRUCK     D
5     1        1 TRUCK     D
6     1        1   BUS     E
7     1        1   BUS     E
8     1        1   BUS     E
9     1        1   BUS     E
10    1        2   CAR     E
11    1        2   CAR     E
12    1        2 TRUCK     A
13    1        2 TRUCK     A
14    1        2 TRUCK     A
15    1        2 TRUCK     A
16    1        2 TRUCK     A
17    1        2 TRUCK     A
18    1        2 TRUCK     A
19    1        2   BUS     F
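One way to sanity-check an expansion like the one above is to re-tabulate it. The sketch below uses base R's xtabs() and rebuilds the example data so it runs on its own. It cross-tabulates KIND by CLASS twice: once from the expanded rows, where each row counts once, and once from the original data with each row weighted by COUNT. If the expansion is correct, the two tables should agree cell by cell.

```r
# Rebuild the dummy data frame from the question.
test <- data.frame(
  DATE     = rep(1, 6),
  LOCATION = rep(1:2, each = 3),
  KIND     = rep(c("CAR", "TRUCK", "BUS"), 2),
  CLASS    = c("A", "D", "E", "E", "A", "F"),
  COUNT    = c(2, 3, 4, 2, 7, 1),
  stringsAsFactors = FALSE
)

# Expand with the row-index approach and drop the COUNT column.
test2 <- test[rep(seq_len(nrow(test)), test$COUNT), names(test) != "COUNT"]

# Cross-tabulate the expanded rows (each row counts once) ...
tab_raw <- xtabs(~ KIND + CLASS, data = test2)

# ... and the original summary, weighting each row by COUNT.
tab_freq <- xtabs(COUNT ~ KIND + CLASS, data = test)

# TRUE if every cell of the two tables matches.
all(tab_raw == tab_freq)
```

The cells are compared with == rather than identical(), because the two xtabs objects carry different "call" attributes and different storage types (integer counts versus summed doubles).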

HTH,

Marc Schwartz