[R] Expand a contingency table based on the value in one column
Marc Schwartz
marc_schwartz at me.com
Thu Jun 11 19:33:34 CEST 2009
On Jun 11, 2009, at 12:13 PM, Mark Na wrote:
> Hi R-helpers,
>
> I have the following (dummy) dataframe:
>
>> test
>   DATE LOCATION  KIND CLASS COUNT
> 1    1        1   CAR     A     2
> 2    1        1 TRUCK     D     3
> 3    1        1   BUS     E     4
> 4    1        2   CAR     E     2
> 5    1        2 TRUCK     A     7
> 6    1        2   BUS     F     1
>
> That I would like to turn into this:
>
>> test2
>    DATE LOCATION  KIND CLASS
> 1     1        1   CAR     A
> 2     1        1   CAR     A
> 3     1        1 TRUCK     D
> 4     1        1 TRUCK     D
> 5     1        1 TRUCK     D
> 6     1        1   BUS     E
> 7     1        1   BUS     E
> 8     1        1   BUS     E
> 9     1        1   BUS     E
> 10    1        2   CAR     E
> 11    1        2   CAR     E
> 12    1        2 TRUCK     A
> 13    1        2 TRUCK     A
> 14    1        2 TRUCK     A
> 15    1        2 TRUCK     A
> 16    1        2 TRUCK     A
> 17    1        2 TRUCK     A
> 18    1        2 TRUCK     A
> 19    1        2   BUS     F
>
> So, basically it's a case of expanding (adding rows to) the first dataframe
> by the value in the COUNT column.
>
> I have solved this problem with the following code:
>
> test2<-with(test, data.frame(DATE=rep(DATE,COUNT),
> LOCATION=rep(LOCATION,COUNT), KIND=rep(KIND,COUNT),
> CLASS=rep(CLASS,COUNT)))
>
> but I'm unsatisfied with that solution because it's verbose and I think
> there must be a more elegant way. If I had more than 4 variables (which I do
> in my real data) it would be a nuisance to repeat each column within the rep
> function.
>
> I would prefer to do this with base R or the reshape package rather than
> relying on another package.
>
> Any ideas? Thanks!
>
> Mark Na
Mark,
A quick and dirty solution (note that it assumes COUNT is the last column):
> test[rep(1:nrow(test), test$COUNT), -ncol(test)]
    DATE LOCATION  KIND CLASS
1      1        1   CAR     A
1.1    1        1   CAR     A
2      1        1 TRUCK     D
2.1    1        1 TRUCK     D
2.2    1        1 TRUCK     D
3      1        1   BUS     E
3.1    1        1   BUS     E
3.2    1        1   BUS     E
3.3    1        1   BUS     E
4      1        2   CAR     E
4.1    1        2   CAR     E
5      1        2 TRUCK     A
5.1    1        2 TRUCK     A
5.2    1        2 TRUCK     A
5.3    1        2 TRUCK     A
5.4    1        2 TRUCK     A
5.5    1        2 TRUCK     A
5.6    1        2 TRUCK     A
6      1        2   BUS     F
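The same indexing idea can be written a little more defensively. The sketch below is one possible variant, not part of the original reply. It rebuilds the example data (the column types are an assumption), uses seq_len() so that a zero-row data frame is handled, drops COUNT by name rather than by position, and resets the row names. The helper name expand_rows is made up for illustration.

```r
# Example data rebuilt from the question (column types are an assumption)
test <- data.frame(
  DATE     = c(1, 1, 1, 1, 1, 1),
  LOCATION = c(1, 1, 1, 2, 2, 2),
  KIND     = c("CAR", "TRUCK", "BUS", "CAR", "TRUCK", "BUS"),
  CLASS    = c("A", "D", "E", "E", "A", "F"),
  COUNT    = c(2, 3, 4, 2, 7, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: repeat each row 'freq' times, then drop the count column
expand_rows <- function(df, freq = "COUNT") {
  # Row i appears df[[freq]][i] times in the index vector
  idx <- rep(seq_len(nrow(df)), times = df[[freq]])
  out <- df[idx, setdiff(names(df), freq), drop = FALSE]
  rownames(out) <- NULL   # replace 1, 1.1, 2, ... with 1, 2, 3, ...
  out
}

test2 <- expand_rows(test)
nrow(test2)   # equals sum(test$COUNT)
```

Dropping the column by name means the call keeps working if COUNT is not the last column, and it scales to any number of other variables without listing them.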
For a more general solution to taking a tabulated data frame and
converting it back to the raw data, see my expand.dft() function:
https://stat.ethz.ch/pipermail/r-help/2009-January/185561.html
For example:
> expand.dft(test, freq = "COUNT")
   DATE LOCATION  KIND CLASS
1     1        1   CAR     A
2     1        1   CAR     A
3     1        1 TRUCK     D
4     1        1 TRUCK     D
5     1        1 TRUCK     D
6     1        1   BUS     E
7     1        1   BUS     E
8     1        1   BUS     E
9     1        1   BUS     E
10    1        2   CAR     E
11    1        2   CAR     E
12    1        2 TRUCK     A
13    1        2 TRUCK     A
14    1        2 TRUCK     A
15    1        2 TRUCK     A
16    1        2 TRUCK     A
17    1        2 TRUCK     A
18    1        2 TRUCK     A
19    1        2   BUS     F
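One way to sanity-check an expansion like this is to tabulate the raw rows again and compare the result with the original counts. The following is only a sketch of such a round trip in base R. It does not call expand.dft() (which is defined in the linked post), it rebuilds the example data with assumed column types, and the object names raw, back and chk are illustrative.

```r
# Example data rebuilt from the question (column types are an assumption)
test <- data.frame(
  DATE     = c(1, 1, 1, 1, 1, 1),
  LOCATION = c(1, 1, 1, 2, 2, 2),
  KIND     = c("CAR", "TRUCK", "BUS", "CAR", "TRUCK", "BUS"),
  CLASS    = c("A", "D", "E", "E", "A", "F"),
  COUNT    = c(2, 3, 4, 2, 7, 1),
  stringsAsFactors = FALSE
)

# Expand to one row per observation, dropping the COUNT column
raw <- test[rep(seq_len(nrow(test)), test$COUNT), names(test) != "COUNT"]

# Tabulate again: give every raw row a weight of 1 and sum within each group
raw$COUNT <- 1
back <- aggregate(COUNT ~ DATE + LOCATION + KIND + CLASS, data = raw, FUN = sum)

# Match the recovered counts against the original ones
keys <- c("DATE", "LOCATION", "KIND", "CLASS")
chk  <- merge(test, back, by = keys)   # yields COUNT.x (original), COUNT.y (recovered)
nrow(chk) == nrow(test) && all(chk$COUNT.x == chk$COUNT.y)
```

aggregate() returns the groups in its own sort order, which is why the comparison goes through merge() on the key columns rather than comparing the two data frames row by row.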
HTH,
Marc Schwartz