[R] convert count data to binary data

Marc Schwartz marc_schwartz at me.com
Sat May 7 03:19:41 CEST 2011


On May 6, 2011, at 3:15 PM, Christopher G Oakley wrote:

> Is there a way to generate a new dataframe that produces x lines based on the contents of a column?
> 
> For example: I would like to generate a new dataframe with 70 lines of data[1, 1:3], 67 lines of data[2, 1:3], 75 lines of data[3, 1:3] and so on up to numrow = sum(count).
> 
>> data
> 
> pop fam yesorno count
>   1 126       1    70
>   1 127       1    67
>   1 128       1    75
>   1 126       0    20
>   1 127       0    23
>   1 128       0    15
> 
> 
> Thanks,
> 
> Chris


# It is better not to use 'data' as the name of an R object, because
# 'data' is also the name of an argument to many functions, such as
# the regression modeling functions. R can generally tell the
# difference, but a different name makes the code less confusing
# to read.

> DF
  pop fam yesorno count
1   1 126       1    70
2   1 127       1    67
3   1 128       1    75
4   1 126       0    20
5   1 127       0    23
6   1 128       0    15


Use rep() to generate a vector of repeated indices (?rep):

> rep(1:nrow(DF), DF$count)
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [34] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [67] 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[100] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[133] 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[166] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[199] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[232] 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6
[265] 6 6 6 6 6 6


> table(rep(1:nrow(DF), DF$count))

 1  2  3  4  5  6 
70 67 75 20 23 15 


Now use that vector as the row index into DF, keeping only columns 1:3:

DF.New <- DF[rep(1:nrow(DF), DF$count), 1:3]


> str(DF.New)
'data.frame':	270 obs. of  3 variables:
 $ pop    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ fam    : int  126 126 126 126 126 126 126 126 126 126 ...
 $ yesorno: int  1 1 1 1 1 1 1 1 1 1 ...


> with(DF.New, table(fam, yesorno))
     yesorno
fam    0  1
  126 20 70
  127 23 67
  128 15 75
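
For anyone who wants to run the steps above from scratch, here is a minimal self-contained sketch. The data frame is retyped from the printout above. Two things are my own additions and not part of the steps above: seq_len(nrow(DF)) is swapped in for 1:nrow(DF) as a defensive habit (it behaves sensibly for a zero-row data frame), and the row names are reset at the end, which is purely optional.

```r
# Rebuild the example data frame from the printout above
DF <- data.frame(pop     = rep(1L, 6),
                 fam     = rep(126:128, times = 2),
                 yesorno = rep(c(1L, 0L), each = 3),
                 count   = c(70L, 67L, 75L, 20L, 23L, 15L))

# One row index per observation: row i is repeated DF$count[i] times
idx <- rep(seq_len(nrow(DF)), times = DF$count)

# Expand to one row per observation, dropping the 'count' column
DF.New <- DF[idx, c("pop", "fam", "yesorno")]

# Optional: replace the repeated row names (1, 1.1, 1.2, ...) with 1..n
rownames(DF.New) <- NULL

# Sanity checks: the row total and the cross-tabulation should match the counts
stopifnot(nrow(DF.New) == sum(DF$count))
with(DF.New, table(fam, yesorno))
```

Selecting the columns by name rather than by position (1:3) is only a matter of taste here; it keeps the code working if the column order ever changes.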


If you need something more general, to generate 'raw' data of various types from a contingency table, search the list archives for the function "expand.dft". I posted it a few years ago, and I think it has found its way into a couple of CRAN packages.
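
As a rough illustration of that more general case, the same rep() indexing also works when the starting point is a table object. To be clear, this is not the expand.dft code, just a sketch of the same idea in base R, and the object names (tab, counts, raw) are made up for the example.

```r
# A small 3 x 2 contingency table built from the counts in this thread
tab <- as.table(matrix(c(20, 23, 15, 70, 67, 75), nrow = 3,
                       dimnames = list(fam     = c("126", "127", "128"),
                                       yesorno = c("0", "1"))))

# Long format: one row per cell, with the cell count in column 'Freq'
counts <- as.data.frame(tab)

# Repeat each cell's row 'Freq' times, then drop the 'Freq' column
raw <- counts[rep(seq_len(nrow(counts)), times = counts$Freq),
              names(counts) != "Freq", drop = FALSE]
rownames(raw) <- NULL

# Cross-tabulating the raw data should reproduce the original table
stopifnot(all(table(raw) == tab))
```

Note that as.data.frame() on a table returns the classifying variables as factors, so fam and yesorno in 'raw' are factors rather than integers and may need converting depending on what comes next.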

HTH,

Marc Schwartz


