[R] convert count data to binary data
Marc Schwartz
marc_schwartz at me.com
Sat May 7 03:19:41 CEST 2011
On May 6, 2011, at 3:15 PM, Christopher G Oakley wrote:
> Is there a way to generate a new dataframe that produces x lines based on the contents of a column?
>
> For example: I would like to generate a new dataframe with 70 lines of data[1, 1:3], 67 lines of data[2, 1:3], 75 lines of data[3, 1:3], and so on, up to numrow = sum(count).
>
>> data
>
>   pop fam yesorno count
>     1 126       1    70
>     1 127       1    67
>     1 128       1    75
>     1 126       0    20
>     1 127       0    23
>     1 128       0    15
>
>
> Thanks,
>
> Chris
# It is better not to use 'data' as the name of an R object, to avoid
# confusion with functions that have an argument named 'data', such as
# the regression modeling functions. R can generally tell the two
# apart, but a different name makes the code easier to read.
> DF
  pop fam yesorno count
1   1 126       1    70
2   1 127       1    67
3   1 128       1    75
4   1 126       0    20
5   1 127       0    23
6   1 128       0    15
Use rep() to generate a vector of repeated indices (?rep):
> rep(1:nrow(DF), DF$count)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[34] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[67] 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[100] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[133] 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[166] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[199] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[232] 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6
[265] 6 6 6 6 6 6
> table(rep(1:nrow(DF), DF$count))

 1  2  3  4  5  6
70 67 75 20 23 15
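
As a smaller illustration of the same idea: when the second argument of rep() is a vector of the same length as the first, each element is repeated the corresponding number of times. The values below are made up for illustration and are not taken from the data above.

```r
# Toy values, chosen only for illustration
idx <- rep(1:3, times = c(2, 0, 3))

# Element 1 appears twice, element 2 is dropped (count of zero),
# element 3 appears three times
idx
```

A count of zero simply drops that index, so rows with zero counts would vanish from the expanded result.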
Now use that vector as the row index into DF, keeping only the first three columns:
> DF.New <- DF[rep(1:nrow(DF), DF$count), 1:3]
> str(DF.New)
'data.frame':   270 obs. of  3 variables:
 $ pop    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ fam    : int  126 126 126 126 126 126 126 126 126 126 ...
 $ yesorno: int  1 1 1 1 1 1 1 1 1 1 ...
> with(DF.New, table(fam, yesorno))
     yesorno
fam    0  1
  126 20 70
  127 23 67
  128 15 75
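
For anyone who wants to paste and run the whole thing, here is a self-contained sketch of the steps above. The data.frame() call just rebuilds the example data from the post. Using seq_len() and resetting the row names are my own optional additions, not something the approach requires.

```r
# Rebuild the example data from the post
DF <- data.frame(pop     = rep(1L, 6),
                 fam     = rep(126:128, 2),
                 yesorno = rep(c(1L, 0L), each = 3),
                 count   = c(70L, 67L, 75L, 20L, 23L, 15L))

# seq_len() is a defensive alternative to 1:nrow(DF): it yields an
# empty index if DF happens to have zero rows
DF.New <- DF[rep(seq_len(nrow(DF)), DF$count), 1:3]

# Repeated rows get row names such as "1", "1.1", "1.2", ...;
# resetting them to 1..n is optional
rownames(DF.New) <- NULL

# The expanded data frame has one row per counted observation
nrow(DF.New) == sum(DF$count)
```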
If you need something more general for generating 'raw' data of various types from a contingency table, search the list archives for the function "expand.dft", which I posted a few years ago and which I think found its way into a couple of CRAN packages.
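
To be clear, the following is not expand.dft, just a minimal sketch of how the rep() indexing above could be wrapped in a function. The name expand_counts and its count.col argument are hypothetical and made up for this illustration. The small table at the end is toy data.

```r
# Hypothetical helper, NOT the expand.dft function mentioned above.
# 'x' is a data frame; 'count.col' names the column holding the counts.
expand_counts <- function(x, count.col = "count") {
  counts <- x[[count.col]]
  stopifnot(is.numeric(counts), !anyNA(counts), all(counts >= 0))
  # Repeat each row 'counts' times and drop the count column itself
  out <- x[rep(seq_len(nrow(x)), counts),
           setdiff(names(x), count.col), drop = FALSE]
  rownames(out) <- NULL
  out
}

# A contingency table converted with as.data.frame() stores its
# counts in a column named "Freq" (toy data, for illustration only)
tab <- table(fam = c(126, 126, 127), yesorno = c(1, 0, 1))
raw <- expand_counts(as.data.frame(tab), count.col = "Freq")
raw
```

Note that as.data.frame() on a table returns the classifying variables as factors, so they may need converting back to numeric afterwards, depending on what you want to do with the result.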
HTH,
Marc Schwartz