[R] Retrieving original data frame after repetition
Marc Schwartz
marc_schwartz at me.com
Thu Jul 30 21:13:02 CEST 2009
On Jul 30, 2009, at 11:15 AM, Jose Iparraguirre D'Elia wrote:
> Dear R users,
> Consider the first two columns of a data frame like this:
> z[,1:2]
> x y
> 1 1 1
> 2 2 2
> 3 3 3
> 4 1 4
> Imagine that y represents the times that the value x happens in a
> population. But z is not exactly a frequency table, because in z we
> have x=1 twice. So, the x=1 in the first line and the x=1 in the
> fourth are not the same, differing according to a third variable in
> the data frame.
> Now, I use the function rep() in order to obtain a vector of values
> of x in the population:
> x.pop <- rep(z$x, z$y)
> x.pop
> [1] 1 2 2 3 3 3 1 1 1 1
> How can I go from x.pop back to z? If I use table(x.pop), I obtain a
> frequency table like the one below, but not z.
> table(x.pop)
> x.pop
> 1 2 3
> 5 2 3
> (I know I haven't deleted z, obviously, but I need to write a piece
> of code to do something very similar).
> Just in case anyone is wondering by now whether this is an
> assignment for college or the like: it is not. The real-world problem I'm
> working on at the moment has to do with income distribution in
> Northern Ireland. I want to see how many people would leave poverty
> if the income of those currently below 60% of median income increases
> by, say, £20 a week. I am working with the Family Resources Survey
> sample for Northern Ireland (n=2,263), which I have to gross up
> before increasing the incomes (grossed-up n=1,712,886). Once I have
> increased the income figures for those individuals in poverty, I
> need to 'un-gross' the data to get back to n=2,263, and table()
> simply does not do the trick, because of exactly the same situation
> as in the example above.
> So, please, how can I retrieve z?
> Many thanks,
> Jose
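Presuming that your larger case is similar in structure to 'x.pop',
which is to say that each original row's repetitions form one
sequential run and that adjacent rows never share the same value
(otherwise their runs merge into one), you can use: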
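# Encode the runs, turn the (lengths, values) list into a data frame,
# and put the values column first
z <- do.call(data.frame, rle(x.pop))[, c(2, 1)]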
colnames(z) <- c("x", "y")
> z
x y
1 1 1
2 2 2
3 3 3
4 1 4
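The presumption above matters for the real data. A minimal sketch with a hypothetical three-row data frame z2 (not from the original post) shows rle() merging two adjacent rows that both have x = 1. It then shows one possible alternative, assuming you can keep a row identifier (here a hypothetical column row.id) attached while grossing up: expand whole rows rather than just x, then collapse by that identifier afterwards.

```r
# Hypothetical data: rows 1 and 2 both have x = 1
z2 <- data.frame(x = c(1, 1, 2), y = c(2, 3, 1))
x2.pop <- rep(z2$x, z2$y)   # 1 1 1 1 1 2

# rle() sees a single run of five 1s, so the two rows are merged
r2 <- rle(x2.pop)
r2$lengths                  # 5 1 (not 2 3 1)

# Alternative: tag each row before grossing up ('row.id' is hypothetical)
z2$row.id <- seq_len(nrow(z2))
z2.pop <- z2[rep(z2$row.id, z2$y), ]   # one row per population member

# ... adjust incomes in z2.pop here, identically within each row.id ...

# Un-gross: keep the first copy of each original row
z2.back <- z2.pop[!duplicated(z2.pop$row.id), ]
rownames(z2.back) <- NULL

# The number of copies per row.id should still equal the weight y
stopifnot(all(tabulate(z2.pop$row.id) == z2.back$y))
```

Because the weight column y travels with every copy, dropping duplicated identifiers restores one row per original record; this assumes that any change made after grossing up is the same for all copies of a given record.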
See ?rle for more information on summarizing runs of values. The core
of the first step above yields:
> rle(x.pop)
Run Length Encoding
lengths: int [1:4] 1 2 3 4
values : num [1:4] 1 2 3 1
which is a list of two elements, 'lengths' and 'values'. We coerce it
to a data frame using do.call() and then reverse the two columns to
match your original order.
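To see what the coercion step does, here is a small sketch that builds the same two columns with with() instead of do.call(), naming them directly, and then uses inverse.rle() to confirm that the run encoding expands back to x.pop. The with() form is just one alternative spelling, not the approach used above.

```r
x.pop <- c(1, 2, 2, 3, 3, 3, 1, 1, 1, 1)   # expanded vector from the question

r <- rle(x.pop)   # list with components 'lengths' and 'values'

# Same two columns as the do.call() version, named directly
z <- with(r, data.frame(x = values, y = lengths))
z

# inverse.rle() expands the runs again, so nothing was lost
stopifnot(identical(inverse.rle(r), x.pop))

# Equivalently, rep() on the recovered columns rebuilds x.pop
stopifnot(identical(rep(z$x, z$y), x.pop))
```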
Marc Schwartz