# [R] a question about data manipulation in R

John Kane jrkrideau at inbox.com
Wed Sep 16 03:20:16 CEST 2015

Refugees are welcome. Just register at the desk over there. :)

Thanks, I have been drawing a complete blank without attacking it by brute force and advanced stupidity.

John Kane

> -----Original Message-----
> From: john.posner at mjbiostat.com
> Sent: Tue, 15 Sep 2015 20:59:59 +0000
> To: zkarimi1985 at yahoo.com
> Subject: Re: [R] a question about data manipulation in R
>
> Given your "input" data frame, with variables "V1" and "V2", here's a
> solution. This might not be the most "R-like" solution, since I'm still
> more of a Python refugee than a native R coder.
>
> -John
>
>
> # analyze input, using run-length encoding
> runs_table = rle(input$V1)
> number_of_runs = length(runs_table$values)  # number of columns in answer matrix
> lengths_of_runs = runs_table$lengths
> max_run = max(lengths_of_runs)              # number of rows in answer matrix
>
> # set up answer matrix, with all NA values
> answer = matrix(rep(NA, number_of_runs * max_run),
>                            nrow=max_run, ncol=number_of_runs)
>
> # find the locations in the input$V1 column where a new value begins
> indexes = c(0, cumsum(lengths_of_runs)) + 1
>
> # column-by-column: copy values from input$V2 to the answer matrix,
> # overwriting NA values
> for (col in 1:number_of_runs) {
>   answer[1:lengths_of_runs[col], col] = input[(indexes[col]):(indexes[col+1]-1), 2]
> }
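
For anyone who wants to try the run-length idea end to end, here is a minimal sketch of a vectorized variant. It is not the code from the message above: it swaps the column-by-column loop for matrix indexing with (row, column) pairs. The sample data frame is hypothetical, since the original poster's data is not shown in this message, and the names `row_in_run` and `run_id` are made up for illustration.

```r
# Hypothetical sample data (an assumption; the real "input" is not shown).
input <- data.frame(
  V1 = c("a", "a", "a", "b", "c", "c"),
  V2 = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE  # rle() needs an atomic vector, not a factor
)

# Run-length encode V1: one run per block of consecutive identical values
runs <- rle(input$V1)

# Position of each element within its own run: 1 2 3 1 1 2
row_in_run <- sequence(runs$lengths)

# Index of the run each element belongs to: 1 1 1 2 3 3
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Answer matrix: one column per run, as many rows as the longest run, all NA
answer <- matrix(NA_real_,
                 nrow = max(runs$lengths),
                 ncol = length(runs$lengths),
                 dimnames = list(NULL, runs$values))

# Indexing with a two-column (row, col) matrix fills every cell in one step
answer[cbind(row_in_run, run_id)] <- input$V2

print(answer)
```

Shorter runs are padded with NA at the bottom of their column, the same as in the loop version. If V1 is stored as a factor, converting it with `as.character()` before calling `rle()` should be enough.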