[R] a question about data manipulation in R

John Posner john.posner at MJBIOSTAT.COM
Tue Sep 15 22:59:59 CEST 2015

Given your "input" data frame, with variables "V1" and "V2", here's a solution. It might not be the most "R-like" solution, since I'm still more of a Python refugee than a native R coder.


# analyze input, using run-length encoding
runs_table = rle(input$V1)
number_of_runs = length(runs_table$values)  # number of columns in answer matrix
lengths_of_runs = runs_table$lengths
max_run = max(lengths_of_runs)              # number of rows in answer matrix

# set up answer matrix, with all NA values
answer = matrix(NA, nrow=max_run, ncol=number_of_runs)

# find the locations in the input$V1 column where a new value begins
indexes = c(0, cumsum(lengths_of_runs)) + 1

# column-by-column: copy values from input$V2 to the answer matrix, overwriting NA values
for (col in seq_len(number_of_runs)) {
  # rows of the input that belong to this run of identical V1 values
  rows = indexes[col]:(indexes[col+1]-1)
  answer[1:lengths_of_runs[col], col] = input$V2[rows]
}
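
To make the steps above concrete, here is a small self-contained run of the same approach. The sample "input" data frame below is made up for illustration (the original poster's data is not shown in this message), and it assumes V1 is a plain character vector rather than a factor, since rle() does not accept factors directly.

```r
# Hypothetical sample data, invented for this illustration only
input = data.frame(V1 = c("a", "a", "a", "b", "c", "c"),
                   V2 = c(1, 2, 3, 4, 5, 6),
                   stringsAsFactors = FALSE)

# run-length encoding of V1: three runs, of lengths 3, 1 and 2
runs_table = rle(input$V1)
lengths_of_runs = runs_table$lengths
number_of_runs = length(lengths_of_runs)
max_run = max(lengths_of_runs)

# answer matrix: one column per run, padded with NA
answer = matrix(NA, nrow = max_run, ncol = number_of_runs)

# start position of each run, plus one position past the end of the last run
indexes = c(0, cumsum(lengths_of_runs)) + 1

for (col in seq_len(number_of_runs)) {
  rows = indexes[col]:(indexes[col + 1] - 1)
  answer[seq_len(lengths_of_runs[col]), col] = input$V2[rows]
}

print(answer)
#      [,1] [,2] [,3]
# [1,]    1    4    5
# [2,]    2   NA    6
# [3,]    3   NA   NA
```

If V1 in your data is a factor, wrapping it as rle(as.character(input$V1)) should give the same runs.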
