[R] a question about data manipulation in R
John Posner
john.posner at MJBIOSTAT.COM
Tue Sep 15 22:59:59 CEST 2015
Given your "input" data frame, with variables "V1" and "V2", here's a solution. It might not be the most "R-like" solution, since I'm still more of a Python refugee than a native R coder.
-John
# analyze input, using run-length encoding
runs_table = rle(input$V1)
number_of_runs = length(runs_table$values) # number of columns in answer matrix
lengths_of_runs = runs_table$lengths
max_run = max(lengths_of_runs) # number of rows in answer matrix
# set up answer matrix, with all NA values
answer = matrix(NA, nrow=max_run, ncol=number_of_runs)
# find the locations in the input$V1 column where a new value begins
indexes = c(0, cumsum(lengths_of_runs)) + 1
# column-by-column: copy values from input$V2 to the answer matrix, overwriting NA values
for (col in seq_len(number_of_runs)) {
    # rows of the input that belong to this run
    rows = indexes[col]:(indexes[col+1]-1)
    answer[1:lengths_of_runs[col], col] = input$V2[rows]
}
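
For anyone who wants to try this out, here is a small worked example of the same steps. The sample data frame is made up for illustration (the original "input" is not shown in this message), and the colnames() line is an optional extra that is not part of the solution above. It also assumes V1 is a plain character vector: rle() will not accept a factor, so convert with as.character() first if needed.

```r
# Hypothetical sample data, invented for this illustration
input <- data.frame(
  V1 = c("a", "a", "a", "b", "b", "c"),
  V2 = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE  # keep V1 as character so rle() accepts it
)

# run-length encoding of V1: values a, b, c with lengths 3, 2, 1
runs_table <- rle(input$V1)
lengths_of_runs <- runs_table$lengths
number_of_runs <- length(lengths_of_runs)
max_run <- max(lengths_of_runs)

# answer matrix, all NA to start with
answer <- matrix(NA, nrow = max_run, ncol = number_of_runs)

# start position of each run, plus one position past the end of the data
indexes <- c(0, cumsum(lengths_of_runs)) + 1   # 1 4 6 7

for (col in seq_len(number_of_runs)) {
  rows <- indexes[col]:(indexes[col + 1] - 1)
  answer[seq_len(lengths_of_runs[col]), col] <- input$V2[rows]
}

# optional: label each column with the V1 value of its run
colnames(answer) <- runs_table$values
print(answer)
```

With this sample data, column "a" holds 1, 2, 3; column "b" holds 4, 5, NA; and column "c" holds 6, NA, NA. Shorter runs are padded with NA at the bottom.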