[R] unfold list (variable number of columns) into a data frame
Dennis Murphy
djmuser at gmail.com
Sun Oct 23 18:55:36 CEST 2011
Hi:
Here's one approach:
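# Function to process one list component into a data frame:
# fields 1-3 are the grouping variables, fields 4 onward are runtimes
ff <- function(x) {
  data.frame(time = as.integer(x[1]), partitioning_mode = x[2],
             workload = x[3], runtime = as.numeric(x[-(1:3)]),
             stringsAsFactors = FALSE)
}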
# Apply it to each element of the list:
do.call(rbind, lapply(data, ff))
or equivalently, using the plyr package,
library('plyr')
ldply(data, ff)
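If `data` is long (122 elements here), building one small data frame per element and then calling rbind() may become slow. The same long format can be assembled in a single pass with rep() and unlist(). This is a sketch, not part of the approach above: the helper name `unfold` and the toy list `L2` are hypothetical, and it assumes every element has at least the three leading fields (time, mode, workload).

```r
# Hypothetical helper (not from the original reply): unfold a list of
# character vectors c(time, mode, workload, runtime1, runtime2, ...)
# into one long data frame without rbind()-ing per-element pieces.
# Assumes every element has at least the three leading fields.
unfold <- function(lst) {
  # k-th field of every element, as a plain character vector
  field <- function(k) vapply(lst, function(x) x[[k]], character(1), USE.NAMES = FALSE)
  # number of runtimes carried by each element
  n_obs <- vapply(lst, function(x) length(x) - 3L, integer(1), USE.NAMES = FALSE)
  data.frame(
    # repeat each element's grouping fields once per runtime
    time              = rep(as.integer(field(1)), n_obs),
    partitioning_mode = rep(field(2), n_obs),
    workload          = rep(field(3), n_obs),
    # concatenate all runtimes in list order
    runtime           = as.numeric(unlist(lapply(lst, function(x) x[-(1:3)]), use.names = FALSE)),
    stringsAsFactors  = FALSE
  )
}

# Usage on a small hypothetical list
L2 <- list(c("1", "sharding", "query", "607", "85"),
           c("2", "replication", "refresh", "79"))
str(unfold(L2))
```

Because rep() receives one repeat count per element, each element contributes exactly as many rows as it has runtimes; elements with only the three leading fields contribute no rows.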
# Example:
L <- list(c("1", "sharding", "query", "607", "85", "52", "79", "77",
"67", "98"),
c("1", "sharding", "refresh", "2932", "2870", "2877", "2868"),
c("1", "replication", "query", "2891", "2907", "2922", "2937"))
do.call(rbind, lapply(L, ff))
   time partitioning_mode workload runtime
1     1          sharding    query     607
2     1          sharding    query      85
3     1          sharding    query      52
4     1          sharding    query      79
5     1          sharding    query      77
6     1          sharding    query      67
7     1          sharding    query      98
8     1          sharding  refresh    2932
9     1          sharding  refresh    2870
10    1          sharding  refresh    2877
11    1          sharding  refresh    2868
12    1       replication    query    2891
13    1       replication    query    2907
14    1       replication    query    2922
15    1       replication    query    2937
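Since the goal is a format that is easy to summarize and plot, here is a hedged sketch of a grouped summary on the example result using base R's aggregate() formula interface. The data frame `runs` (a hypothetical name) is typed in by hand from the printed output above so the block runs on its own; in practice you would use the rbind() or ldply() result directly.

```r
# Long-format data frame typed in from the example output above
# ('runs' is a hypothetical name; in practice use the rbind()/ldply() result)
runs <- data.frame(
  time              = 1L,
  partitioning_mode = rep(c("sharding", "replication"), c(11, 4)),
  workload          = rep(c("query", "refresh", "query"), c(7, 4, 4)),
  runtime           = c(607, 85, 52, 79, 77, 67, 98,
                        2932, 2870, 2877, 2868,
                        2891, 2907, 2922, 2937),
  stringsAsFactors  = FALSE
)

# Mean runtime per (time, partitioning_mode, workload) group
aggregate(runtime ~ time + partitioning_mode + workload, data = runs, FUN = mean)
```

The same long layout can be passed straight to ggplot2, for example `ggplot(runs, aes(workload, runtime)) + geom_boxplot() + facet_wrap(~ partitioning_mode)` (not run here).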
HTH,
Dennis
On Sun, Oct 23, 2011 at 8:38 AM, Giovanni Azua <bravegag at gmail.com> wrote:
> Hello,
>
> I used R a lot one year ago and now I am a bit rusty :)
>
> My raw data are lists of runtimes per minute (minutes "1", "2", "3", ...) for two database modes, "sharding" and "replication", and two workload types, "query" and "refresh". They are stored as a list of character vectors that looks like this:
>
>> str(data)
> List of 122
> $ : chr [1:163] "1" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
> $ : chr [1:313] "1" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
> $ : chr [1:57] "1" "replication" "query" "2891" "2907" "2922" "2937" ...
> $ : chr [1:278] "1" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
> $ : chr [1:163] "2" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
> $ : chr [1:313] "2" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
> $ : chr [1:57] "2" "replication" "query" "2891" "2907" "2922" "2937" ...
> $ : chr [1:278] "2" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
> $ : chr [1:163] "3" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
> $ : chr [1:313] "3" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
> $ : chr [1:57] "3" "replication" "query" "2891" "2907" "2922" "2937" ...
> $ : chr [1:278] "3" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
>
> I would like to transform the structure above into a data frame in which it is unfolded in the following way:
>
> 'data.frame': N obs. of 4 variables:
> $ time : int 1 1 1 1 1 1 1 1 1 1 1 ...
> $ partitioning_mode : chr "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" ...
> $ workload : chr "query" "query" "query" "query" "query" "query" "query" "refresh" "refresh" "refresh" "refresh" ...
> $ runtime : num 607 85 52 79 77 67 98 2932 2870 2877 2868 ...
>
> So instead of a ragged structure (a variable number of columns per element), it should become a long-format table in which the grouping factors are repeated for every individual runtime. My ultimate goal is a data frame that is "summarizeBy"-friendly and "ggplot2-friendly", i.e., in this format.
>
> Help greatly appreciated!
>
> TIA,
> Best regards,
> Giovanni
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>