[R] unfold list (variable number of columns) into a data frame

Sun Oct 23 18:55:36 CEST 2011

Hi:

Here's one approach:

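# Function to process a list component into a data frame
ff <- function(x) {
  # The first three entries are the keys; they are recycled over the runtimes
  data.frame(time = as.integer(x[1]), partitioning_mode = x[2],
             workload = x[3],
             # everything after the third entry is a runtime
             runtime = as.numeric(x[-(1:3)]),
             stringsAsFactors = FALSE)
}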

# Apply it to each element of the list:
do.call(rbind, lapply(data, ff))

or equivalently, using the plyr package,

library('plyr')
ldply(data, ff)
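
If the list is long, building one small data frame per element and rbind-ing them can get slow. A possible alternative, sketched here under the assumption that every element has exactly three key entries followed by at least one runtime, is to build each column directly with rep(). The names runs, n and unfolded are made up for this illustration, and lengths() needs R >= 3.2.0.

```r
# Hypothetical sample input with the same shape as the list in the question
runs <- list(c("1", "sharding", "query", "607", "85", "52"),
             c("1", "sharding", "refresh", "2932", "2870"),
             c("2", "replication", "query", "2891", "2907", "2922"))

# Number of runtime values in each element (everything after the 3 keys)
n <- lengths(runs) - 3L

# Repeat each key once per runtime value and flatten the runtimes
unfolded <- data.frame(
  time              = rep(as.integer(vapply(runs, `[`, "", 1L)), n),
  partitioning_mode = rep(vapply(runs, `[`, "", 2L), n),
  workload          = rep(vapply(runs, `[`, "", 3L), n),
  runtime           = as.numeric(unlist(lapply(runs, function(x) x[-(1:3)]))),
  stringsAsFactors  = FALSE
)
str(unfolded)
```

This should give the same rows as the do.call(rbind, ...) approach, just without the intermediate data frames.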

# Example:
L <- list(c("1", "sharding", "query", "607", "85", "52", "79", "77",
            "67", "98"),
          c("1", "sharding", "refresh", "2932", "2870", "2877", "2868"),
          c("1", "replication", "query", "2891", "2907", "2922", "2937"))
do.call(rbind, lapply(L, ff))
   time partitioning_mode workload runtime
1     1          sharding    query     607
2     1          sharding    query      85
3     1          sharding    query      52
4     1          sharding    query      79
5     1          sharding    query      77
6     1          sharding    query      67
7     1          sharding    query      98
8     1          sharding  refresh    2932
9     1          sharding  refresh    2870
10    1          sharding  refresh    2877
11    1          sharding  refresh    2868
12    1       replication    query    2891
13    1       replication    query    2907
14    1       replication    query    2922
15    1       replication    query    2937
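
Once the data are in this long form, per-group summaries are straightforward. As a rough sketch using only base R: aggregate() with a formula computes the mean runtime for each combination of the three keys. ff is restated here so the snippet runs on its own; the integer time column and stringsAsFactors = FALSE are my own adjustments, and the names long and means are made up for this illustration.

```r
# Restated helper: one list element -> long data frame
ff <- function(x) {
  data.frame(time = as.integer(x[1]), partitioning_mode = x[2],
             workload = x[3], runtime = as.numeric(x[-(1:3)]),
             stringsAsFactors = FALSE)
}

# Same example list as above
L <- list(c("1", "sharding", "query", "607", "85", "52", "79", "77",
            "67", "98"),
          c("1", "sharding", "refresh", "2932", "2870", "2877", "2868"),
          c("1", "replication", "query", "2891", "2907", "2922", "2937"))

long <- do.call(rbind, lapply(L, ff))

# Mean runtime per time / partitioning mode / workload combination
means <- aggregate(runtime ~ time + partitioning_mode + workload,
                   data = long, FUN = mean)
print(means)
```

The same data frame can be passed directly to ggplot2, with partitioning_mode and workload used as grouping or faceting variables.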

HTH,
Dennis

On Sun, Oct 23, 2011 at 8:38 AM, Giovanni Azua <bravegag at gmail.com> wrote:
> Hello,
>
> I used R a lot one year ago and now I am a bit rusty :)
>
> I have my raw data, which correspond to the list of runtimes per minute (minute "1" "2" "3" in two database modes "sharding" and "replication" and two workload types "query" and "refresh"), stored as a list of character vectors that looks like this:
>
>> str(data)
> List of 122
>  $ : chr [1:163] "1" "sharding" "query" "607" "85" "52" "79" "77" "67" "98"  ...
>  $ : chr [1:313] "1" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
>  $ : chr [1:57] "1" "replication" "query" "2891" "2907" "2922" "2937" ...
>  $ : chr [1:278] "1" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
>  $ : chr [1:163] "2" "sharding" "query" "607" "85" "52" "79" "77" "67" "98"  ...
>  $ : chr [1:313] "2" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
>  $ : chr [1:57] "2" "replication" "query" "2891" "2907" "2922" "2937" ...
>  $ : chr [1:278] "2" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
>  $ : chr [1:163] "3" "sharding" "query" "607" "85" "52" "79" "77" "67" "98"  ...
>  $ : chr [1:313] "3" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
>  $ : chr [1:57] "3" "replication" "query" "2891" "2907" "2922" "2937" ...
>  $ : chr [1:278] "3" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
>
> I would like to transform the list above into a data frame where this structure is unfolded in the following way:
>
> 'data.frame': N obs. of  4 variables:
>  $ time : int  1 1 1 1 1 1 1 1 1 1 1 ...
>  $ partitioning_mode : chr "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" ...
>  $ workload : chr "query" "query" "query" "query" "query" "query" "query" "refresh" "refresh" "refresh" "refresh" ...
>  $ runtime : num  607 85 52 79 77 67 98 2932 2870 2877 2868...
>
> So instead of having an associative array (variable number of columns), it should become a simple list where the groups or factors are repeated for every occurrence of a specific runtime. Basically, my ultimate goal is to get a data frame structure that is "summarizeBy"-friendly and "ggplot2"-friendly, i.e. one in the format shown above.
>
> Help greatly appreciated!
>
> TIA,
> Best regards,
> Giovanni