[R] unfold list (variable number of columns) into a data frame

Giovanni Azua bravegag at gmail.com
Sun Oct 23 17:38:38 CEST 2011


Hello,

I used R a lot one year ago and now I am a bit rusty :)

I have raw data consisting of runtimes per minute (minutes "1", "2", "3") for two database modes ("sharding" and "replication") and two workload types ("query" and "refresh"). It is stored as a list of character vectors that looks like this:

> str(data)
List of 122
 $ : chr [1:163] "1" "sharding" "query" "607" "85" "52" "79" "77" "67" "98"  ...
 $ : chr [1:313] "1" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
 $ : chr [1:57] "1" "replication" "query" "2891" "2907" "2922" "2937" ...
 $ : chr [1:278] "1" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
 $ : chr [1:163] "2" "sharding" "query" "607" "85" "52" "79" "77" "67" "98"  ...
 $ : chr [1:313] "2" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
 $ : chr [1:57] "2" "replication" "query" "2891" "2907" "2922" "2937" ...
 $ : chr [1:278] "2" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
 $ : chr [1:163] "3" "sharding" "query" "607" "85" "52" "79" "77" "67" "98"  ...
 $ : chr [1:313] "3" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
 $ : chr [1:57] "3" "replication" "query" "2891" "2907" "2922" "2937" ...
 $ : chr [1:278] "3" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
 
I would like to transform the list above into a data frame in which this structure is unfolded in the following way:

'data.frame': N obs. of  4 variables:
 $ time : int  1 1 1 1 1 1 1 1 1 1 1 ...
 $ partitioning_mode : chr "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" ...
 $ workload : chr "query" "query" "query" "query" "query" "query" "query" "refresh" "refresh" "refresh" "refresh" ...
 $ runtime : num  607 85 52 79 77 67 98 2932 2870 2877 2868...

So instead of an associative array (with a variable number of columns per entry), it should become a simple flat table where the group or factor values are repeated for every occurrence of a specific runtime. My ultimate goal is a data frame that is "summarizeBy"-friendly and "ggplot2"-friendly.
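
To make the target concrete, here is a minimal sketch of the unfolding I have in mind, written in base R. The toy `data` list below is a shortened, made-up stand-in for my real one, and the helper name `unfold_one` is hypothetical; it assumes the first three elements of every vector are always the minute, the partitioning mode and the workload, and that everything after them is a runtime.

```r
# Toy input in the same shape as the real `data` (values shortened, made up).
data <- list(
  c("1", "sharding",    "query",   "607", "85", "52"),
  c("1", "sharding",    "refresh", "2932", "2870"),
  c("1", "replication", "query",   "2891", "2907", "2922"),
  c("2", "replication", "refresh", "79", "79", "89", "79")
)

# Turn one character vector into a long data frame: the first three
# elements are the keys, everything after them is a runtime.
unfold_one <- function(x) {
  runtimes <- as.numeric(x[-(1:3)])
  n <- length(runtimes)
  data.frame(
    time              = rep(as.integer(x[1]), n),
    partitioning_mode = rep(x[2], n),
    workload          = rep(x[3], n),
    runtime           = runtimes,
    stringsAsFactors  = FALSE
  )
}

# Build one data frame per list element, then stack them row-wise.
long <- do.call(rbind, lapply(data, unfold_one))
str(long)
```

Each key is repeated once per runtime, so the result has one row per measurement, which is the layout I am after. I am not sure whether do.call(rbind, ...) is the best way to do this for 122 long vectors.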

Help greatly appreciated!

TIA,
Best regards,
Giovanni

