[R] avoiding too many loops - reshaping data

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Wed Nov 3 23:16:55 CET 2010


Here is a summary of the suggested methods, each timed over 1000 runs on mydf. tapply is the fastest!

library(reshape)

system.time(for(i in 1:1000) cast(melt(mydf, measure.vars = "value"),
  city ~ brand, fun.aggregate = sum))
   user  system elapsed
  18.40    0.00   18.44

library(reshape2)
system.time(for(i in 1:1000) dcast(mydf, city ~ brand, sum))
   user  system elapsed
  12.36    0.02   12.37


system.time(for(i in 1:1000) xtabs(value ~ city + brand, mydf))
   user  system elapsed
   2.45    0.00    2.47


system.time(for(i in 1:1000) tapply(mydf$value, mydf[c('city','brand')], sum))
   user  system elapsed
   0.78    0.00    0.79
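
Note that tapply (like xtabs) returns a matrix-like object rather than the data frame asked for in the original question. Below is a minimal sketch of one way to convert it, assuming the mydf from the quoted message (repeated here so the snippet runs on its own). The names sums and wide are only illustrative.

```r
# Data frame from the original question
mydf <- data.frame(
  city  = c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b"),
  brand = c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
  value = c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116)
)

# tapply returns a matrix: cities as rows, brands as columns
sums <- tapply(mydf$value, mydf[c("city", "brand")], sum)

# Turn the matrix into a data frame; the city labels start out as row names
wide <- as.data.frame.matrix(sums)

# Move the city labels into an ordinary column and drop the row names
wide <- cbind(city = rownames(wide), wide)
rownames(wide) <- NULL

print(wide)
```

The same conversion should work on the xtabs result. Calling plain as.data.frame() on an xtabs table gives the long format (one row per city/brand pair) instead, which is why as.data.frame.matrix() is used here.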

Dimitri


On Wed, Nov 3, 2010 at 4:32 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
> Try this:
>
>  xtabs(value ~ city + brand, mydf)
>
> On Wed, Nov 3, 2010 at 6:23 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>>
>> Hello!
>>
>> I have a data frame like this one:
>>
>>
>> mydf<-data.frame(city=c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b"),
>>  brand=c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
>>  value=c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116))
>> (mydf)
>>
>> What I need to get is a data frame like the one below - cities as
>> rows, brands as columns, and the sums of the "value" within each
>> city/brand combination in the body of the data frame:
>>
>> city  x   y    z
>> a     3  23  450
>> b    12  42  231
>>
>>
>> I have written code that involves multiple loops and subindexing,
>> but it's taking too long.
>> I am sure there must be a more efficient way of doing it.
>>
>> Thanks a lot for your hints!
>>
>>
>> --
>> Dimitri Liakhovitski
>> Ninah Consulting
>> www.ninah.com
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>



-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com


