[R] How to sum one column in a data frame keyed on other columns

Simon Blomberg blomsp at ozemail.com.au
Wed Dec 13 00:56:33 CET 2006


You could look at the reshape package: melt the data frame with url and
time as the id variables, then cast it using sum as the aggregation function.
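
For comparison, here is a minimal sketch of the same grouped sum in base R.
It uses aggregate() rather than the reshape package, a substitution made so
that the example needs no extra packages. The data frame name visits_df and
the output column name total_visits are assumptions for illustration; the
values are copied from the question quoted below.

```r
# Example data copied from the question; visits_df is a hypothetical name.
visits_df <- data.frame(
  url = c("www.foo.com", "www.foo.com", "www.foo.com",
          "www.bar.com", "www.bar.com", "www.foo.com"),
  time = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  somethingirrelevant = c("xxx", "yyy", "xyz", "xxx", "zzz", "xxx"),
  visits = c(100, 50, 25, 200, 200, 500),
  stringsAsFactors = FALSE
)

# Sum visits within each unique (url, time) pair.
# Columns not named in the formula (somethingirrelevant) are ignored.
totals <- aggregate(visits ~ url + time, data = visits_df, FUN = sum)

# Rename the summed column (total_visits is an assumed name).
names(totals)[names(totals) == "visits"] <- "total_visits"

print(totals)
```

This avoids the row-by-row loop entirely: the grouping and summing happen in
one vectorised call, which should be much faster on a large data frame.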

HTH,

Simon.

George Nachman wrote:
> I have a data frame that looks like this:
>
> url         time somethingirrelevant visits
> www.foo.com 1:00 xxx                 100
> www.foo.com 1:00 yyy                 50
> www.foo.com 2:00 xyz                 25
> www.bar.com 1:00 xxx                 200
> www.bar.com 1:00 zzz                 200
> www.foo.com 2:00 xxx                 500
>
> I'd like to write some code that takes this as input and outputs
> something like this:
>
> url         time total_visits
> www.foo.com 1:00 150
> www.foo.com 2:00 525
> www.bar.com 1:00 400
>
> In other words, I need to calculate the sum of visits for each unique
> tuple of (url,time).
>
> I can do it with this code, but it's very slow, and doesn't seem like
> the right approach:
>
> keys = list()
> getkey = function(m,cols,index) { paste(m[index,cols],collapse=",")  }
> for (i in 1:nrow(data)) { keys[[getkey(data,1:2,i)]] = 0 }
> for (i in 1:nrow(data)) { keys[[getkey(data,1:2,i)]] =
> keys[[getkey(data,1:2,i)]] + data[i,4] }
>
> I'm sure there's a more functional-programming approach to this
> problem! Any ideas?
>


-- 
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat. 
Centre for Resource and Environmental Studies
The Australian National University              
Canberra ACT 0200                               
Australia                                       
T: +61 2 6125 7800 email: Simon.Blomberg_at_anu.edu.au
F: +61 2 6125 0757
CRICOS Provider # 00120C

The combination of some data and an aching desire for 
an answer does not ensure that a reasonable answer 
can be extracted from a given body of data.
- John Tukey.


