[R] A question built on How to sum one column in a data frame keyed on other columns

Vincent Vinh-Hung anhxang at gmail.com
Sat Jul 4 12:21:45 CEST 2009


Dear List:

I have a question related to a previous discussion,
"How to sum one column in a data frame keyed on other columns":
https://stat.ethz.ch/pipermail/r-help/2006-December/122141.html
(George Nachman, Bill Venables)

The original query was to calculate the sum of visits for each unique
tuple of (url, time) from the data frame dat:

> dat
          url time somethingirrelevant visits
1 www.foo.com 1:00                 xxx    100
2 www.foo.com 1:00                 yyy     50
3 www.foo.com 2:00                 xyz     25
4 www.bar.com 1:00                 xxx    200
5 www.bar.com 1:00                 zzz    200
6 www.foo.com 2:00                 xxx    500

The response gave:

> tdat
          url time total_visits
4 www.bar.com 1:00          400
1 www.foo.com 1:00          150
3 www.foo.com 2:00          525
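
For readers without the earlier thread at hand, here is a minimal sketch of one way such a table could be produced. It uses aggregate(), which is not necessarily what the cited response used; the data frame is retyped from the listing above, and the column name total_visits is set by hand.

```r
# Example data retyped from the listing above
dat <- data.frame(
  url = c("www.foo.com", "www.foo.com", "www.foo.com",
          "www.bar.com", "www.bar.com", "www.foo.com"),
  time = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  somethingirrelevant = c("xxx", "yyy", "xyz", "xxx", "zzz", "xxx"),
  visits = c(100, 50, 25, 200, 200, 500),
  stringsAsFactors = FALSE
)

# Sum visits within each (url, time) pair that occurs in dat
tdat <- aggregate(visits ~ url + time, data = dat, FUN = sum)

# Rename the summed column and sort by the keys
names(tdat)[names(tdat) == "visits"] <- "total_visits"
tdat <- tdat[order(tdat$url, tdat$time), ]
tdat
```

Only pairs present in dat appear in the result, which is why (www.bar.com, 2:00) is absent.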

My question is: how can I build a similar data frame that also has rows for
combinations of (url, time) that do not occur in dat?
In this example, the only (url, time) pair with no recorded visits is
(www.bar.com, 2:00), so the desired result would be:

> ndat
        url time total_visits
www.bar.com 1:00          400
www.bar.com 2:00            0
www.foo.com 1:00          150
www.foo.com 2:00          525

I have tried building a data frame of all (url, time) combinations and
merging it with tdat:

> adat <- data.frame(url = rep(unique(dat$url), each = length(unique(dat$time))),
+                    time = unique(dat$time), alt_visits = 0)
> ndat <- merge(adat, tdat, by = c("url", "time"), all = TRUE)

and then replacing the NAs and removing the alt_visits column.
But this appears clumsy and is quite slow with 10000 rows and 4 columns.
I would be most grateful for other suggestions.
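
To make the steps concrete, the following is a sketch of the merge approach written out in full on the example data, not a claim about the best method. It assumes expand.grid() in place of the rep() construction and drops the placeholder alt_visits column altogether. A second possibility, xtabs(), is shown for comparison because it fills absent combinations with 0 on its own. Neither has been timed on the 10000-row data.

```r
# Example data retyped from the listing earlier in this message
dat <- data.frame(
  url = c("www.foo.com", "www.foo.com", "www.foo.com",
          "www.bar.com", "www.bar.com", "www.foo.com"),
  time = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  somethingirrelevant = c("xxx", "yyy", "xyz", "xxx", "zzz", "xxx"),
  visits = c(100, 50, 25, 200, 200, 500),
  stringsAsFactors = FALSE
)

# Totals for the (url, time) pairs that occur in dat
tdat <- aggregate(visits ~ url + time, data = dat, FUN = sum)
names(tdat)[names(tdat) == "visits"] <- "total_visits"

# Every (url, time) combination, whether or not it occurs in dat
adat <- expand.grid(url = unique(dat$url), time = unique(dat$time),
                    stringsAsFactors = FALSE)

# Keep all rows of adat; pairs missing from tdat get NA in total_visits
ndat <- merge(adat, tdat, by = c("url", "time"), all.x = TRUE)

# Replace the NAs with 0 and sort by the keys
ndat$total_visits[is.na(ndat$total_visits)] <- 0
ndat <- ndat[order(ndat$url, ndat$time), ]
ndat

# Alternative sketch: cross-tabulate, then flatten back to long form.
# Here url and time come back as factors rather than character vectors.
xdat <- as.data.frame(xtabs(visits ~ url + time, data = dat),
                      responseName = "total_visits")
xdat
```

The xtabs() route avoids the explicit merge and NA replacement, at the cost of converting the key columns to factors.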

Vincent



