[R] A question built on How to sum one column in a data frame keyed on other columns
Vincent Vinh-Hung
anhxang at gmail.com
Sat Jul 4 12:21:45 CEST 2009
Dear List:
I have a question related to a previous discussion,
"How to sum one column in a data frame keyed on other columns"
(George Nachman, Bill Venables):
https://stat.ethz.ch/pipermail/r-help/2006-December/122141.html
The original query was to calculate the sum of visits for each unique
tuple of (url, time) from the data frame dat:
> dat
          url time somethingirrelevant visits
1 www.foo.com 1:00                 xxx    100
2 www.foo.com 1:00                 yyy     50
3 www.foo.com 2:00                 xyz     25
4 www.bar.com 1:00                 xxx    200
5 www.bar.com 1:00                 zzz    200
6 www.foo.com 2:00                 xxx    500
The response gave:
> tdat
          url time total_visits
4 www.bar.com 1:00          400
1 www.foo.com 1:00          150
3 www.foo.com 2:00          525
My question is: how can I build a similar data frame that also has rows for
the combinations of (url, time) that do not occur in dat?
In this example, the only (url, time) combination with no recorded visits is
(www.bar.com, 2:00), so the desired result would be:
> ndat
        url time total_visits
www.bar.com 1:00          400
www.bar.com 2:00            0
www.foo.com 1:00          150
www.foo.com 2:00          525
I have tried to build a data frame with
> adat <- data.frame(url = rep(unique(dat$url), each = length(unique(dat$time))),
+                    time = unique(dat$time), alt_visits = 0)
> ndat <- merge(adat, tdat, by = c("url", "time"), all = TRUE)
then replace the NA and remove the alt_visits column.
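For concreteness, the whole sequence might be sketched as follows. This is
only an illustration on the six example rows. Rebuilding tdat with
aggregate() is an assumption made here so that the block runs on its own
(the earlier thread may have used a different method), and
stringsAsFactors = FALSE is likewise a choice made for this sketch.

```r
# Example data from the post
dat <- data.frame(
  url  = c("www.foo.com", "www.foo.com", "www.foo.com",
           "www.bar.com", "www.bar.com", "www.foo.com"),
  time = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  somethingirrelevant = c("xxx", "yyy", "xyz", "xxx", "zzz", "xxx"),
  visits = c(100, 50, 25, 200, 200, 500),
  stringsAsFactors = FALSE
)

# Assumption: tdat is rebuilt here with aggregate(); the earlier thread
# may have computed it differently
tdat <- aggregate(visits ~ url + time, data = dat, FUN = sum)
names(tdat)[names(tdat) == "visits"] <- "total_visits"

# Every (url, time) combination, with a placeholder column
adat <- data.frame(url = rep(unique(dat$url), each = length(unique(dat$time))),
                   time = unique(dat$time), alt_visits = 0)

# Full outer join: combinations missing from tdat get NA in total_visits
ndat <- merge(adat, tdat, by = c("url", "time"), all = TRUE)

ndat$total_visits[is.na(ndat$total_visits)] <- 0  # replace the NA
ndat$alt_visits <- NULL                           # remove the helper column
ndat <- ndat[order(ndat$url, ndat$time), ]        # sort by url, then time
```

On the example data this leaves one row per (url, time) pair, with
total_visits set to 0 for (www.bar.com, 2:00).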
But this appears clumsy and is quite slow with 10000 rows and 4 columns.
I would be most grateful for other suggestions.
Vincent