[R] multi-condition summing puzzle
arun
smartpink111 at yahoo.com
Sat Jul 13 06:08:10 CEST 2013
Hi,
Maybe this helps:
dat1<- read.table(text="
ID county date company
1 x 1 comp1
2 y 1 comp3
3 y 2 comp1
4 y 3 comp1
5 x 2 comp2
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<- dat1
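dat1$answer <- unsplit(
  lapply(split(dat1, dat1$county), function(x) {
    # for row i within a county, use rows 1..i (rows assumed ordered by date)
    do.call(rbind, lapply(seq_len(nrow(x)), function(i) {
      x1 <- x[1:i, ]
      x2 <- table(x1$company) / sum(table(x1$company))  # market shares so far
      sum(x2^2)                                          # sum of squared shares
    }))
  }),
  dat1$county)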
dat1
# ID county date company answer
#1 1 x 1 comp1 1.0000000
#2 2 y 1 comp3 1.0000000
#3 3 y 2 comp1 0.5000000
#4 4 y 3 comp1 0.5555556
#5 5 x 2 comp2 0.5000000
#or
dat2$answer <- with(dat2, unlist(ave(company, county, FUN = function(x)
  lapply(seq_along(x), function(i) {
    x1 <- table(x[1:i])      # company counts among the first i rows of the county
    sum((x1 / sum(x1))^2)    # sum of squared market shares
  }))))
dat2
# ID county date company answer
#1 1 x 1 comp1 1.0000000
#2 2 y 1 comp3 1.0000000
#3 3 y 2 comp1 0.5000000
#4 4 y 3 comp1 0.5555556
#5 5 x 2 comp2 0.5000000
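With several hundred companies and many rows, building a table() for every row may get slow. A possible vectorized variant is sketched below. It relies on the fact that when a company's running count in a county goes from c to c+1, the sum of squared counts grows by 2c + 1. It assumes the rows are already sorted by date within each county and that each row counts as one observation; the names dat3, prev, ssq and n are introduced here only for illustration.

```r
dat3 <- read.table(text="
ID county date company
1 x 1 comp1
2 y 1 comp3
3 y 2 comp1
4 y 3 comp1
5 x 2 comp2
", sep="", header=TRUE, stringsAsFactors=FALSE)

# prev: how often this company has already appeared in this county
prev <- with(dat3, ave(seq_along(company), county, company, FUN = seq_along) - 1)

# each new observation adds (c+1)^2 - c^2 = 2c + 1 to the sum of squared counts
ssq <- ave(2 * prev + 1, dat3$county, FUN = cumsum)

# n: running number of observations in the county
n <- ave(seq_along(dat3$county), dat3$county, FUN = seq_along)

# sum of squared shares = sum of squared counts / n^2
dat3$answer <- ssq / n^2
dat3
```

On the example data this should reproduce the answer column shown above (1, 1, 0.5, 0.5555556, 0.5).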
A.K.
Hi -
I have a seemingly complex data summarizing problem that I am having a hard time wrapping my mind around.
What I'm trying to do is sum the squares of all company market shares in a given county, UP TO the corresponding date. A company's market share is defined as: number of observations for that company / total observations.
Here is example data and desired answer:
ID county date company answer
1 x 1 comp1 1
2 y 1 comp3 1
3 y 2 comp1 0.5
4 y 3 comp1 0.55556
5 x 2 comp2 0.5
For example, to get the answer for ID 4, we look at county y, dates 1, 2, 3 and sum: [(2/3)comp1]^2 +[(1/3)comp3]^2 = 0.55556
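The arithmetic for ID 4 can be checked directly in R. This is only a small sketch for that one row; the vector name company_y and the variable shares are made up here for illustration.

```r
# companies observed in county y on dates 1, 2 and 3
company_y <- c("comp3", "comp1", "comp1")

# market share of each company: its count / total observations
shares <- table(company_y) / length(company_y)

# sum of squared shares: (2/3)^2 + (1/3)^2
sum(shares^2)
```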
I've tried cumsum, but am simply stuck given all of the different conditions. I have a large matrix of data for this with several hundred companies, tens of counties and unique dates.
Any help would be extremely appreciated.
Thank you,