[R] need help on computing double summation
Liaw, Andy
andy_liaw at merck.com
Thu Jun 16 14:22:53 CEST 2005
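If I understood correctly, the following might be simpler (dat is the data
frame holding the data). Note that ave() treats every argument other than
x and FUN as a grouping variable, so the centering is written as an explicit
function rather than as scale(..., scale=FALSE):
> ctr <- function(v) v - mean(v)
> sum(ave(dat$x, dat$id, FUN=ctr) *
+ ave(dat$y, dat$id, FUN=ctr))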
[1] 6.229377
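For anyone who wants to check the approaches in this thread against each other, here is a minimal sketch on a made-up data set. The data frame, its values, and the helper name centre are all hypothetical, since the original data are not shown. The centering is done with an explicit v - mean(v) function, which is my own choice here because ave() does not forward extra arguments to FUN.

```r
# Hypothetical toy data; the original data set is not shown in the thread.
dat <- data.frame(id = c("a", "a", "a", "b", "b", "c"),
                  x  = c(1, 2, 3, 4, 6, 5),
                  y  = c(2, 4, 7, 1, 3, 9))

# Centre a vector on its own mean (hypothetical helper).
centre <- function(v) v - mean(v)

# 1. ave(): centre x and y within each id, then sum the products.
s_ave <- sum(ave(dat$x, dat$id, FUN = centre) *
             ave(dat$y, dat$id, FUN = centre))

# 2. tapply(): per-id sums and the shortcut sxy - sx*sy/n.
sx  <- tapply(dat$x, dat$id, sum)
sy  <- tapply(dat$y, dat$id, sum)
sxy <- tapply(dat$x * dat$y, dat$id, sum)
n   <- tapply(dat$x, dat$id, length)
s_tapply <- sum(sxy - sx * sy / n)

# 3. split()/sapply(): (n - 1) * cov per id; an id with a single
#    observation gives NA, which na.rm = TRUE then drops (i.e. counts as 0).
per_id  <- sapply(split(dat, dat$id),
                  function(z) (nrow(z) - 1) * cov(z$x, z$y))
s_split <- sum(per_id, na.rm = TRUE)

# The three totals should agree up to floating-point rounding.
c(ave = s_ave, tapply = s_tapply, split = s_split)
```

On this toy data the per-id contributions are 5, 2 and 0, so each total should come out as 7. Id "c" has only one observation and contributes nothing.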
Andy
> From: Huntsinger, Reid
>
> You could do something like
>
> ids <- unique(mydata$id)
> ans <- numeric(length(ids))
> for (k in seq_along(ids)) {
>   g <- which(mydata$id == ids[k])
>   ans[k] <- (length(g) - 1)*cov(mydata$x[g], mydata$y[g])
> }
> ans
>
> (indexing by position k rather than by the id value, so that the ids need
> not be 1, 2, ...), but cov() returns NA for length 1 vectors, so you'd want
> an if (length(g) == 1) ans[k] <- 0 else ans[k] <- ... construction.
>
> This is almost brute force; you could also use tapply, as follows:
>
> sx <- tapply(mydata$x,INDEX=mydata$id,FUN=sum)
> sy <- tapply(mydata$y,INDEX=mydata$id,FUN=sum)
> sxy <- tapply(mydata$x*mydata$y, INDEX=mydata$id, FUN=sum)
> n <- tapply(mydata$id,INDEX=mydata$id,FUN=length) # or use table()!
>
> and now your inner sum is
>
> sxy - 2*sx*(sy/n) + n*(sx/n)*(sy/n) = sxy - sx*sy/n
>
> so
>
> sum(sxy - sx*sy/n) should do.
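
Spelling out the algebra behind this shortcut for a single id i, in the notation of the original question. The symbols s_x, s_y and s_{xy} are my shorthand for the per-id sums sx, sy and sxy above, and \bar{x}_i = s_x / n_i, \bar{y}_i = s_y / n_i:

```latex
\begin{align*}
\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(y_{ij} - \bar{y}_i)
  &= \sum_{j} x_{ij} y_{ij}
     - \bar{y}_i \sum_{j} x_{ij}
     - \bar{x}_i \sum_{j} y_{ij}
     + n_i \bar{x}_i \bar{y}_i \\
  &= s_{xy} - 2\,\frac{s_x s_y}{n_i} + \frac{s_x s_y}{n_i} \\
  &= s_{xy} - \frac{s_x s_y}{n_i}
\end{align*}
```

Summing the last line over i = 1, ..., k gives the double summation.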
>
> One more approach is to make your dataset into a list of data frames, one
> for each id, then use lapply(). The list can be created by split(). In one
> line,
>
> lapply(split(mydata, f=mydata$id), function(z) (length(z$x) - 1)*cov(z$x, z$y))
>
> and take sum(unlist(...), na.rm=TRUE) of the result (sum() does not accept
> a list directly) to remove the NAs due to single ids that you want to be
> zeros.
>
> Reid Huntsinger
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Kerry Bush
> Sent: Wednesday, June 15, 2005 11:41 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] need help on computing double summation
>
>
> Dear helpers in this forum,
>
> This is a clarified version of my previous
> questions in this forum. I really need your generous
> help on this issue.
>
> > Suppose I have the following data set:
> >
> >
> > ......
> >
>
> Now I want to compute the following double summation:
>
> sum_{i=1}^k sum_{j=1}^{n_i} (x_{ij}-mean(x_i))*(y_{ij}-mean(y_i))
>
> Here i runs from 1 to k, indexing the ith subject id, and j runs from 1 to
> n_i, indexing the jth observation for the ith subject. In the above
> expression, mean(x_i) is the mean of the x values for the ith subject, and
> mean(y_i) is the mean of the y values for the ith subject.
>
> Is there a simple way to do this in R?
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html