[R] cov.wt gives different results from other (co)variance functions (cov, wtd.var)
Andrews, Chris
chrisaa at med.umich.edu
Mon Mar 31 14:34:30 CEST 2014
I don't think cov.wt treats its wt argument as frequency weights. However, I don't think this is mentioned in its help page.
Here is some information about the difference:
http://stats.stackexchange.com/questions/61225/correct-equation-for-weighted-unbiased-sample-covariance
The frequency version isn't hard to program (below), and is probably somewhere in R already.
Chris
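mywtcov <- function(x, frqwt = rep(1, nrow(x)), unbiased = TRUE) {
  if (is.data.frame(x)) x <- as.matrix(x)
  n <- sum(frqwt)                      # total number of observations
  center <- colSums(frqwt * x) / n     # frequency-weighted column means
  # center each column, then scale rows by sqrt(weight) so that
  # crossprod() gives the weighted sums of squares and cross-products
  xcw <- sqrt(frqwt) * sweep(x, 2, center, check.margin = TRUE)
  cov <- crossprod(xcw)
  cov <- if (unbiased) {
    cov / (n - 1)
  } else {
    cov / n
  }
  list(cov = cov, center = center, unbiased = unbiased)
}
# mydata, mytable and xcov are as defined in the original message below
mywtcov(mydata)
mywtcov(mytable[, 1:2], mytable[, 3])
all.equal(mywtcov(mytable[, 1:2], mytable[, 3])$cov, xcov)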
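The size of the discrepancy can also be checked directly. Going by the source of stats::cov.wt, the function rescales wt to sum to 1 and, for method = "unbiased", divides by 1 - sum(wt^2), whereas a frequency-weighted estimate divides by N - 1 with N the total count. The two therefore differ only by a scalar factor, which would also explain why the correlations agree. A minimal sketch under that assumption, using base R's aggregate() in place of plyr::count (the names freqtab, w, N and rescaled are illustrative, not from the original messages):

```r
# Assumption: cov.wt normalises wt to sum to 1 and divides by 1 - sum(wt^2).
mydata <- iris[, 1:2]

# Frequency table: one row per distinct (Sepal.Length, Sepal.Width) pair
freqtab <- aggregate(freq ~ ., data = cbind(mydata, freq = 1), FUN = sum)

N <- sum(freqtab$freq)        # total number of observations
w <- freqtab$freq / N         # weights rescaled to sum to 1

res <- cov.wt(freqtab[, 1:2], wt = freqtab$freq)

# Swap the 1 - sum(w^2) divisor for the frequency-weight divisor (N - 1) / N
rescaled <- res$cov * (1 - sum(w^2)) / (1 - 1 / N)

all.equal(rescaled, cov(mydata))
```

When every weight is equal, sum(w^2) is 1/N and the two divisors coincide, which is consistent with the unweighted cov.wt(mydata) check passing in the original message.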
-----Original Message-----
From: Emilio Torres Manzanera [mailto:torres at uniovi.es]
Sent: Sunday, March 30, 2014 6:31 PM
To: r-help at r-project.org
Subject: [R] cov.wt gives different results from other (co)variance functions (cov, wtd.var)
Dear Sir,
I am not sure about the precision of the cov.wt function. It seems that it provides different results when using frequency weights. This discrepancy only occurs with the covariance matrix, not with the correlation matrix.
Do you know how to solve this issue? Thank you.
Best regards,
Emilio
rm(list=ls())
library(plyr)
library(Hmisc)
mydata <- iris[,1:2]
xcor <- cor(mydata)
xcov <- cov(mydata)
all.equal(cov.wt(mydata)$cov,xcov) # OK
## Now, we use frequency weights
mytable <- count(mydata) # Compute frequency table
all.equal(wtd.var(mytable[,1],weights=mytable$freq), xcov[1,1]) # OK (Hmisc::wtd.var and cov)
# But with cov.wt
result <- cov.wt(mytable[,1:2],wt=mytable$freq,cor=TRUE)
all.equal(result$cov, xcov) # Wrong!
# "Mean relative difference: 0.003579418"
all.equal(wtd.var(mytable[,1],weights=mytable$freq), result$cov[1,1]) # Wrong!
# "Mean relative difference: 0.003592277"
all.equal(result$cov[1,1], xcov[1,1]) # Wrong!
# "Mean relative difference: 0.003579418"
# The correlations are equal
all.equal(result$cor, xcor) # OK
sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C
[3] LC_TIME=es_ES.UTF-8 LC_COLLATE=es_ES.UTF-8
[5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8
[7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines grid stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] Hmisc_3.14-0 Formula_1.1-1 survival_2.37-7 lattice_0.20-27
[5] plyr_1.8
loaded via a namespace (and not attached):
[1] cluster_1.15.1 latticeExtra_0.6-26 RColorBrewer_1.0-5
--
=================================================
Emilio Torres Manzanera
Fac. de Comercio - Universidad de Oviedo
c/ Luis Moya 261, E-33203 Gijón (Spain)
Tel. 985 182 197 email: torres at uniovi.es