[R] cov.wt gives different results from other (co)variance functions (cov, wtd.var)

Andrews, Chris chrisaa at med.umich.edu
Mon Mar 31 14:34:30 CEST 2014


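I don't think cov.wt treats wt as frequency weights: it rescales the weights to sum to 1 and, for the default method = "unbiased", divides by 1 - sum(wt^2) rather than by sum(wt) - 1.  However, I don't think the help page says explicitly that the weights are not frequencies.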

Here is some information about the difference:
http://stats.stackexchange.com/questions/61225/correct-equation-for-weighted-unbiased-sample-covariance

The frequency version isn't hard to program (below), and is probably somewhere in R already.

Chris

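# Covariance matrix with frequency weights: row i of x counts as frqwt[i] observations.
mywtcov <- function(x, frqwt = rep(1, nrow(x)), unbiased = TRUE) {
	if (is.data.frame(x)) x <- as.matrix(x)
	n <- sum(frqwt)                        # total number of observations
	center <- colSums(frqwt * x) / n       # weighted column means
	# Center each column, then scale row i by sqrt(frqwt[i])
	xcw <- sqrt(frqwt) * sweep(x, 2, center, check.margin = TRUE)
	wcov <- crossprod(xcw)                 # weighted sums of squares and cross-products
	wcov <- if (unbiased) wcov / (n - 1) else wcov / n
	list(cov = wcov, center = center, unbiased = unbiased)
}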

# mydata, mytable and xcov are defined in the original message below
mywtcov(mydata)

mywtcov(mytable[, 1:2], mytable$freq)

all.equal(mywtcov(mytable[, 1:2], mytable$freq)$cov, xcov)
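
If you would rather stay with cov.wt, the frequency-weighted result can probably be recovered by rescaling its output. The sketch below is only an illustration under stated assumptions: the toy table `tab` and every variable name in it are made up for this example, and it relies on cov.wt normalizing wt to sum to 1, using divisor 1 - sum(wt^2) for method = "unbiased" and divisor 1 for method = "ML".

```r
# Hypothetical toy frequency table: each (a, b) row occurs `freq` times.
tab <- data.frame(a = c(1, 2, 4), b = c(3, 5, 4), freq = c(2, 3, 1))

# Expand to one row per observation, the form that cov() expects.
raw <- tab[rep(seq_len(nrow(tab)), tab$freq), c("a", "b")]

N <- sum(tab$freq)   # total number of observations
w <- tab$freq / N    # the weights as cov.wt normalizes them internally

# Default cov.wt result: divisor is 1 - sum(w^2), not the frequency answer.
cw_unb <- cov.wt(tab[, c("a", "b")], wt = tab$freq)$cov

# Route 1: take the ML estimate and rescale it by N / (N - 1).
freq_cov <- cov.wt(tab[, c("a", "b")], wt = tab$freq, method = "ML")$cov * N / (N - 1)

# Route 2: rescale the default ("unbiased") result directly.
freq_cov2 <- cw_unb * (1 - sum(w^2)) / (1 - 1 / N)

all.equal(freq_cov, cov(raw))    # compare with cov() on the expanded data
all.equal(freq_cov2, cov(raw))
all.equal(cw_unb, cov(raw))      # the unscaled cov.wt result differs
```

Both rescaled matrices should agree with cov() on the expanded data. The correction is a single scalar factor applied to the whole matrix, which would also explain why the correlations from cov.wt already match.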



-----Original Message-----
From: Emilio Torres Manzanera [mailto:torres at uniovi.es] 
Sent: Sunday, March 30, 2014 6:31 PM
To: r-help at r-project.org
Subject: [R] cov.wt gives different results from other (co)variance functions (cov, wtd.var)

Dear Sir,
I am not sure about the precision of the cov.wt function. It seems to give different results when frequency weights are used. The discrepancy occurs only in the covariance matrix, not in the correlation matrix.
Do you know how to solve this issue? Thank you
Best regards,
Emilio

rm(list=ls())
library(plyr)
library(Hmisc)

mydata <- iris[,1:2]
xcor <- cor(mydata)
xcov <- cov(mydata)
all.equal(cov.wt(mydata)$cov,xcov) # OK

## Now, we use frequency weights
mytable <- count(mydata) # Compute frequency table
all.equal(wtd.var(mytable[,1],weights=mytable$freq),  xcov[1,1]) # OK (Hmisc::wtd.var and cov)

# But with cov.wt
result <- cov.wt(mytable[,1:2],wt=mytable$freq,cor=TRUE)
all.equal(result$cov, xcov) # Wrong!
# "Mean relative difference: 0.003579418"
all.equal(wtd.var(mytable[,1],weights=mytable$freq),  result$cov[1,1]) # Wrong!
# "Mean relative difference: 0.003592277"
all.equal(result$cov[1,1], xcov[1,1]) # Wrong!
# "Mean relative difference: 0.003579418"

# The correlations are equal
all.equal(result$cor, xcor) # OK


sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] splines   grid      stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Hmisc_3.14-0    Formula_1.1-1   survival_2.37-7 lattice_0.20-27
[5] plyr_1.8       

loaded via a namespace (and not attached):
[1] cluster_1.15.1      latticeExtra_0.6-26 RColorBrewer_1.0-5 

-- 
=================================================
Emilio Torres Manzanera
Fac. de Comercio - Universidad de Oviedo
c/ Luis Moya 261, E-33203 Gijón (Spain)
Tel. 985 182 197 email: torres at uniovi.es

