[R] Total Least Squares Regression
Lorenzo Isella
lorenzo.isella at gmail.com
Tue Mar 29 21:48:56 CEST 2016
Dear All,
I am looking for an R package to handle total least square regression
(TLS).
See http://bit.ly/1pSf4Bg
I am in a situation in which I have errors in both the dependent
variables X (plural because I have several predictors) and the
independent variable y.
I found several discussion inside and outside the R mailing list, e.g.
http://bit.ly/1pSfveQ
http://bit.ly/1RDAPuA (stackexchange)
In my understanding, when the variance of the error on all the
predictors and the dependent variable is identical, the problem
reduces to the orthogonal least squares and that is partially
addressed in the stackexchange link.
However, there are some bits missing
1) in the example given in stackexchange, how do I calculate the
confidence intervals of my prediction?
2) Suppose that the variances of the errors in the predictors and the
independent variables are not identical and I have somehow estimated
their ratio. How can I then generalize the example in the stackexchange link?
To fix the ideas, I paste some R code a the end of the email. Any
suggestion is welcome.
Many thanks
Lorenzo
#########################################################
library(dplyr)
tls <- function(X,y){
v <- prcomp(cbind(X,y))$rotation
beta <- -v[-ncol(v),ncol(v)] / v[ncol(v),ncol(v)]
return(beta)
}
tls_beta0 <- function(X,y, beta){
beta0 <- mean(y)-sum(colMeans(X)*beta)
return(beta0)
}
## some data to work upon
dd <- structure(list(mn = c(9226, 9404, 9604, 10183, 10788, 11352,
11984, 12921, 14057, 15235, 15560, 15738, 16039, 16729, 17332,
18398, 19458, 20001, 19861, 20690, 21495, 21869, 22145, 22521
), ind = c(43862, 40621, 37884, 36039, 35228, 34336, 33684, 33816,
33593, 33861, 33817, 33139, 32250, 31796, 31276, 30934, 31340,
32078, 31366, 30800, 31410, 31975, 32120, 32254), gov = c(1183695,
1281056, 1334560, 1390475, 1439175, 1472564, 1495661, 1520460,
1565451, 1604454, 1654996, 1672559, 1701664, 1722003, 1751547,
1793235, 1824640, 1874300, 1894248, 1939610, 2001224, 2056539,
2104642, 2156210), berd = c(26245.5, 26579, 25933, 25910, 26816.6,
27211, 28909.8, 30334.44, 33622.55, 35600, 36331.9, 36950, 38029,
38363, 38651.038, 41148, 43034, 46073, 45275, 46929, 51077.2,
53790.1, 53566.2, 56226)), row.names = c(NA, 24L), .Names = c("mn",
"ind", "gov", "berd"), class = "data.frame")
y <- dd$berd
X <- select(dd,mn, ind, gov)
beta_perp <- tls(X,y)
beta0_perp <- tls_beta0(X,y, beta_perp)
mymodel <- beta0_perp+beta_perp[1]*X[ ,1]+beta_perp[2]*X[ ,2]+
beta_perp[3]*X[ ,3]
## what is the confidence level of my model?
## what if I do not assume that the error variance is the same for all
## the regressors in X and for the independent variable in y?
More information about the R-help
mailing list