[R] Estimating and predicting using "segmented" Package

Yunkai Huang yunkai at uvic.ca
Tue Aug 21 22:23:02 CEST 2012


Hello,
    Any one who can help me with the problems  regrarding segmented
regression.

1.When I run the code as following:

library(segmented)
n <-100
S<-200
x1 <- seq(-0.99,0.99,length=29)
MSEPtrm<-matrix(0,S,1)
MAEtrm<-matrix(0,S,1)
TRM<-matrix(0,S,length(x1))
for (s in seq(1:S)){
set.seed(123*s+123)
x<-seq(-1,1,length=n)
y0 <- 0.5*x+0.4*exp(-20*(x+0.25)^2)-0.4*exp(-20*(x-0.25)^2)
y1 <- 0.5*x1+0.4*exp(-20*(x1+0.25)^2)-0.4*exp(-20*(x1-0.25)^2)
epsilon <- rnorm(n, 0, 0.3)
epsilon1 <- rnorm(length(x1),0,0.3)
y <- y0 + epsilon
y1<- y1 + epsilon1
mydata<-data.frame(y,x)
mydata1<-data.frame(y=y1,x=x1)
##TRM  segmented regregression with 2 breakpoints
reg<-lm(y~x,mydata)
seg2<-segmented(reg,seg.Z=~x,psi=c(-0.5,0.5))
mydata2<-data.frame(x=x1)
mydata2$dummy1<-pmax(mydata2$x-seg2$psi[1,2],0)
mydata2$dummy2<-I(mydata2$x>seg2$psi[1,2])*coef(seg2)[3]
mydata2$dummy3<-pmax(mydata2$x-seg2$psi[2,2],0)
mydata2$dummy4<-I(mydata2$x>seg2$psi[2,2])*coef(seg2)[4]
names(mydata2)[-1]<-names(model.frame(seg2))[-c(1,2)]
segpredict<-predict(seg2,mydata2)
segpe<-(segpredict-mydata1$y)^2
MSEPtrm<-mean((segpredict-mydata1$y)^2)
MAEtrm<-mean(abs(segpredict-mydata1$y))
TRM[s,]<-segpe
}

I found it works fine when s=1:15. However, when s=16, it returns message
“Error in diag(Cov[id, id]) : subscript out of bounds”
Actually it returns same message as well when some other numbers being
selected for s.


But if I run the line:
seg2<-segmented(reg,seg.Z=~x,psi=c(-0.5,0.5))
twice individually. No error message returns anymore. I wonder why this
happens.


2. When I change x from uniformly spaced to uniform distribution

library(segmented)
n <-100
S<-200
x1 <- seq(-0.99,0.99,length=29)
MSEPtrm<-matrix(0,S,1)
MAEtrm<-matrix(0,S,1)
TRM<-matrix(0,S,length(x1))
for (s in seq(1:S)){
set.seed(123*s+123)
z<-runif(n,-1,1)
x<-z[order(z)]
y0 <- 0.5*x+0.4*exp(-20*(x+0.25)^2)-0.4*exp(-20*(x-0.25)^2)
y1 <- 0.5*x1+0.4*exp(-20*(x1+0.25)^2)-0.4*exp(-20*(x1-0.25)^2)
epsilon <- rnorm(n, 0, 0.3)
epsilon1 <- rnorm(length(x1),0,0.3)
y <- y0 + epsilon
y1<- y1 + epsilon1
mydata<-data.frame(y,x)
mydata1<-data.frame(y=y1,x=x1)
##TRM  segmented regregression with 2 breakpoints
reg<-lm(y~x,mydata)
seg2<-segmented(reg,seg.Z=~x,psi=c(-0.5,0.5))
mydata2<-data.frame(x=x1)
mydata2$dummy1<-pmax(mydata2$x-seg2$psi[1,2],0)
mydata2$dummy2<-I(mydata2$x>seg2$psi[1,2])*coef(seg2)[3]
mydata2$dummy3<-pmax(mydata2$x-seg2$psi[2,2],0)
mydata2$dummy4<-I(mydata2$x>seg2$psi[2,2])*coef(seg2)[4]
names(mydata2)[-1]<-names(model.frame(seg2))[-c(1,2)]
segpredict<-predict(seg2,mydata2)
segpe<-(segpredict-mydata1$y)^2
MSEPtrm<-mean((segpredict-mydata1$y)^2)
MAEtrm<-mean(abs(segpredict-mydata1$y))
TRM[s,]<-segpe
}

     Besides the message it may return as problem 1, it may return error
message
“Error in segmented.lm(reg, seg.Z = ~x, psi = c(-0.5, 0.5)) :
  only 1 datum in an interval: breakpoint(s) at the boundary or too close”
(ie, when s=9)

     I checked the data plot when it returned this message. I found the
data points are too few between the two change points. I understand
why it
returns this message technically. However, I do wonder whether the
segmented() function can be applied to the data set that x (independent
variable as well as threshold variable ) is randomly spaced and with few
points between break points. (Since randomly drawn data set cannot
guarantee enough points between change points). Since I want do some
simulation study, I need to apply this function to a lot of different data
sets. Regarding this problem, do you have any suggestions?
     Thanks a lot!


Victor, PhD Candidate
Department of Economics, University of Victoria
P.O. Box 1700, STN CSC, Victoria, B.C., CANADA,
V8W 2Y2
Email: yunkai at uvic.ca



More information about the R-help mailing list