[R] NA's in segmented
vito muggeo
vmuggeo at dssm.unipa.it
Mon Oct 6 14:05:41 CEST 2008
Dear Tyler,
Yes the problem is with NA..
There are two solutions:
1) You can use lm() + segmented (you fit a gaussian model, so why do you
use glm()?)
2)If you want to use glm()+ segmented(), use na.omit() to pass your
dataframe to the data argument of glm, glm(.., data=na.omit())
Also, if you want to constrain the right slope to be zero use "the minus
variable" (see the relevant recent paper on Rnews)
hope this helps you
vito
###############################################
Example
x<-1:50/50
y<- -2*x+2*pmax(x-.6,0)+rnorm(50)*.1
x[20:22]<-NA
d<-data.frame(xx=x,yy=y)
rm(x,y)
#Use lm() - It works:
o<-lm(yy~xx, data=d, na.action=na.omit)
os<-segmented(o,seg.Z=~xx,psi=.5)
#Use glm() - It works:
o<-glm(yy~xx, data=na.omit(d))
os<-segmented(o,seg.Z=~xx,psi=.5)
#constrain the right slope to zero
d$xxx<- -d$x
o<-glm(yy~1, data=na.omit(d))
os1<-segmented(o,seg.Z=~xxx,psi=-.5)
with(d,plot(xx,yy)
plot(os, add=TRUE)
plot(os1, add=TRUE, col=2, rev.sgn=TRUE)
T.D.Rudolph ha scritto:
> I am trying to fit a very simple broken stick model using the package
> "segmented" but I have hit a roadblock.
>
>> str(data)
> 'data.frame': 18 obs. of 2 variables:
> $ Bin : num 0.25 0.75 1.25 1.75 2.25 2.75 3.25 3.75 4.25 4.75 ...
> $ LnFREQ: num 5.06 4.23 3.50 3.47 2.83 ...
>
> I fit the lm easily:
>> fit.lm<-lm(LnFREQ~Bin, data=id07)
>
> But I keep getting an error message:
>> fit.seg<-segmented(fit.glm, seg.Z=~Bin, psi=6)
> Error in cbind(XREG, U, V) :
> number of rows of matrices must match (see arg 2)
>
> I think the problem is due to NA's in the Bin data, but there doesn't seem
> to be an "na.action" for segmented (). What is the best way to get around
> this problem? I would like to keep all Bin values in the output for
> continuity. Data below....
>
> Tyler
>
>> data
> Bin LnFREQ
> 1 0.25 5.0562458
> 2 0.75 4.2341065
> 3 1.25 3.4965076
> 4 1.75 3.4657359
> 5 2.25 2.8332133
> 6 2.75 2.9444390
> 7 3.25 2.4849066
> 8 3.75 1.3862944
> 9 4.25 1.7917595
> 10 4.75 1.3862944
> 11 5.25 0.6931472
> 12 5.75 1.0986123
> 13 6.25 0.0000000
> 14 6.75 0.0000000
> 15 7.25 NA
> 16 7.75 0.0000000
> 17 8.25 0.0000000
> 18 8.75 NA
>
>> summary(fit.glm)
> Call:
> glm(formula = LnFREQ ~ Bin, data = id07, na.action = na.omit)
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -0.74139 -0.22999 0.01065 0.21245 0.72684
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 4.50646 0.21088 21.37 4.37e-12 ***
> Bin -0.63434 0.04467 -14.20 1.05e-09 ***
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> (Dispersion parameter for gaussian family taken to be 0.1844898)
> Null deviance: 39.7785 on 15 degrees of freedom
> Residual deviance: 2.5829 on 14 degrees of freedom
> (2 observations deleted due to missingness)
> AIC: 22.227
> Number of Fisher Scoring iterations: 2
--
====================================
Vito M.R. Muggeo
Dip.to Sc Statist e Matem `Vianelli'
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 6626240
fax: 091 485726/485612
http://dssm.unipa.it/vmuggeo
More information about the R-help
mailing list