[R] strucchange Fstats() example

Achim Zeileis Achim.Zeileis at uibk.ac.at
Fri Jun 1 13:07:21 CEST 2012


On Fri, 1 Jun 2012, Mabille, Geraldine wrote:

> Thanks! I had seen this section but hadn't understood that the dashed 
> line represented the mean. Does that mean in my case (two tests hardly 
> significant P=0.03 and one NS) that I should reject the hypothesis of 
> different levels of survival probability for different groups of 
> females??

Hard to say with the information given. The evidence does not seem to be 
overwhelming but given that the tests are asymptotic in nature it might be 
better to employ a nonparametric test.

But this now goes beyond the R-help focus, I guess. If you contact me 
off-list, I can have a look if you're interested.

Best,
Z

> -----Original Message-----
> From: Achim Zeileis [mailto:Achim.Zeileis at uibk.ac.at]
> Sent: 1. juni 2012 12:49
> To: Mabille, Geraldine
> Cc: R-help at r-project.org
> Subject: RE: [R] strucchange Fstats() example
>
> On Fri, 1 Jun 2012, Mabille, Geraldine wrote:
>
>> Hi and thanks again for your  answer. I have just a last question regarding the choice of the functional...if you have time to help on that again.
>>
>> I have tried running the sctest using the functionals you recommended
>> sctest(gmass2,functional=meanL2BB)
>> sctest(gmass2,functional=rangeBB)
>> sctest(gmass2,functional=supLM(0.1))
>>
>> and I have a question regarding the meanL2BB test. The P-value
>> obtained for this test is P=0.18 and the plot obtained is given as an
>> attachment to the message. What I don't understand is why the overall
>> test for meanL2BB is not significant while we see on the graph at
>> least two peaks above the red line?
>
> This is discussed at the end of Section 5.1 (middle of page 3001) in the
> 2006 CSDA paper. For the meanL2BB functional the test statistic is the _mean_ of the process, not the _maximum_. Hence, the mean (visualized by the dashed horizontal line) needs to exceed the critical value (red horizontal line) to obtain a significant result.
>
> Best,
> Z
>
>> The other two tests (rangeBB and supLM) are both significant at
>> P=0.03.
>>
>> Thanks again for your precious help,
>> Geraldine
>>
>> -----Original Message-----
>> From: Achim Zeileis [mailto:Achim.Zeileis at uibk.ac.at]
>> Sent: 1. juni 2012 08:56
>> To: Mabille, Geraldine
>> Cc: R-help at r-project.org
>> Subject: RE: [R] strucchange Fstats() example
>>
>> On Thu, 31 May 2012, Mabille, Geraldine wrote:
>>
>>> Thanks a lot for your answer Achim, this helped a lot. I have done a
>>> lot of reading, following your recommendations and I think I have a
>>> better idea of what I should use. My dataset contains binary data on
>>> survival of the calf depending on the mass of the mother. We know
>>> that the probability of survival of the calf should vary according to
>>> the mass of the mother: 3 groups of mass expected, lower survival of
>>> the calves for small and large females, best survival of calf for
>>> intermediate-sized females. I want to identify at which masses those
>>> changes in survival occur.
>>>
>>> I think the code I need to use in order to test what I want is something of that type:
>>> gmass<-gefp(Success ~ Mass, family=binomial, fit=glm, order.by=~Mass)
>>>
>>> The first question I have is what is the difference between testing the gmass model shown up compared to the gmass2 model below?
>>> gmass2<-gefp(Success ~ 1, family=binomial, fit=glm, order.by=~Mass)
>>
>> gmass2 will be appropriate if under the null hypothesis there is a constant probability of survival and under the alternative there are different groups each of which has a constant probability of survival.
>>
>> gmass will be appropriate if under the null hypothesis the logit of the probability of survival increases/decreases linearly with mass - and under the alternative there are different groups each of which has a probability that increases/decreases with mass.
>>
>> From your description above I suspect that gmass2 is what you are looking for.
>>
>>> The second question, which is related I think to the first one is
>>> whether it makes sense to plot the gmass model as aggregate=FALSE,
>>> knowing that we have a single parameter in the model (Mass),
>>
>> In the gmass2 model, there is a single parameter (probability of survival) which may or may not change across mass.
>>
>> In the gmass model, there are two parameters (intercept and slope) which may or may not vary across mass.
>>
>>> and this parameter is also the parameter we use as the order.by=
>>> parameter plot(gmass, functional=meanL2BB, aggregate=FALSE) I think
>>> the whole point around questions 1 and 2 is that I don't understand
>>> the interpretation of the intercept in the gmass model???
>>>
>>> Third question: how to choose the proper functional?
>>
>> Always difficult because there are no strong optimality results. If
>> you test just a single parameter (i.e., gmass2), then I would expect
>> that most functionals should lead to similar performance. I would try
>>
>> plot(gmass2, functional = maxBB, aggregate = FALSE) plot(gmass2,
>> functional = meanL2BB) plot(gmass2, functional = supLM(0.1))
>>
>> where the latter two would probably have somewhat higher power. But the rangeBB might also work rather well here...
>>
>>> I have seen that you discuss that in your CSDA(2006 & 2003) papers
>>> and, in the 2006 paper you say: "in situations where there is a shift
>>> in the parameters and then a second shift back, it is advantageous to
>>> aggregate over time using the range and ....". Which means, if I
>>> understand well that rangeBB would be adapted to the kind of test I want to perform.
>>> However, since I want to determine the timing of the peaks, I need my
>>> functional to produce a "time series plot", for example like meanL2BB
>>> does. Do you think I can use meanL2BB functional in my case or should
>>> I compute an "home-made functional" which would use the range of efp
>>> but with comp applied first and time after (is this possible???).
>>
>> When you test only a single parameter, then you don't need to aggregate across the components anyway because there is just a single one.
>>
>>> Fourth question: is it OK to only make a visual estimation of the
>>> "breakpoints"  from the peaks seen on the graph after plotting the
>>> efp or should I use the breakpoints() function to properly date the
>>> breakpoints??? I'm not sure this breakpoints() function can be
>>> applied to binary data?
>>
>> In the "fxregime" package there is an unexported gbreakpoints() function that can be used. For example, if your data is in a data.frame d:
>>
>> gbp <- fxregime:::gbreakpoints(Success ~ 1, data = d,
>>   order.by = d$Mass, fit = glm, family = binomial, h = 20, ic = "BIC")
>> plot(gbp)
>> breakpoints(gbp)
>>
>> Note that the code is not optimized for glm and hence can be rather slow - extremely slow if you have many observations (more than a few hundred).
>> How many observations do you have?
>>
>>> Fifth question: I have noticed that the p-values I obtain after
>>> performing the sctest(gmass, functional=meanL2BB) for example are a
>>> bit different depending on if I introduce family=binomial as an
>>> argument in my gefp() call. Should I use this argument or is it used
>>> by default when you specify fit=glm???
>>
>> If you want to fit a binomial model yes. Otherwise a linear regression
>> is used. (The 'family' is simply passed on to glm.)
>>
>>> Last question, you said in your previous message that I could look at
>>> the maxstat_test() from package "coin" for an interesting
>>> nonparametric alternative but I think this package does not allow
>>> estimation of more than one breakpoint???
>>
>> True, it just uses a single breakpoint. You can apply it recursively, though, if you use the ctree() function from the "party" package and set the control parameters accordingly.
>>
>> Best,
>> Z
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list