R: [R] slope estimations of teeth like data

Gabor Grothendieck ggrothendieck at myway.com
Tue Jun 15 23:33:20 CEST 2004



I am not entirely sure what is essential and what is inessential in 
this problem.  In the yy data shown, the peaks occur in one step and
all the peaks are equal.  Just plotting

plot(diff(yy))

shows that there is a huge region of potential cutoffs for separating
out the large single step increases from the other differences.

If it continues to be true that the increases are large relative 
to everything else but the cutoff is not quite so obvious as 
here, we can calculate a cutoff by just sorting the unique
differences and partitioning them in every possible
way into small and large differences and then choosing the 
partition whose residual variance is least.   

Its actually pretty simple to do that from first pinciples, as
shown here, but there might be a function in R that does it
even more directly.

diffyy <- diff(yy)
yy. <- unique(sort(diffyy)) # unique diffs sorted
yy. <- yy.[-1] - diff(yy.)/2  # midpoints

# For each 2 class partition into small & large, get residual var
res <- sapply(yy., function(x) 
                     var(resid(lm(diffyy ~ factor(diffyy < x)))) )

cutoff <- yy.[which.min(res)]

# indices of points involved in rise
which(diffyy > cutoff) + 1

---

Petr Pikal <petr.pikal at precheza.cz>

 
On 15 Jun 2004 at 13:52, Vito Muggeo wrote: 

> Dear Petr, 
> Probably I don't understand exactly what you are looking for. 
> 
> However your "plot(x,c(y,z))" suggests a broken-line model for the 
> response "c(y,x)" versus the variables x. Therefore you could estimate 
> a segmented model to obtain (different) slope (and breakpoint) 
> estimates. See the package segmented. 

Thank you Vito, but it is not what I want. plot(x,c(y,z)) shows only one "spike" 
and I have many such spikes in actual data. 

My actual data look like those 

set.seed(1) 
y <- 0.03*x[1:100]+rnorm(100, mean=.001, sd=.03) 
z <- 3-rep(seq(1,100,10),each=10)*.03+rnorm(100,mean=.001, sd=.03) 
yy <- NULL 
for( i in 1:10) yy <- c(yy,c(y,z)[1:floor(runif(1)*200)]) 
y.l <- length(yy) 
plot(1:y.l, yy) 

x axis is actually a time and y is a weight of gradually filled conteiner, which is 
irregularly emptied. I want to do an hourly and/or daily averages of increases in 
weight (it can by done by aggregate) 

myfac <- gl(y.l/12,12,length=1271) #hopefully length is ok 

y.agg <- aggregate(diff(yy), list(myfac), mean) 
## there will be list(hod=cut(time.axis,"hour")) construction actually 

0.03 can be expected average result and some aggregated values ar OK but some 
are wrong as they include values from emptying time. 

*** This*** is probably what I need, I need to set some logical vector which will 
be TRUE when there was a filling time and FALSE during other times. And I 
need to specify it according a data I have available. 

Best what I was able to do was to consider filling time as a time when let say 

diff(yy) >= 0 

was between prespecified limits, but you know how it is with real life and 
prespecified limits. 

Or I can plot my data against time, manually find out regions which are correct 
and make a aggregation only with correct data. But there are 24*60*3 values 
each day so I prefer not to do it manually. 

Or finally I can throw away any hourly average which is not in set limits, but I 
prefer to throw away as little data as possible. 

I hope I was able to clarify the issue a bit. 

Thank you 
Best regards 
Petr 


> 
> best, 
> vito 
> 
> 
> 
> ----- Original Message ----- 
> From: Petr Pikal <petr.pikal at precheza.cz> 
> To: <r-help at stat.math.ethz.ch> 
> Sent: Tuesday, June 15, 2004 1:11 PM 
> Subject: [R] slope estimations of teeth like data 
> 
> 
> > Dear all 
> > 
> > Suppose I have teeth like data similar like 
> > 
> > x <- 1:200 
> > y <- 0.03*x[1:100]+rnorm(100, mean=.001, sd=.03) 
> > z <- 3-rep(seq(1,100,10),each=10)*.03+rnorm(100,mean=.001, sd=.03) 
> > plot(x,c(y,z)) 
> > 
> > and I want to have a gradient estimations for some values from 
> > increasing 
> part of 
> > data 
> > 
> > like 
> > 
> > y.agg <- aggregate(diff(c(y,z)), 
> > list(rep(seq(1,200,10),each=10)[1:199]), 
> mean) 
> > 
> > y.agg[1:10,] ##is OK, I want that 
> > y.agg[11:20,] ##is not OK, I do not want that 
> > 
> > actual data are similar but more irregular and have subsequent 
> > gradual 
> increases 
> > and decreases, more like 
> > 
> > set.seed(1) 
> > yy<-NULL 
> > for( i in 1:10) yy <- c(yy,c(y,z)[1:floor(runif(1)*200)]) 
> > length(yy) 
> > [1] 1098 
> > 
> > plot(1:1098,yy) 
> > 
> > Is there anybody who has some experience with such data, mainly how 
> > to 
> extract 
> > only increasing portions or to filter values of "yy" such as only 
> aggregated slopes 
> > from increasing parts are computed and other parts are set to NA. 
> Sometimes 
> > actual data have so long parts of steady or even slightly increasing 
> values at 
> > decreasing part that aggregated values are slightly positive 
> > although they 
> are 
> > actually from decreasing portion of data. 
> > 
> > Thank you 
> > Petr Pikal 
> > petr.pikal at precheza.cz 
> >




More information about the R-help mailing list