# [R] Curve Fitting/Regression with Multiple Observations

Gabor Grothendieck ggrothendieck at gmail.com
Tue Apr 27 21:46:03 CEST 2010

```
If you are looking for a framework for statistical inference, you could
look at additive models as in the mgcv package which has a book

library(mgcv)
fm <- gam(dist ~ s(speed), data = cars)
summary(fm)
plot(dist ~ speed, cars, pch = 20)
fm.ci <- with(predict(fm, se.fit = TRUE), cbind(0, -2*se.fit, 2*se.fit) + c(fit))
matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))
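
# A minimal sketch, not from the original thread, of the same approach on
# data shaped like the simulation output described further down: several
# repetitions of y at each x. The data frame `sim`, the exponential trend,
# and the noise level are made-up assumptions for illustration only.
library(mgcv)
set.seed(1)
sim <- data.frame(x = rep(1:20, each = 3))            # 3 repetitions per x
sim$y <- 5 * (1 - exp(-0.2 * sim$x)) + rnorm(nrow(sim), sd = 0.3)
fm.sim <- gam(y ~ s(x), data = sim)                   # raw rows, no averaging first
grid <- data.frame(x = seq(1, 20, length.out = 100))  # points at which to evaluate the curve
pr <- predict(fm.sim, newdata = grid, se.fit = TRUE)
fit <- as.numeric(pr$fit)
se <- as.numeric(pr$se.fit)
sim.ci <- cbind(fit, fit - 2 * se, fit + 2 * se)      # curve and rough +/- 2 SE band
plot(y ~ x, sim, pch = 20)
matlines(grid$x, sim.ci, lty = c(1, 2, 2), col = c(1, 2, 2))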

On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim
<kyeongsoo.kim at gmail.com> wrote:
> Hello Gabor,
>
> Many thanks for providing actual examples for the problem!
>
> In fact I know how to apply and generate plots using various R
> functions including loess, lowess, and smooth.spline procedures.
>
> My question, however, is whether applying those procedures directly to
> data with multiple observations/duplicate points(?) is statistically
> sound or not.
>
> Before asking my question on the list, I checked the smooth.spline manual
> page and found a mention of the "cv" option in relation to duplicate
> points, but I'm not sure "duplicate points" in the manual has the same
> meaning as "multiple observations" in my case. To me, the manual seems
> a bit unclear in this regard.
>
> Looking at the "cars" data, I found it has multiple points with the same
> "speed" but different "dist", which is exactly what I mean by multiple
> observations, but am still not sure.
>
> Regards,
> Joseph
>
>
> On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> This will compute a loess curve and plot it:
>>
>> example(loess)
>> plot(dist ~ speed, cars, pch = 20)
>> lines(cars$speed, fitted(cars.lo))
>>
>> Also this directly plots it but does not give you the values of the
>> curve separately:
>>
>> library(lattice)
>> xyplot(dist ~ speed, cars, type = c("p", "smooth"))
>>
>>
>>
>> On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim
>> <kyeongsoo.kim at gmail.com> wrote:
>>> I recently came to realize the true power of R for statistical
>>> analysis -- mainly for post-processing of data from large-scale
>>> simulations -- and have been converting many of my existing Python (SciPy)
>>> scripts to ones based on R and/or Perl.
>>>
>>> In the middle of this conversion, I revisited the problem of curve
>>> fitting for simulation data with multiple observations resulting from
>>> repetitions.
>>>
>>> In the past, I first processed the simulation data (i.e., multiple y's
>>> from repetitions) to get a mean with a confidence interval for each
>>> value of x (the independent variable) and then applied a spline procedure
>>> to those mean values only (i.e., unique pairs (x_i, y_i) for i = 1,
>>> 2, ...) to get a smoothed curve. Because of the rather large confidence
>>> intervals, however, the resulting curves were hardly smooth enough for
>>> my purpose, so I had to fix the function to an exponential and use
>>> least-squares methods to fit its parameters to the data.
>>>
>>> From a plot with confidence intervals, it's rather easy for one to
>>> visually and manually(?) figure out a smoothed curve for it.
>>> So I'm now thinking of applying a spline (or whatever regression
>>> procedure suits this purpose) directly to the simulation data with
>>> repetitions rather than to the means. The simulation data in this case look
>>> like this (assuming three repetitions):
>>>
>>> # x    y
>>> 1      1.2
>>> 1      0.9
>>> 1      1.3
>>> 2      2.2
>>> 2      1.7
>>> 2      2.0
>>> ...      ....
>>>
>>> So my idea is to let the spline procedure handle the fluctuations in the
>>> data (i.e., across repetitions) by itself.
>>> But I wonder whether this direct application of spline procedures to
>>> data with multiple observations makes sense from a statistical
>>> (i.e., theoretical) point of view.
>>>
>>> It may be a stupid question and quite obvious to many, but personally
>>> I don't know where to start.
>>> It would be greatly appreciated if anyone could shed some light on
>>> this.
>>>
>>> Joseph
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help