[R] Curve Fitting/Regression with Multiple Observations

Kyeong Soo (Joseph) Kim kyeongsoo.kim at gmail.com
Tue Apr 27 22:32:48 CEST 2010

Frankly speaking, I am not looking for such a framework.

The system I'm studying is a communication network (like an M/M/1 queue,
but far too complicated to analyze mathematically using classical
queueing theory), and the conclusion I want to draw is qualitative
rather than quantitative -- a high-level comparative study of various
network architectures based on the "equivalence principle" (a concept
specific to networking, not in the general sense).

What I want in this regard is a smooth, non-decreasing (hence
one-to-one) function built out of simulation data because later in my
processing, I need an inverse function of the said curve to find
an x value given the y value. That was, in fact, the reason I used the
exponential (i.e., non-decreasing function) curve fitting.
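
To make the inversion requirement concrete, here is a minimal sketch in R.
The data are made up for illustration (three repetitions per x, hypothetical
values), and curve_fun and inverse_fun are names introduced here, not part
of any package. It fits y = a * exp(b * x) to all repetitions at once with
nls() and inverts the fitted curve numerically with uniroot(). The inverse
is unique only if the fitted curve is strictly increasing (a > 0 and b > 0).

```r
# Hypothetical simulation output: 3 repetitions at each x
# (made-up numbers for illustration only, not data from the actual study).
set.seed(1)
x <- rep(1:8, each = 3)
y <- 0.8 * exp(0.3 * x) * exp(rnorm(length(x), sd = 0.05))
sim <- data.frame(x = x, y = y)

# Starting values for nls() from a log-linear least-squares fit.
start_fit <- lm(log(y) ~ x, data = sim)

# Fit y = a * exp(b * x) to all repetitions at once.
fit <- nls(y ~ a * exp(b * x), data = sim,
           start = list(a = exp(coef(start_fit)[[1]]),
                        b = coef(start_fit)[[2]]))

# Fitted curve as an ordinary function of x.
curve_fun <- function(x) predict(fit, newdata = data.frame(x = x))

# Numerical inverse: the x in [lower, upper] at which the curve equals y0.
# Assumes the curve is strictly increasing and y0 lies within its range
# on [lower, upper]; otherwise uniroot() stops with an error.
inverse_fun <- function(y0, lower = min(sim$x), upper = max(sim$x)) {
  uniroot(function(x) curve_fun(x) - y0,
          lower = lower, upper = upper, tol = 1e-9)$root
}

x0 <- inverse_fun(3)
```

For this particular model the inverse also has the closed form
log(y0 / a) / b; the uniroot() version is shown because it would carry over
to any other strictly increasing fitted curve.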

Even though I don't need a statistical inference framework for my
work, I want to make sure that my use of regression/curve fitting
techniques with my simulation data (as a tool for getting the
mentioned curve) is proper and a usual practice among experts like

To get an answer to my question, I dug a lot through the Internet but
have found no clear explanation so far.

Your suggestions and examples (always!) are much
appreciated, but I am still not sure that using those regression
procedures with the kind of data I described is the right thing to do.
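
For the plain least-squares case, at least, the concern can be checked
directly. The sketch below uses made-up data (hypothetical values, three
repetitions per x) and compares a fit to all repetitions with a fit to the
per-x means weighted by the number of repetitions. It only illustrates
ordinary least squares with lm(); it makes no claim about how smoothers
such as loess or smooth.spline treat repeated x values.

```r
# Hypothetical data: 3 repetitions at each of 6 x values (illustrative only).
set.seed(2)
raw <- data.frame(x = rep(1:6, each = 3))
raw$y <- 1 + 0.5 * raw$x + rnorm(nrow(raw), sd = 0.2)

# Collapse to one mean per x, keeping the number of repetitions behind it.
means <- aggregate(y ~ x, data = raw, FUN = mean)
means$n <- as.vector(table(raw$x))

# Fit 1: all repetitions, unweighted.
fit_raw <- lm(y ~ x, data = raw)

# Fit 2: one mean per x, weighted by the number of repetitions.
fit_means <- lm(y ~ x, data = means, weights = n)

# The coefficient estimates agree up to numerical tolerance.
same_coef <- isTRUE(all.equal(coef(fit_raw), coef(fit_means)))
```

The two fits give the same coefficient estimates, so for least squares the
choice between raw repetitions and weighted means does not change the
curve. What does differ is the residual variance and hence the standard
errors, because only the fit to the raw data sees the scatter within each x.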

Again, many thanks for your prompt and kind answers,

On Tue, Apr 27, 2010 at 8:46 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> If you are looking for a framework for statistical inference you could
> look at additive models as in the mgcv package which has a book
> associated with it if you need more info. e.g.
> library(mgcv)
> fm <- gam(dist ~ s(speed), data = cars)
> summary(fm)
> plot(dist ~ speed, cars, pch = 20)
> fm.ci <- with(predict(fm, se = TRUE), cbind(0, -2*se.fit, 2*se.fit) + c(fit))
> matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))
> On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim
> <kyeongsoo.kim at gmail.com> wrote:
>> Hello Gabor,
>> Many thanks for providing actual examples for the problem!
>> In fact I know how to apply and generate plots using various R
>> functions including loess, lowess, and smooth.spline procedures.
>> My question, however, is whether applying those procedures directly to
>> data with multiple observations/duplicate points(?) is on a
>> sound basis or not.
>> Before asking my question to the list, I checked the smooth.spline manual
>> pages and found a mention of the "cv" option related to duplicate
>> points, but I'm not sure "duplicate points" in the manual has the same
>> meaning as "multiple observations" in my case. To me, the manual seems
>> a bit unclear in this regard.
>> Looking at the "cars" data, I found it has multiple points with the same
>> "speed" but different "dist", which is exactly what I mean by multiple
>> observations, but I am still not sure.
>> Regards,
>> Joseph
>> On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck
>> <ggrothendieck at gmail.com> wrote:
>>> This will compute a loess curve and plot it:
>>> example(loess)
>>> plot(dist ~ speed, cars, pch = 20)
>>> lines(cars$speed, fitted(cars.lo))
>>> Also this directly plots it but does not give you the values of the
>>> curve separately:
>>> library(lattice)
>>> xyplot(dist ~ speed, cars, type = c("p", "smooth"))
>>> On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim
>>> <kyeongsoo.kim at gmail.com> wrote:
>>>> I recently came to realize the true power of R for statistical
>>>> analysis -- mainly for post-processing of data from large-scale
>>>> simulations -- and have been converting many of my existing Python (SciPy)
>>>> scripts to ones based on R and/or Perl.
>>>> In the middle of this conversion, I revisited the problem of curve
>>>> fitting for simulation data with multiple observations resulting from
>>>> repetitions.
>>>> In the past, I first processed simulation data (i.e., multiple y's
>>>> from repetitions) to get a mean with a confidence interval for a given
>>>> value of x (independent variable) and then applied a spline procedure
>>>> to those mean values only (i.e., unique pairs of (x_i, y_i) for i=1,
>>>> 2, ...) to get a smoothed curve. Because of rather large confidence
>>>> intervals, however, the resulting curves were hardly smooth enough for
>>>> my purpose, so I had to fix the function to an exponential and use the
>>>> least squares method to fit its parameters to the data.
>>>> From a plot with confidence intervals, it's rather easy for one to
>>>> visually and manually(?) figure out a smoothed curve for it.
>>>> So I'm thinking right now of directly applying spline (or whatever
>>>> regression procedures for this purpose) to the simulation data with
>>>> repetitions rather than means. The simulation data in this case looks
>>>> like this (assuming three repetitions):
>>>> # x    y
>>>> 1      1.2
>>>> 1      0.9
>>>> 1      1.3
>>>> 2      2.2
>>>> 2      1.7
>>>> 2      2.0
>>>> ...      ....
>>>> So my idea is to let the spline procedure handle the fluctuations in the
>>>> data (i.e., in repetitions) by itself.
>>>> But I wonder whether this direct application of spline procedures for
>>>> data with multiple observations makes sense from the statistical
>>>> analysis (i.e., theoretical) point of view.
>>>> It may be a stupid question and quite obvious to many, but personally
>>>> I don't know where to start.
>>>> It would be greatly appreciated if anyone could shed some light on
>>>> this.
>>>> Many thanks in advance,
>>>> Joseph