[R] Curve Fitting/Regression with Multiple Observations

Fri Apr 30 05:25:48 CEST 2010

Dear Joseph,

If you do not need to make any inferences, that is, you just want it to look pretty, then drawing a curve by hand is as good a solution as any. Plus, there is no reason for expert testimony to say that the curve does not mean anything.

Sincerely,
KeithC.

-----Original Message-----
From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo.kim at gmail.com] 
Sent: Tuesday, April 27, 2010 2:33 PM
To: Gabor Grothendieck
Cc: r-help at r-project.org
Subject: Re: [R] Curve Fitting/Regression with Multiple Observations

Frankly speaking, I am not looking for such a framework.

The system I'm studying is a communication network (like an M/M/1 queue, but far too complicated to analyze mathematically using classical queueing theory) and the conclusion I want to draw is qualitative rather than quantitative -- a high-level comparative study of various network architectures based on the "equivalence principle" (a concept specific to networking, not in the general sense).

What I want in this regard is a smooth, non-decreasing (hence one-to-one) function built out of simulation data because later in my processing, I need an inverse function of the said curve to find an x value given the y value. That was, in fact, the reason I used the exponential (i.e., non-decreasing function) curve fitting.
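
A minimal sketch of one way to get such a curve and its inverse in base R, offered as an illustration rather than as the method used in this thread. The data are made up, and the names xs, ybar, f and f_inv are hypothetical. It averages the repetitions at each x, forces the means to be non-decreasing with isoreg(), interpolates them with a monotone cubic spline, and inverts the result numerically with uniroot(). If the isotonic step produces flat segments, the curve is non-decreasing but not strictly increasing, so the inverse is not unique on those segments.

```r
# Hypothetical replicated simulation data: three y values per x (made up).
set.seed(1)
x <- rep(1:10, each = 3)
y <- exp(0.3 * x) + rnorm(length(x), sd = 0.3)

# Average the repetitions at each x (tapply returns groups in sorted x order).
xs   <- sort(unique(x))
ybar <- as.numeric(tapply(y, x, mean))

# Isotonic regression: closest non-decreasing sequence to the means.
ymono <- isoreg(xs, ybar)$yf

# Monotone cubic spline through the isotonic values.
f <- splinefun(xs, ymono, method = "hyman")

# Numerical inverse: solve f(t) = y0 for t within the range of the data.
f_inv <- function(y0) {
  uniroot(function(t) f(t) - y0, interval = range(xs), tol = 1e-8)$root
}

x0 <- f_inv(5)   # x value at which the fitted curve reaches y = 5
```

The inverse is only defined for y0 between f(min(xs)) and f(max(xs)); outside that range uniroot() stops with an error.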

Even though I don't need a statistical inference framework for my work, I want to make sure that my use of regression/curve fitting techniques with my simulation data (as a tool for getting the mentioned curve) is proper and a usual practice among experts like you.

To get an answer to my question, I dug through the Internet a lot but have found no clear explanation so far.

Your suggestions and examples (always!) are much appreciated, but I am still not sure whether using those regression procedures with the kind of data I described is the right way to proceed.
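
One way to check this concern numerically, sketched here with made-up data and hypothetical variable names: for a least-squares fit, the sum of squares over the raw repetitions equals the replicate-weighted sum of squares over the per-x means plus a constant that does not depend on the curve, so both give the same fitted curve. What differs is the residual variance estimate and hence any standard errors. The last lines show that smooth.spline() accepts tied x values directly.

```r
# Hypothetical data: three repetitions at each of eight x values (made up).
set.seed(2)
x <- rep(1:8, each = 3)
y <- 0.5 + 0.9 * x + rnorm(length(x), sd = 0.4)

# Fit 1: least squares on all raw observations.
fit_raw <- lm(y ~ x)

# Fit 2: least squares on the per-x means, weighted by the number of repetitions.
xs   <- sort(unique(x))
ybar <- as.numeric(tapply(y, x, mean))
n    <- as.numeric(table(x))
fit_mean <- lm(ybar ~ xs, weights = n)

# The two fits give the same coefficients (up to rounding error).
same_coef <- all.equal(unname(coef(fit_raw)), unname(coef(fit_mean)))

# smooth.spline() takes the raw data with ties; the fit is reported
# at the distinct x values.
ss <- smooth.spline(x, y)
n_unique <- length(ss$x)
```

With an unequal number of repetitions per x, the weights matter: an unweighted fit to the means would no longer match the fit to the raw data.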

Again, many thanks for your prompt and kind answers,
Joseph

On Tue, Apr 27, 2010 at 8:46 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> If you are looking for a framework for statistical inference you could 
> look at additive models as in the mgcv package, which has a book 
> associated with it if you need more info, e.g.
>
> library(mgcv)
> fm <- gam(dist ~ s(speed), data = cars)
> summary(fm)
> plot(dist ~ speed, cars, pch = 20)
> fm.ci <- with(predict(fm, se = TRUE),
>               cbind(0, -2*se.fit, 2*se.fit) + c(fit))
> matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))
>
>
> On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim 
> <kyeongsoo.kim at gmail.com> wrote:
>> Hello Gabor,
>>
>> Many thanks for providing actual examples for the problem!
>>
>> In fact I know how to apply and generate plots using various R 
>> functions including loess, lowess, and smooth.spline procedures.
>>
>> My question, however, is whether applying those procedures directly 
>> to data with multiple observations/duplicate points(?) is on a 
>> sound basis or not.
>>
>> Before asking my question to the list, I checked the smooth.spline 
>> manual page and found a mention of the "cv" option in relation to 
>> duplicate points, but I'm not sure "duplicate points" in the manual 
>> has the same meaning as "multiple observations" in my case. To me, 
>> the manual seems a bit unclear in this regard.
>>
>> Looking at the "cars" data, I found it has multiple points with the 
>> same "speed" but different "dist", which is exactly what I mean by 
>> multiple observations, but I am still not sure.
>>
>> Regards,
>> Joseph
>>
>>
>> On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck 
>> <ggrothendieck at gmail.com> wrote:
>>> This will compute a loess curve and plot it:
>>>
>>> example(loess)
>>> plot(dist ~ speed, cars, pch = 20)
>>> lines(cars$speed, fitted(cars.lo))
>>>
>>> Also this directly plots it but does not give you the values of the 
>>> curve separately:
>>>
>>> library(lattice)
>>> xyplot(dist ~ speed, cars, type = c("p", "smooth"))
>>>
>>>
>>>
>>> On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim 
>>> <kyeongsoo.kim at gmail.com> wrote:
>>>> I recently came to realize the true power of R for statistical 
>>>> analysis -- mainly for post-processing of data from large-scale 
>>>> simulations -- and have been converting many of my existing 
>>>> Python (SciPy) scripts to ones based on R and/or Perl.
>>>>
>>>> In the middle of this conversion, I revisited the problem of curve 
>>>> fitting for simulation data with multiple observations resulting 
>>>> from repetitions.
>>>>
>>>> In the past, I first processed simulation data (i.e., multiple y's 
>>>> from repetitions) to get a mean with a confidence interval for a 
>>>> given value of x (independent variable) and then applied a spline 
>>>> procedure to those mean values only (i.e., unique pairs of (x_i, 
>>>> y_i) for i=1, 2, ...) to get a smoothed curve. Because of rather 
>>>> large confidence intervals, however, the resulting curves were 
>>>> hardly smooth enough for my purpose, so I had to fix the function 
>>>> to an exponential and use least-squares methods to fit its 
>>>> parameters to the data.
>>>>
>>>> From a plot with confidence intervals, it's rather easy for one to
>>>> visually and manually(?) figure out a smoothed curve for it.
>>>> So I'm thinking right now of directly applying spline (or whatever 
>>>> regression procedures for this purpose) to the simulation data with 
>>>> repetitions rather than means. The simulation data in this case 
>>>> looks like this (assuming three repetitions):
>>>>
>>>> # x    y
>>>> 1      1.2
>>>> 1      0.9
>>>> 1      1.3
>>>> 2      2.2
>>>> 2      1.7
>>>> 2      2.0
>>>> ...      ....
>>>>
>>>> So my idea is to let the spline procedure handle the fluctuations 
>>>> in the data (i.e., in repetitions) by itself.
>>>> But I wonder whether this direct application of spline procedures 
>>>> to data with multiple observations makes sense from the 
>>>> statistical analysis (i.e., theoretical) point of view.
>>>>
>>>> It may be a stupid question and quite obvious to many, but 
>>>> personally I don't know where to start.
>>>> It would be greatly appreciated if anyone could shed some light 
>>>> on this.
>>>>
>>>> Many thanks in advance,
>>>> Joseph