[R] Curve Fitting/Regression with Multiple Observations

Tue Apr 27 21:07:35 CEST 2010

Hello Gabor,

Many thanks for providing actual examples for the problem!

In fact, I already know how to fit and plot curves with various R
functions, including loess, lowess, and smooth.spline.

My question, however, is whether applying those procedures directly to
data with multiple observations (duplicate points?) is statistically
sound.

Before posting my question to the list, I checked the smooth.spline
manual page and found that the "cv" option is discussed in connection
with duplicate points, but I'm not sure whether "duplicate points" in
the manual means the same thing as "multiple observations" in my case.
To me, the manual is a bit unclear in this regard.
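
A minimal sketch of the call in question, assuming the built-in cars
data as a stand-in for simulation output with repeated x values. The
variable names (x, y, fit, n_obs, n_distinct) are illustrative only.
As I read the Details section of ?smooth.spline, cv = FALSE selects
generalized cross-validation, which is the variant described there as
working with duplicated points in x:

```r
# Built-in data set: several cars share the same speed (repeated x values)
x <- cars$speed
y <- cars$dist

# cv = FALSE asks for generalized cross-validation (GCV) rather than
# leave-one-out CV when choosing the smoothing parameter
fit <- smooth.spline(x, y, cv = FALSE)

n_obs      <- length(x)      # number of raw observations
n_distinct <- length(fit$x)  # fit$x holds only the distinct x values

cat("observations:", n_obs, " distinct x values:", n_distinct, "\n")
```

The fitted object reports one fitted value per distinct x, so the
repeated observations are at least accepted by the procedure; whether
that is the right model for repetitions is the question I am asking.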

Looking at the "cars" data, I found that it has multiple points with the
same "speed" but different "dist", which is exactly what I mean by
multiple observations, but I am still not sure that this is what the
manual refers to.
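
To make the comparison concrete, here is a rough sketch (not a
recommendation) that counts the repeated speeds in cars and fits loess
both to the raw data and to the per-speed means. The names reps, tied,
means, fit_raw, fit_means, and grid_speed are my own, and the default
loess settings are assumed:

```r
# How many observations share each speed value?
reps <- table(cars$speed)
tied <- reps[reps > 1]   # speeds observed more than once

# "Average first" pre-processing: one mean dist per distinct speed
means <- aggregate(dist ~ speed, data = cars, FUN = mean)

# loess fitted to the raw data (repetitions included) and to the means
fit_raw   <- loess(dist ~ speed, data = cars)
fit_means <- loess(dist ~ speed, data = means)

# Evaluate both curves on a common grid so they can be compared
grid_speed <- data.frame(speed = seq(min(cars$speed), max(cars$speed),
                                     length.out = 100))
curve_raw   <- predict(fit_raw, grid_speed)
curve_means <- predict(fit_means, grid_speed)

# Overlay the two curves when run in an interactive session
if (interactive()) {
  plot(dist ~ speed, cars, pch = 20)
  lines(grid_speed$speed, curve_raw)
  lines(grid_speed$speed, curve_means, lty = 2)
}
```

Both fits run without complaint; the sketch only shows that the two
approaches can be put side by side, not which one is theoretically
justified.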

Regards,
Joseph

On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> This will compute a loess curve and plot it:
>
> example(loess)
> plot(dist ~ speed, cars, pch = 20)
> lines(cars$speed, fitted(cars.lo))
>
> Also this directly plots it but does not give you the values of the
> curve separately:
>
> library(lattice)
> xyplot(dist ~ speed, cars, type = c("p", "smooth"))
>
>
>
> On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim
> <kyeongsoo.kim at gmail.com> wrote:
>> I recently came to realize the true power of R for statistical
>> analysis -- mainly for post-processing of data from large-scale
>> simulations -- and have been converting many of my existing Python
>> (SciPy) scripts to ones based on R and/or Perl.
>>
>> In the middle of this conversion, I revisited the problem of curve
>> fitting for simulation data with multiple observations resulting from
>> repetitions.
>>
>> In the past, I first processed the simulation data (i.e., multiple y's
>> from repetitions) to get a mean with a confidence interval for each
>> value of x (the independent variable) and then applied a spline
>> procedure to those mean values only (i.e., unique pairs (x_i, y_i) for
>> i = 1, 2, ...) to get a smoothed curve. Because of the rather large
>> confidence intervals, however, the resulting curves were hardly smooth
>> enough for my purpose, so I had to fix the functional form to an
>> exponential and use least squares to fit its parameters to the data.
>>
>> From a plot with confidence intervals, it's rather easy to sketch a
>> smoothed curve by eye.
>> So I'm now thinking of applying a spline (or any other regression
>> procedure suited to this purpose) directly to the simulation data with
>> repetitions rather than to the means. The simulation data in this case
>> look like this (assuming three repetitions):
>>
>> # x    y
>> 1      1.2
>> 1      0.9
>> 1      1.3
>> 2      2.2
>> 2      1.7
>> 2      2.0
>> ...    ...
>>
>> So my idea is to let the spline procedure handle the fluctuations in
>> the data (i.e., across repetitions) by itself.
>> But I wonder whether applying spline procedures directly to data with
>> multiple observations makes sense from a statistical (i.e.,
>> theoretical) point of view.
>>
>> It may be a stupid question and quite obvious to many, but personally
>> I don't know where to start.
>> I would greatly appreciate it if anyone could shed some light on
>> this.
>>
>> Many thanks in advance,
>> Joseph