[R] LM with summation function

R. Michael Weylandt michael.weylandt at gmail.com
Tue May 22 18:30:47 CEST 2012


But if I understand your problem correctly, you can get the y values
from the s values. I'm relying on your statement that "s is sum of the
current y and all previous y (s3 = y1 + y2 + y3)." E.g.,

y <- c(1, 4, 6, 9, 3, 7)

s1 = 1
s2 = 4 + s1 = 5
s3 = 6 + s2 = 11

more generally

s <- cumsum(y)

Then if we only see s, we can get back the y vector by doing

c(s[1], diff(s))

which is identical to y.
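
A quick check of that round trip, as a minimal sketch using the example vector above (the name y_back is only for illustration):

```r
y <- c(1, 4, 6, 9, 3, 7)

# Running totals: each element is the current y plus all previous y.
s <- cumsum(y)               # 1 5 11 20 23 30

# Undo the cumulative sum: keep the first total, then take first differences.
y_back <- c(s[1], diff(s))

identical(y_back, y)         # TRUE
```

This only works when s is observed at every consecutive x; with gaps, diff(s) gives sums of y over the skipped levels instead.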

So for your data, the underlying y must have been c(109, 1091, 4125,
2891) right?

Or have I completely misunderstood your problem?
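
If the gaps in x do matter, the differences above are sums of y over several levels rather than single y values. A possible way around that, sketched under the assumption that the levels are the consecutive integers 1, 2, ..., max(x) and that the model has no intercept: s is still linear in b1, b2 and b3, so it can be regressed on cumulative sums of x, x^2 and x^3. The column names c1, c2, c3 and the helper vector k are made up for this illustration.

```r
d <- data.frame(x = c(1, 4, 9, 12), s = c(109, 1200, 5325, 8216))

# Assumption: levels run 1, 2, ..., max(x) with no level skipped.
k <- seq_len(max(d$x))

# Cumulative regressors, picked out at the observed levels only:
# s(n) = b1 * sum(k[1:n]) + b2 * sum(k[1:n]^2) + b3 * sum(k[1:n]^3)
d$c1 <- cumsum(k)[d$x]
d$c2 <- cumsum(k^2)[d$x]
d$c3 <- cumsum(k^3)[d$x]

fit <- lm(s ~ 0 + c1 + c2 + c3, data = d)
coef(fit)   # expected to be close to 100, 10, -1 for this data
```

With only four observations and three coefficients the fit is essentially exact, so the standard errors say little; more observed levels would be needed to judge the model.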

Michael

On Tue, May 22, 2012 at 12:25 PM, Robbie Edwards
<robbie.edwards at gmail.com> wrote:
> Actually, I can't.  I don't know the y values.  Only the s and only for a
> subset of the data.
>
> Like this.
>
> d <- data.frame(x=c(1, 4, 9, 12), s=c(109, 1200, 5325, 8216))
>
>
>
> On Tue, May 22, 2012 at 11:57 AM, R. Michael Weylandt
> <michael.weylandt at gmail.com> wrote:
>>
>> You can reconstruct the y values by taking first-differences of the s
>> vector, no? Then it sounds like you're good to go
>>
>> Best, Michael
>>
>> On Tue, May 22, 2012 at 11:40 AM, Robbie Edwards
>> <robbie.edwards at gmail.com> wrote:
>> > Hi all,
>> >
>> > Thanks for the replies, but I realize I've done a bad job explaining my
>> > problem.  To help, I've created some sample data to explain the problem.
>> >
>> > df <- data.frame(x=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
>> >                  y=c(109, 232, 363, 496, 625, 744, 847, 928, 981, 1000, 979, 912),
>> >                  s=c(109, 341, 704, 1200, 1825, 2569, 3416, 4344, 5325, 6325, 7304, 8216))
>> >
>> > In this data frame, y results from y = x * b1 + x^2 * b2 + x^3 * b3 and
>> > s is sum of the current y and all previous y (s3 = y1 + y2 + y3).
>> >
>> > I know I can find b1, b2 and b3 using:
>> > lm(y ~ 0 + x + I(x^2) + I(x^3), data=df)
>> >
>> > yielding...
>> > Coefficients:
>> >     x  I(x^2)  I(x^3)
>> >   100      10      -1
>> >
>> > However, I need to find b1, b2 and b3 using the s column.  The reason
>> > being, I don't actually know the values of y in the actual data set, and
>> > in the actual data I only have a few of the values.  Imagine this data is
>> > being used as a reward schedule for something like a loyalty points
>> > program.  y represents the number of points needed for each level, while
>> > s is the total number of points to reach that level.  In the real
>> > problem, my data looks more like this:
>> >
>> > d <- data.frame(x=c(1, 4, 9, 12), s=c(109, 1200, 5325, 8216))
>> >
>> > Where I need to use a few sample points to help define the parameters of
>> > the curve.
>> >
>> > thanks again and hopefully this makes the problem a bit clearer.
>> >
>> > robbie
>> >
>> >
>> >
>> > On Fri, May 18, 2012 at 7:40 PM, David Winsemius
>> > <dwinsemius at comcast.net> wrote:
>> >
>> >>
>> >> On May 18, 2012, at 1:44 PM, Robbie Edwards wrote:
>> >>
>> >>  Hi all,
>> >>>
>> >>> I'm trying to model some data where the y is defined by
>> >>>
>> >>> y = summation[1 to 50] B1 * x + B2 * x^2 + B3 * x^3
>> >>>
>> >>> Hopefully that reads clearly for email.
>> >>>
>> >>>
>> >> cumsum( rowSums( cbind(B1 * x,  B2 * x^2, B3 * x^3)))
>> >>
>> >>
>> >>
>> >>  Anyway, if it wasn't for the summation, I know I would do it like this
>> >>>
>> >>> lm(y ~ x + x2 + x3)
>> >>>
>> >>> Where x2 and x3 are x^2 and x^3.
>> >>>
>> >>> However, since each value of x is related to the previous values of x, I
>> >>> don't know how to do this.  Any help is greatly appreciated.
>> >>>
>> >>>
>> >>>
>> >>
>> >> David Winsemius, MD
>> >> West Hartford, CT
>> >>
>> >>
>> >
>
>


