[R] Line similarity
William Dunlap
wdunlap at tibco.com
Tue Apr 30 22:47:58 CEST 2013
Here is one way, for each row of the data.frame v, to regress the values in
columns 2 through 4 on the year numbers 1 through 3, keep only the slope, and
then add a column saying whether that slope is greater than zero.
> v[,"Beta"] <- vapply(seq_len(nrow(v)),
      FUN=function(i) {
          d <- data.frame(value=as.numeric(v[i,2:4]), year=seq_len(3))
          coef(lm(value~year, data=d))[2]
      },
      FUN.VALUE=0)
> v[,"Growing"] <- v[,"Beta"] > 0
> v
  Name Year_1_value Year_2_value Year_3_value Beta Growing
1    A            1            2            3  1.0    TRUE
2    B            2            7           19  8.5    TRUE
3    C            3            4            2 -0.5   FALSE
4    D           10            7            6 -2.0   FALSE
5    E            4            4            5  0.5    TRUE
6    F           NA            3            6  3.0    TRUE
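The original question asked for three classes (growing, stable, declining) rather than a TRUE/FALSE flag. A minimal sketch of one way to get there is to compare the slope against a cutoff. The cutoff `tol` and the column name `Trend` are my own assumptions for illustration, not something from Spotfire or from the thread; 0.75 is chosen only because it happens to reproduce the desired output for this sample data.

```r
# Example data copied from the original question.
v <- data.frame(
  Name         = c("A", "B", "C", "D", "E", "F"),
  Year_1_value = c(1, 2, 3, 10, 4, NA),
  Year_2_value = c(2, 7, 4, 7, 4, 3),
  Year_3_value = c(3, 19, 2, 6, 5, 6)
)

# Slope of value ~ year for row i; lm() drops NA observations by default.
row_slope <- function(i) {
  d <- data.frame(value = as.numeric(v[i, 2:4]), year = seq_len(3))
  unname(coef(lm(value ~ year, data = d))[2])
}
v$Beta <- vapply(seq_len(nrow(v)), row_slope, FUN.VALUE = 0)

# 'tol' is a hypothetical cutoff: slopes within +/- tol count as "Stable".
tol <- 0.75
v$Trend <- ifelse(abs(v$Beta) <= tol, "Stable",
                  ifelse(v$Beta > 0, "Growing", "Declining"))
v[, c("Name", "Beta", "Trend")]
```

In practice the cutoff would probably need to depend on the scale of the data (for example a relative change per year), so treat the fixed value here as a placeholder.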
Since you are doing least-squares regression in which the predictors are the
same for all regressions (except the one with the NA in it), you can also do
> coef(lm(value ~ year, list(value=t(as.matrix(v[1:5,2:4])), year=seq_len(3))))[2,]
   1    2    3    4    5
 1.0  8.5 -0.5 -2.0  0.5
but you have to then make a special case for each pattern of missing values.
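One possible way to handle the special cases, sketched under my own assumptions, is to group the rows by which years are present and make one matrix-response lm() call per group. The helper names (`pattern`, `beta`) are only illustrative, and groups with fewer than two observed years are left as NA.

```r
# Example data copied from the original question.
v <- data.frame(
  Name         = c("A", "B", "C", "D", "E", "F"),
  Year_1_value = c(1, 2, 3, 10, 4, NA),
  Year_2_value = c(2, 7, 4, 7, 4, 3),
  Year_3_value = c(3, 19, 2, 6, 5, 6)
)

y    <- as.matrix(v[, 2:4])
year <- seq_len(ncol(y))

# Label each row by the set of years that are not missing, e.g. "1,2,3" or "2,3".
pattern <- apply(!is.na(y), 1, function(ok) paste(which(ok), collapse = ","))

beta <- rep(NA_real_, nrow(y))
for (p in unique(pattern)) {
  rows <- which(pattern == p)
  cols <- which(!is.na(y[rows[1], ]))
  if (length(cols) < 2) next           # a slope needs at least two points
  # One fit per pattern: each column of 'value' is one row of v.
  fit <- lm(value ~ year,
            list(value = t(y[rows, cols, drop = FALSE]), year = year[cols]))
  co <- coef(fit)
  # coef() is a matrix for several responses and a plain vector for one.
  beta[rows] <- if (is.matrix(co)) co[2, ] else co[2]
}
beta
```

This makes one lm() call per distinct missing-value pattern instead of one per row, which only pays off when there are many rows and few patterns.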
If you always use three consecutive years, the least-squares slope is just
(Year_3_value - Year_1_value)/2, so you can use
Growing <- v[,"Year_1_value"] < v[, "Year_3_value"]
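As a quick check of that shortcut, here is a small sketch that computes the slope from the usual least-squares formula with centred years (-1, 0, 1) and compares its sign with the first-versus-last comparison. The variable names are mine. Note that, unlike the lm() version, both give NA for the row with a missing first year.

```r
# Values from the original question, one row per Name (A to F).
y <- cbind(c(1, 2, 3, 10, 4, NA),
           c(2, 7, 4, 7, 4, 3),
           c(3, 19, 2, 6, 5, 6))

x  <- 1:3
xc <- x - mean(x)                     # centred years: -1, 0, 1

# Least-squares slope: sum(xc * y) / sum(xc^2), i.e. (y3 - y1) / 2 here.
slope   <- drop(y %*% xc) / sum(xc^2)
growing <- y[, 1] < y[, 3]

# The sign of the slope agrees with the shortcut, including the NA row.
identical(slope > 0, growing)
```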
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Satsangi, Vivek (GE Capital)
> Sent: Tuesday, April 30, 2013 12:57 PM
> To: r-help at r-project.org
> Subject: [R] Line similarity
>
> Folks,
>
> This is probably a "help me google this properly, please"-type of question.
>
> In TIBCO Spotfire, there is a procedure called "line similarity". I use this to
> determine which observations show a growing, stable or declining pattern... sort of like a
> mini-regression on the time-line for each observation.
>
> So if the input is something like this:
>
> Name Year_1_value Year_2_value Year_3_value
> A 1 2 3
> B 2 7 19
> C 3 4 2
> D 10 7 6
> E 4 4 5
> F NA 3 6
>
> Then the desired output is as follows:
> A Growing
> B Growing
> C Stable
> D Declining
> E Stable
> F Growing (or NA is also fine)
>
> The data can also be unstacked, i.e. the three years could be separate rows if
> necessary.
> Is there a package for R that implements something like the above? I can
> obviously try to do a set of simple regressions to classify the rows, but I want to gain from
> the thoughts and learnings of others who may have taken the time to implement a
> package.
> I tried searching with the words "line similarity" or its variants to no avail.
>
> Thanks in advance for your pointers!
>
> Vivek Satsangi
> GE Capital
> Americas