# [R] linear correlation?

Andrew Perrin andrew_perrin at unc.edu
Thu Mar 7 16:41:55 CET 2002

```
On Thu, 7 Mar 2002, dechao wang wrote:

>  --- Andrew Perrin <andrew_perrin at unc.edu> wrote:
> > On Thu, 7 Mar 2002, dechao wang wrote:
> >
> > > Thanks Andrew,
> > >
> > > Consider the following example:
> > > > x1<-c(1,  2,  3,   100, 200, 300)
> > > > x2<-c(1.1,2.8,3.3, 108, 209, 303)
> > > > x3<-c(2.8,3.8,5.3, 108, 209, 303)
> > > > cor(x1,x2)
> > > [1] 0.999655
> > > > cor(x1,x3)
> > > [1] 0.9997286
> > >
> > > You can see that as x2 changed to x3, with only the first
> > > three numbers changing, the coefficients (x1, x2) and
> > > (x1, x3) changed little. I thought this may be because
> > > the last three numbers were in different units.
> >
> > It's not because they're different units -- it's because they're
> > different measures altogether! Can you state, in words (i.e., not in
> > mathematical terms), what you think a correlation would *mean* between
> > these two vectors? R is happily telling you, as any statistical package
> > would, what the correlation is between two vectors of numbers. But that
> > correlation doesn't necessarily mean anything at all; its meaning is
> > based on what the vectors measure.
> >
>
> There are lots of examples. Suppose the first three numbers represent
> three branches of an apple tree, and the last three numbers represent the
> corresponding branching angles of those branches. Then x1, x2 and x3
> represent three different trees. Maybe we can ask which trees are
> similar to which?
>

In which case you probably shouldn't be storing the data in vectors
(although you can), but you certainly shouldn't be using correlations to
measure similarity among vectors where each vector represents one unit of
analysis. There are various ways of assessing the "similarity" among
vectors (indeed, Brian Ripley of Venables and Ripley fame is an expert in
this field), but correlation is not one of them.

You could ask, in your example, whether the length of a branch is
correlated with its angle; in that case, you'd want something like:
x1 <- c(1,   2,   3,   100, 200, 300)
x2 <- c(1.1, 2.8, 3.3, 108, 209, 303)
x3 <- c(2.8, 3.8, 5.3, 108, 209, 303)
# transpose so each row is a tree (case) and each column a variable
x.df <- as.data.frame(t(data.frame(x1, x2, x3)))
colnames(x.df) <- c('l1', 'l2', 'l3', 'a1', 'a2', 'a3')
attach(x.df)
cor(l1, a1)

which returns:
[1] 0.5421936

that is, the correlation between length 1 (l1) and angle 1 (a1) across the
three trees. That's a suitable (although not very sophisticated) use of
correlation. But measuring the correlation between cases using different
measures is not a useful, or even meaningful, exercise, IMNSHO.

----------------------------------------------------------------------
Andrew J Perrin - andrew_perrin at unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

```
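The thread draws a distinction between correlation across variables and similarity between cases. The minimal R sketch below illustrates it. It assumes, as in the follow-up message, that the first three numbers are branch lengths and the last three are branching angles.

The sketch does three things:

- It stores one row per tree.
- It reproduces the between-variable correlation from the reply, using column indexing instead of `attach()`.
- It measures similarity between trees with Euclidean distance on standardized variables (`scale()` then `dist()`). This is one of the "various ways" mentioned in the reply. The choice of standardized Euclidean distance is an assumption made for illustration, not a method proposed in the thread.

```r
# Data from the thread; per the follow-up message, the first three numbers
# are branch lengths and the last three are branching angles (assumption).
x1 <- c(1,   2,   3,   100, 200, 300)
x2 <- c(1.1, 2.8, 3.3, 108, 209, 303)
x3 <- c(2.8, 3.8, 5.3, 108, 209, 303)

# One row per tree (case), one column per variable
trees <- rbind(tree1 = x1, tree2 = x2, tree3 = x3)
colnames(trees) <- c("l1", "l2", "l3", "a1", "a2", "a3")

# Correlation between two variables across the trees, as in the reply,
# but indexing columns directly instead of calling attach()
r.la <- cor(trees[, "l1"], trees[, "a1"])
print(r.la)

# Full variable-by-variable correlation matrix (only 3 cases, so very noisy)
print(round(cor(trees), 2))

# Similarity between trees (an illustrative choice, not from the thread):
# standardize each variable so lengths and angles are on a comparable scale,
# then compute Euclidean distances between rows (smaller = more similar)
d <- dist(scale(trees))
print(d)
```

The value of `r.la` should match the 0.5421936 reported in the reply. With only three trees, both the correlations and the distances are illustrative at best.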