# [R] Scatterplot and Causality

Duncan Murdoch murdoch.duncan at gmail.com
Mon Apr 22 17:00:20 CEST 2013

```
On 22/04/2013 10:48 AM, Lorenzo Isella wrote:
> Dear All,
> I hope this is not too off topic.
> I am given a set of scatterplots (nothing too fancy; think of a
> normal 2D x-y plot).
> I am not dealing with two time series (indeed, I have no information
> about time).
> If I call the two variables A=(A1, A2, ...) and B=(B1, B2, ...) (in
> most cases two numeric vectors, but sometimes categorical
> variables), I can plot one against the other, and essentially I
> need to determine whether
>
> A=f(B, noise) or B=g(A, noise)
>
> where the noise is the effect of other, possibly unknown, variables,
> measurement errors, etc., and f and g are two functions.
>
> Without noise, if I want to test whether A=f(B) [B causes A], I
> at least need to ensure that f(B1)!=f(B2) implies B1!=B2 (different
> effects must have different causes), whereas f(B1)=f(B2) for B1!=B2
> is not ruled out (different causes may lead to the same effect).
>
> However, in the presence of noise, these properties will hold only
> approximately, so... any ideas for a statistical test, rather than
> eyeballing, to tell apart A=f(B, noise) and B=g(A, noise)?
> Any suggestion is welcome.

In general there can't be such a test.  Think about the case of simple
linear regression.  If I randomly draw X from a normal distribution,
then randomly draw Y_i = a + b X_i + e_i, where the e_i are drawn from
an independent normal distribution, I end up with (X,Y) having a
bivariate normal distribution.

In your notation, X would cause Y, but there is *nothing* here to
distinguish this from draws directly from the bivariate normal
distribution, or draws of Y first, followed by X from its conditional
distribution (which is also a linear regression model).

With some extra information inference might be possible, but not in the