[R] Correlation question
Stephane Vaucher
vauchers at iro.umontreal.ca
Thu Sep 9 04:23:54 CEST 2010
Hi everyone,
First of all, thanks for the quick responses. I appreciate the help.
Before answering questions, I wanted to mention that I tested this
behaviour on versions 2.3.1 and 2.10.1 on an x86_64 Linux machine, and on
version 2.9.0 on a 32-bit machine.
Now for the answers (batch version):
1/ I received another message stating that I mislabeled the data in my
previous message. That was a transcription error. I have included a
sample of the data causing the problem.
2/ On Wed, 8 Sep 2010, Joshua Wiley wrote:
> Does your data have missing values? I am not sure it would change
> anything, but perhaps try adding:
> cor(test2, method = "spearman", use = "pairwise.complete.obs")
Tried, no difference. The specific pairwise comparisons do not contain
missing values (other parts of my data do).
3/ From: Kjetil Halvorsen <kjetilbrinchmannhalvorsen at gmail.com>
> Do dput(test2) and copy&paste the output into the email message.
I had to clean up my data before sending it (had to remove names/emails).
Problems are visible in the (Spearman) correlations of P3 with H_tot and
HP_tot. In my correlation matrix, these are both -0.25028783918741
instead of -0.2182876. Most other correlations, however, are accurate.
Here is the data:
> names(exp)
[1] "P1" "P2" "P3" "P4" "P5" "P6"
[7] "P8" "P9" "P10" "P11" "P12" "P7"
[13] "SITE" "Errors" "warnings" "Manual" "Total" "H_tot"
[19] "HP1.1" "HP1.2" "HP1.3" "HP1.4" "HP_tot" "HO1.1"
[25] "HO1.2" "HO1.3" "HO1.4" "HO_tot" "HU1.1" "HU1.2"
[31] "HU1.3" "HU_tot" "HR" "L_tot" "LP1.1" "LP1.2"
[37] "LP1.3" "LP1.4" "LP_tot" "LO1.1" "LO1.2" "LO1.3"
[43] "LO1.4" "LO_tot" "LU1.1" "LU1.2" "LU1.3" "LU_tot"
[49] "LR_tot" "SP_tot" "SP1.1" "SP1.2" "SP1.3" "SP1.4"
[55] "SP_tot.1" "SO1.1" "SO1.2" "SO1.3" "SO1.4" "SO_tot"
[61] "SU1.1" "SU1.2" "SU1.3" "SU_tot" "SR"
> dput(exp)
structure(list(P1 = c(2L, 1L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 2L,
2L, 2L, 4L, 1L, 3L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 1L, 2L),
P2 = c(2L, 2L, 4L, 4L, 1L, 3L, 2L, 4L, 3L, 3L, 2L, 3L, 4L,
1L, 2L, 2L, 4L, 3L, 4L, 1L, 2L, 3L, 2L, 1L, 3L), P3 = c(2L,
2L, 2L, 4L, 2L, 3L, 2L, 1L, 3L, 2L, 2L, 2L, 3L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L), P4 = c(1L, 3L, 3L, 4L,
2L, 3L, 1L, 4L, 3L, 3L, 2L, 3L, 4L, 1L, 3L, 2L, 2L, 1L, 2L,
1L, 1L, 2L, 1L, 1L, 2L), P5 = c(2L, 1L, 4L, 1L, 2L, 2L, 2L,
3L, 3L, 2L, 2L, 3L, 4L, 1L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 3L), P6 = c(2L, 2L, 4L, 1L, 2L, 2L, 2L, 2L, 3L, 2L,
2L, 2L, 4L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L
), P8 = c(2L, 2L, 4L, 2L, 2L, 2L, 2L, 4L, 3L, 2L, 2L, 2L,
4L, 1L, 3L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L), P9 = c(4L,
0L, 4L, 0L, 0L, 2L, 3L, 0L, 0L, 4L, 0L, 4L, 3L, 2L, 4L, 0L,
0L, 0L, 3L, 4L, 3L, 0L, 4L, 0L, 3L), P10 = c(3L, 3L, 2L,
2L, 3L, 3L, 0L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 3L, 2L, 2L,
2L, 3L, 3L, 2L, 3L, 3L, 3L), P11 = c(1L, 1L, 2L, 2L, 1L,
1L, 0L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), P12 = c(9L, 10L, 6L, 5L, 9L, 8L, 0L,
5L, 4L, 6L, 8L, 7L, 3L, 10L, 7L, 9L, 7L, 6L, 7L, 10L, 8L,
7L, 8L, 9L, 7L), P7 = structure(c(1L, 9L, 7L, 8L, 1L, 3L,
1L, 1L, 5L, 4L, 1L, 1L, 1L, 1L, 1L, 6L, 1L, 1L, 1L, 11L,
10L, 1L, 2L, 1L, 1L), .Label = c("Â ", "Â al inicio de la
págia
conexión.",
"Â al principio no sabes muy bien por donde empezar pero una vez
aclarado es facil",
"Â busqueda diferente alas conocidas", "Â CUando el tÃtulo
contiene dos puntos,no responde.",
"Â Lass,suelen dar facilidades para
buscar lo que se necesita.",
" La verdad es que está un poco confusa la web y, si la persona
que tiene que acceder a ella no es experta, creo que lo tiene difÃcil",
"Â muy complicado, el poder encontrar los citados documentos, en la
UNIVERSIDAD citada,",
" Ninguna", " Que no hay pestañas q
de pelicula",
"Â yo ninguna"), class = "factor"), SITE = c(5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), Errors = c(201L, 201L, 201L,
201L, 201L, 201L, 201L, 201L, 369L, 369L, 369L, 369L, 369L,
369L, 369L, 369L, 369L, 369L, 159L, 159L, 159L, 159L, 159L,
159L, 159L), warnings = c(164L, 164L, 164L, 164L, 164L, 164L,
164L, 164L, 447L, 447L, 447L, 447L, 447L, 447L, 447L, 447L,
447L, 447L, 490L, 490L, 490L, 490L, 490L, 490L, 490L), Manual = c(44L,
44L, 44L, 44L, 44L, 44L, 44L, 44L, 46L, 46L, 46L, 46L, 46L,
46L, 46L, 46L, 46L, 46L, 45L, 45L, 45L, 45L, 45L, 45L, 45L
), Total = c(409L, 409L, 409L, 409L, 409L, 409L, 409L, 409L,
862L, 862L, 862L, 862L, 862L, 862L, 862L, 862L, 862L, 862L,
694L, 694L, 694L, 694L, 694L, 694L, 694L), H_tot = c(11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 140L, 140L, 140L, 140L,
140L, 140L, 140L, 140L, 140L, 140L, 21L, 21L, 21L, 21L, 21L,
21L, 21L), HP1.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L,
4L), HP1.2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
HP1.3 = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 94L, 94L, 94L,
94L, 94L, 94L, 94L, 94L, 94L, 94L, 6L, 6L, 6L, 6L, 6L, 6L,
6L), HP1.4 = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 5L, 5L, 5L, 5L, 5L,
5L, 5L), HP_tot = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
136L, 136L, 136L, 136L, 136L, 136L, 136L, 136L, 136L, 136L,
15L, 15L, 15L, 15L, 15L, 15L, 15L), HO1.1 = c(0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), HO1.2 = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L), HO1.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), HO1.4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), HO_tot = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
HU1.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), HU1.2 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), HU1.3 = c(0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), HU_tot = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), HR = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), L_tot = c(26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L,
168L, 168L, 168L, 168L, 168L, 168L, 168L, 168L, 168L, 168L,
95L, 95L, 95L, 95L, 95L, 95L, 95L), LP1.1 = c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
16L, 16L, 16L, 16L, 16L, 16L, 16L), LP1.2 = c(0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), LP1.3 = c(8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 110L, 110L, 110L, 110L, 110L, 110L, 110L, 110L,
110L, 110L, 39L, 39L, 39L, 39L, 39L, 39L, 39L), LP1.4 = c(12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 41L, 41L, 41L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 10L, 10L, 10L, 10L, 10L, 10L, 10L
), LP_tot = c(22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 157L,
157L, 157L, 157L, 157L, 157L, 157L, 157L, 157L, 157L, 65L,
65L, 65L, 65L, 65L, 65L, 65L), LO1.1 = c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 14L,
14L, 14L, 14L, 14L, 14L, 14L), LO1.2 = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), LO1.3 = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), LO1.4 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L, 4L, 4L, 4L, 4L,
4L, 4L), LO_tot = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 18L, 18L, 18L, 18L, 18L,
18L, 18L), LU1.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), LU1.2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
LU1.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), LU_tot = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), LR_tot = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
5L, 5L, 5L, 5L, 5L, 5L, 5L), SP_tot = c(122L, 122L, 122L,
122L, 122L, 122L, 122L, 122L, 61L, 61L, 61L, 61L, 61L, 61L,
61L, 61L, 61L, 61L, 85L, 85L, 85L, 85L, 85L, 85L, 85L), SP1.1 = c(16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 16L, 16L, 16L, 16L, 16L, 16L, 16L), SP1.2 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), SP1.3 = c(23L, 23L,
23L, 23L, 23L, 23L, 23L, 23L, 46L, 46L, 46L, 46L, 46L, 46L,
46L, 46L, 46L, 46L, 32L, 32L, 32L, 32L, 32L, 32L, 32L), SP1.4 = c(45L,
45L, 45L, 45L, 45L, 45L, 45L, 45L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), SP_tot.1 = c(85L,
85L, 85L, 85L, 85L, 85L, 85L, 85L, 50L, 50L, 50L, 50L, 50L,
50L, 50L, 50L, 50L, 50L, 46L, 46L, 46L, 46L, 46L, 46L, 46L
), SO1.1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 5L, 5L, 5L, 5L, 5L, 5L, 5L),
SO1.2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), SO1.3 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), SO1.4 = c(0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), SO_tot = c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 6L, 6L,
6L, 6L, 6L, 6L, 6L), SU1.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), SU1.2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), SU1.3 = c(16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 16L, 16L, 16L, 16L, 16L,
16L, 16L), SU_tot = c(19L, 19L, 19L, 19L, 19L, 19L, 19L,
19L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 16L, 16L, 16L,
16L, 16L, 16L, 16L), SR = c(17L, 17L, 17L, 17L, 17L, 17L,
17L, 17L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 17L, 17L,
17L, 17L, 17L, 17L, 17L)), .Names = c("P1", "P2", "P3", "P4",
"P5", "P6", "P8", "P9", "P10", "P11", "P12", "P7", "SITE", "Errors",
"warnings", "Manual", "Total", "H_tot", "HP1.1", "HP1.2", "HP1.3",
"HP1.4", "HP_tot", "HO1.1", "HO1.2", "HO1.3", "HO1.4", "HO_tot",
"HU1.1", "HU1.2", "HU1.3", "HU_tot", "HR", "L_tot", "LP1.1",
"LP1.2", "LP1.3", "LP1.4", "LP_tot", "LO1.1", "LO1.2", "LO1.3",
"LO1.4", "LO_tot", "LU1.1", "LU1.2", "LU1.3", "LU_tot", "LR_tot",
"SP_tot", "SP1.1", "SP1.2", "SP1.3", "SP1.4", "SP_tot.1", "SO1.1",
"SO1.2", "SO1.3", "SO1.4", "SO_tot", "SU1.1", "SU1.2", "SU1.3",
"SU_tot", "SR"), class = "data.frame", row.names = c(NA, -25L
))
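For anyone who wants to check the pairwise figures without loading the whole data frame, here is a minimal sketch. It copies only P3, H_tot and HP_tot from the dput() output above into a small all-numeric data frame (the name `d` is just for illustration) and compares the pairwise call with the matrix call, with and without the `use` argument. It only reproduces the pairwise value; it does not reproduce the discrepancy.

```r
# Values copied from the dput() output above.
p3 <- c(2, 2, 2, 4, 2, 3, 2, 1, 3, 2, 2, 2, 3, 1, 2, 1, 1, 1,
        2, 1, 2, 2, 2, 1, 2)
h_tot  <- rep(c(11, 140, 21), times = c(8, 10, 7))
hp_tot <- rep(c(10, 136, 15), times = c(8, 10, 7))

# None of the three columns contains missing values.
stopifnot(!any(is.na(p3)), !any(is.na(h_tot)), !any(is.na(hp_tot)))

# Small all-numeric data frame (hypothetical name), so cor() sees
# no factor or character column.
d <- data.frame(P3 = p3, H_tot = h_tot, HP_tot = hp_tot)

# Pairwise call, matrix call, and matrix call with pairwise deletion.
r_pair   <- cor(d$P3, d$HP_tot, method = "spearman")
r_matrix <- cor(d, method = "spearman")
r_use    <- cor(d, method = "spearman", use = "pairwise.complete.obs")

print(r_pair)
print(r_matrix["P3", "HP_tot"])
print(r_use["P3", "H_tot"])
```

On this three-column subset the matrix cells agree with the pairwise value, so whatever differs in the full matrix seems to depend on the other columns.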
I don't know if this could be a factor, but my data contains some weird
characters.
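One check that might narrow this down, offered as a hypothesis rather than a confirmed diagnosis: P7 is a factor in the dput() output, so the data frame is not purely numeric. If the numeric columns were somehow ranked as character strings, "136" would sort before "15". How cor() actually treats such a data frame in these R versions has not been verified here; the sketch below only compares Spearman's rho for numeric ranks against string ranks of the same HP_tot values.

```r
# Values copied from the dput() output above.
p3 <- c(2, 2, 2, 4, 2, 3, 2, 1, 3, 2, 2, 2, 3, 1, 2, 1, 1, 1,
        2, 1, 2, 2, 2, 1, 2)
hp_tot <- rep(c(10, 136, 15), times = c(8, 10, 7))

# Spearman's rho is Pearson's r computed on the ranks.
r_numeric <- cor(rank(p3), rank(hp_tot))

# Same values ranked as character strings: "10" < "136" < "15",
# whereas numerically 10 < 15 < 136.
# Assumption: this mimics what might happen to a numeric column
# once it has been converted to text.
r_string <- cor(rank(p3), rank(as.character(hp_tot)))

print(r_numeric)
print(r_string)
```

The first value can be compared with the pairwise result (-0.2182876) and the second with the value from the correlation matrix (-0.25028783918741). If they line up, dropping the non-numeric columns before calling cor() would be a reasonable next test.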
Thanks everyone,
Stephane Vaucher
On Wed, Sep 8, 2010 at 12:35 PM, Stephane Vaucher
<vauchers at iro.umontreal.ca> wrote:
> Hi everyone,
>
> I'm observing what I believe is weird behaviour when attempting to do
> something very simple. I want a correlation matrix, but my matrix seems to
> contain correlation values that are not found when executed on pairs:
>
>> test2$P2
>
> [1] 2 2 4 4 1 3 2 4 3 3 2 3 4 1 2 2 4 3 4 1 2 3 2 1 3
>>
>> test2$HP_tot
>
> [1] 10 10 10 10 10 10 10 10 136 136 136 136 136 136 136 136 136 136 15
> [20] 15 15 15 15 15 15
>>
>> c=cor(test2$P3,test2$HP_tot,method='spearman')
>>
>> c
>
> [1] -0.2182876
>>
>> c=cor(test2,method='spearman')
>
> Warning message:
> In cor(test2, method = "spearman") : the standard deviation is zero
>>
>> write(c,file='out.csv')
>
> from my spreadsheet
> -0.25028783918741
>
> Most cells are correct, but not that one.
>
> If this is expected behaviour, I apologise for bothering you. I read the
> documentation, but I do not know if the calculation of matrices and pairs is
> done using the same function (e.g., with respect to equal value observations).
>
> If this is not a desired behaviour, I noticed that it only occurs with a
> relatively large matrix (I couldn't reproduce on a simple 2 column data
> set). There might be a naming error.
>
>> names(test2)
>
> [1] "ID" "NOMBRE" "MAIL"
> [4] "Age" "SEXO" "Studies"
> [7] "Hours_Internet" "Vision.Disabilities" "Other.disabilities"
> [10] "Technology_Knowledge" "Start_Time" "End_Time"
> [13] "Duration" "P1" "P1Book"
> [16] "P1DVD" "P2" "P3"
> [19] "P4" "P5" "P6"
> [22] "P8" "P9" "P10"
> [25] "P11" "P12" "P7"
> [28] "SITE" "Errors" "warnings"
> [31] "Manual" "Total" "H_tot"
> [34] "HP1.1" "HP1.2" "HP1.3"
> [37] "HP1.4" "HP_tot" "HO1.1"
> [40] "HO1.2" "HO1.3" "HO1.4"
> [43] "HO_tot" "HU1.1" "HU1.2"
> [46] "HU1.3" "HU_tot" "HR"
> [49] "L_tot" "LP1.1" "LP1.2"
> [52] "LP1.3" "LP1.4" "LP_tot"
> [55] "LO1.1" "LO1.2" "LO1.3"
> [58] "LO1.4" "LO_tot" "LU1.1"
> [61] "LU1.2" "LU1.3" "LU_tot"
> [64] "LR_tot" "SP_tot" "SP1.1"
> [67] "SP1.2" "SP1.3" "SP1.4"
> [70] "SP_tot.1" "SO1.1" "SO1.2"
> [73] "SO1.3" "SO1.4" "SO_tot"
> [76] "SU1.1" "SU1.2" "SU1.3"
> [79] "SU_tot" "SR"
>
> Thank you in advance,
> Stephane Vaucher
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/