# [R] simple intro to cluster analysis using R

Donatas G. dgvirtual at akl.lt
Thu Apr 10 01:00:57 CEST 2008

```I am looking for a simple introduction to cluster analysis using R that would
be understandable to a novice in statistics. Or could someone perhaps help
me understand how to proceed with my analysis? I am very new to both statistics
and R, but am trying hard to avoid having to use SPSS like everyone around
me...

I have a dataset of people's opinions on different religious communities,
coded on a 5-point scale, and I want to see if those communities can be
grouped (clustered) in some way that would be illuminating for my
research purposes.

So, I have data that looks like this:

> describe(R12)
R12

18  Variables      1035  Observations
---------------------------------------------------------------------------
R12.1
n missing  unique
416     619       5

More negative (51, 12%), More positive (112, 27%)
Completely negative (41, 10%), Completely positive (23, 6%)
Neutral (189, 45%)

<skip>

R12.12
n missing  unique
451     584       5

More negative (111, 25%), More positive (43, 10%)
Completely negative (79, 18%), Completely positive (5, 1%)
Neutral (213, 47%)

<and so on>

So you can see there are a lot of NAs in this questionnaire (at times more
than half of the responses).

Here is also a correlation matrix (only part is displayed):

> x=cor(R12, use="pairwise.complete.obs")
> round(x,2)
       R12.1 R12.2 R12.3 R12.4 R12.5 R12.6 R12.7 R12.8 R12.9 R12.10 R12.11
R12.1   1.00  0.57  0.57  0.61  0.57  0.48  0.43  0.38  0.52   0.58   0.58
R12.2   0.57  1.00  0.82  0.78  0.73  0.62  0.43  0.49  0.64   0.69   0.75
R12.3   0.57  0.82  1.00  0.89  0.90  0.73  0.54  0.57  0.70   0.77   0.78
R12.4   0.61  0.78  0.89  1.00  0.91  0.68  0.51  0.56  0.65   0.80   0.76
R12.5   0.57  0.73  0.90  0.91  1.00  0.73  0.53  0.55  0.68   0.78   0.74
R12.6   0.48  0.62  0.73  0.68  0.73  1.00  0.59  0.62  0.68   0.79   0.78
R12.7   0.43  0.43  0.54  0.51  0.53  0.59  1.00  0.62  0.55   0.65   0.65
R12.8   0.38  0.49  0.57  0.56  0.55  0.62  0.62  1.00  0.55   0.65   0.62
R12.9   0.52  0.64  0.70  0.65  0.68  0.68  0.55  0.55  1.00   0.79   0.82
R12.10  0.58  0.69  0.77  0.80  0.78  0.79  0.65  0.65  0.79   1.00   0.88
R12.11  0.58  0.75  0.78  0.76  0.74  0.78  0.65  0.62  0.82   0.88   1.00
R12.12  0.47  0.59  0.64  0.65  0.60  0.61  0.56  0.50  0.68   0.77   0.83
R12.13  0.62  0.69  0.77  0.70  0.74  0.76  0.65  0.61  0.78   0.81   0.82
R12.14  0.58  0.70  0.71  0.75  0.70  0.74  0.64  0.62  0.78   0.86   0.86
R12.15  0.58  0.61  0.72  0.72  0.71  0.72  0.64  0.59  0.73   0.83   0.79
R12.16  0.56  0.67  0.77  0.72  0.78  0.75  0.57  0.54  0.75   0.85   0.80
R12.17  0.61  0.69  0.79  0.77  0.75  0.73  0.56  0.57  0.74   0.82   0.80
R12.18  0.63  0.73  0.84  0.82  0.83  0.71  0.54  0.64  0.68   0.71   0.74

So you can see that the opinions are highly correlated. I doubt
clustering would be meaningful, but I still want to try.

How do I proceed with this?

--
Donatas Glodenis

```
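
One possible way to proceed from the correlation matrix above is to cluster the items (the communities) rather than the respondents, using the correlations as a measure of similarity. The following is only a minimal sketch under stated assumptions: the real `R12` data are not available here, so `toy` is a hypothetical simulated stand-in with 18 numeric items coded 1 to 5 and roughly half of the answers missing; the distance `1 - correlation`, the `"average"` linkage and the choice of `k = 3` groups are illustrative choices, not something the post prescribes.

```r
# Hypothetical stand-in for R12: 18 items on a 1-5 scale with many NAs.
set.seed(1)
n <- 300
latent <- rnorm(n)                            # shared attitude driving all items
toy <- sapply(1:18, function(j) {
  score <- round(3 + latent + rnorm(n))       # item correlated with the others
  score <- pmin(pmax(score, 1), 5)            # clamp to the 1..5 scale
  score[sample(n, n %/% 2)] <- NA             # make about half the answers missing
  score
})
colnames(toy) <- paste0("R12.", 1:18)

# Same correlation call as in the post.
x <- cor(toy, use = "pairwise.complete.obs")

# Turn correlations into dissimilarities: highly correlated items are "close".
d <- as.dist(1 - x)

# Agglomerative hierarchical clustering of the items (columns).
hc <- hclust(d, method = "average")
plot(hc)                                      # dendrogram of the 18 items

# Cut the tree into an (arbitrarily chosen) number of groups.
groups <- cutree(hc, k = 3)
print(groups)
```

With the real data, `toy` would be replaced by `R12`, assuming its columns are numeric codes (as the working `cor()` call suggests). The dendrogram is usually the more informative output; if all items merge at similar heights, that would support the suspicion that no clear grouping exists.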