[R] cluster analysis with pairwise data
Petr Savicky
savicky at cs.cas.cz
Wed Apr 4 18:12:43 CEST 2012
On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:
> Hello,
> I want to do a cluster analysis with my data. The problem is that the
> variables don't consist of single values; the entries are pairs of
> values.
> It looks like this:
>
>
> Variable1:  Variable2:  Variable3:  . . .
> (1,2)       (1,5)       (4,2)
> (7,8)       (3,88)      (6,5)
> (4,7)       (12,4)      (4,4)
> . . .
> . . .
> . . .
> Is it possible to perform a cluster analysis with this kind of data in
> R?
> I don't even know how to get this data into a matrix or a data frame or
> anything like that.
Hi.
The data as they are may be read into R as character data. The
exact way depends on the format of the data in the file. The
result may look like the following.
Var1 <- c("(1,2)", "(7,8)", "(4,7)")
Var2 <- c("(1,5)", "(3,88)", "(12,4)")
Var3 <- c("(4,2)", "(6,5)", "(4,4)")
DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)
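If the pairs are stored in a whitespace-separated text file with a
header row (this layout is only an assumption, since the actual file
format is not known), read.table() can read them directly as character
columns. The sketch below uses the text= argument in place of a file
name so that it is self-contained; the name DF_file is only
illustrative.

```r
# Assumed layout: one header line, then one row per observation,
# columns separated by whitespace, each entry written as "(a,b)".
lines <- c("Var1 Var2 Var3",
           "(1,2) (1,5) (4,2)",
           "(7,8) (3,88) (6,5)",
           "(4,7) (12,4) (4,4)")

# For a real file, replace text=lines by the file name.
DF_file <- read.table(text=lines, header=TRUE, stringsAsFactors=FALSE)
str(DF_file)
```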
If you want to use a distance between pairs that depends on the
numbers (and not only on whether two pairs are equal or different),
then the data should be transformed to a numeric format, for example
as follows:
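trans <- function(x)
{
    # drop the parentheses and split each "(a,b)" string at the comma
    y <- strsplit(gsub("[()]", "", x), ",")
    # convert each pair to numeric; one row per pair, two columns
    unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
}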
DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))
DF
  Var1.1 Var1.2 Var2.1 Var2.2 Var3.1 Var3.2
1      1      2      1      5      4      2
2      7      8      3     88      6      5
3      4      7     12      4      4      4
Then, see library(help=cluster).
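As a possible next step, here is a minimal sketch of a hierarchical
clustering of the numeric form of the three example rows. It uses
dist() and hclust() from the base stats package instead of a function
from the cluster package, only to keep the example free of extra
packages; the matrix X, the Euclidean distance and the choice of two
groups are illustrative assumptions, not a recommendation for the
real data.

```r
# Numeric form of the three example rows: each pair occupies two columns.
X <- rbind(c(1, 2,  1,  5, 4, 2),
           c(7, 8,  3, 88, 6, 5),
           c(4, 7, 12,  4, 4, 4))

d <- dist(X)                 # Euclidean distances between the rows
hc <- hclust(d)              # hierarchical clustering (complete linkage)
groups <- cutree(hc, k=2)    # cut the tree into two groups (assumed k)
groups
```

Note that the value 88 dominates the Euclidean distance here, so it
may be worth considering scale(X) before computing the distances.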
Hope this helps.
Petr Savicky.