dd [Was: Re: [R] Selection of cities sample]
Matej Cepl
cepl at surfbest.net
Sun Apr 25 04:43:11 CEST 2004
On Friday 23 of April 2004 04:37, Matej Cepl wrote:
> I have a question, how to most properly select set of cities
> which would be as similar as possible in some particular
> variables with the City of Boston (which I use as my base
> line).
Hi,
how can I weight the variables used in the daisy function? After
a week spent with MASS, Crawley (2002), and Gordon (1999), I ended
up with this function (which is not really a function, just a
convenient packaging of one complex expression):
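function(x) {
    require(cluster)   # provides daisy()
    # Standardize the columns, compute Euclidean dissimilarities,
    # then build an average-linkage hierarchical clustering.
    return(hclust(daisy(as.matrix(x),
                        metric="euclidean",
                        stand=TRUE),
                  method="average"))
}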
When I plot the result I get a huge tree (available as a PDF at
http://www.ceplovi.cz/matej/tmp/mctree.pdf), which seems very
helpful: by selecting a particular cluster I get the group of
cities to use as my sample. Would anybody be so kind as to
comment on this code, please?
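
To make "selecting a particular cluster" concrete, here is a
minimal sketch of how the group containing Boston might be pulled
out of such a tree with cutree(). The data are made up, and the
names cities, groups, and boston_group as well as the choice of
k = 3 are assumptions for illustration only, not my real table.

```r
library(cluster)

# Hypothetical toy data: rows are cities, columns are numeric variables.
set.seed(1)
cities <- matrix(rnorm(40), nrow = 10,
                 dimnames = list(c("Boston", paste0("City", 1:9)),
                                 paste0("var", 1:4)))

# Same expression as in the function above.
tree <- hclust(daisy(cities, metric = "euclidean", stand = TRUE),
               method = "average")

# Cut the tree into k groups (k = 3 is an arbitrary choice here).
groups <- cutree(tree, k = 3)

# Cities that fall into the same group as Boston.
boston_group <- names(groups)[groups == groups["Boston"]]
print(boston_group)
```

Cutting at a height (cutree(tree, h = ...)) instead of a fixed
number of groups would be another option; which is better probably
depends on how large a sample is wanted.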
Now, I would like to weight some variables in the data frame used
for the calculation (I care about some variables more than about
others, which should be included with a lower weight). In
help("daisy") I found this:
If 'nok' is the number of nonzero weights, the
dissimilarity is multiplied by the factor '1/nok' and thus
ranges between 0 and 1.
Do I understand correctly that this allows weighting of
non-interval (non-continuous) variables? If so, how can I weight
variables that are interval-scaled (my whole table consists of
counts and two percentage variables)?
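
The only workaround I can think of is to rescale the columns by
hand: a weighted Euclidean distance with weights w_k is the same
as the ordinary Euclidean distance after multiplying column k by
sqrt(w_k). A minimal sketch, assuming made-up data and weights
(x, w, z, and zw are hypothetical names), and using scale(), which
divides by the standard deviation rather than by the mean absolute
deviation that daisy(stand=TRUE) uses:

```r
# Hypothetical toy data and weights; not the real table.
set.seed(1)
x <- matrix(rnorm(40), nrow = 10,
            dimnames = list(paste0("City", 1:10), paste0("var", 1:4)))
w <- c(var1 = 4, var2 = 1, var3 = 1, var4 = 0.25)

# Standardize each column (mean 0, sd 1).
z <- scale(x)

# Multiply column k by sqrt(w[k]) so that squared differences
# in that column are weighted by w[k].
zw <- sweep(z, 2, sqrt(w), `*`)

# Ordinary Euclidean distance on the rescaled columns
# equals the weighted Euclidean distance on z.
d <- dist(zw, method = "euclidean")
tree <- hclust(d, method = "average")
```

I do not know whether this is statistically the proper way to do
it, so corrections are welcome.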
Thanks for any reply,
Matej Cepl
--
Matej Cepl, http://www.ceplovi.cz/matej
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC
138 Highland Ave. #10, Somerville, Ma 02143, (617) 623-1488
Just remember, brothers and sisters--their skins may be white,
but their souls are just as black as ours!
-- a black preacher