[R] Advice: How to best ensure column values match in different vectors?
R. Michael Weylandt
michael.weylandt at gmail.com
Thu Aug 9 06:32:46 CEST 2012
On Wed, Aug 8, 2012 at 10:58 AM, DG Christensen <dgc at enservio.com> wrote:
> Hello all, I would like some advice on how to order elements in a vector.
>
> Background: my company is running a k-means clustering model on our
> historical data warehouse of products, which will produce a matrix of
> cluster centers. Then, on our production web servers, we will take
> newly created products and find the cluster that is closest to the new
> product (we're calling this "scoring" the product). Simple stuff. The
> complex part is that the data source for the model is different from the
> source of the new product.
>
> My concern is how to best ensure that the order of the product
> attributes in the clustering model matches the attributes of the new
> product vector. Here's what I'm considering doing:
>
> Say my company keeps the attributes height, width, and length on our
> products (in reality we'll have over 200 attributes). I will create a
> constant of the column (i.e. attribute) names:
>
> PRODUCT.ATTRIBUTE.COLS <- c("H","W","L")
> PRODUCT.ATTRIBUTE.COUNT <- length( PRODUCT.ATTRIBUTE.COLS )
>
> All new vectors (both during modeling and scoring) will be created with
> NaN values:
>
> product.vector <- rep(NaN, PRODUCT.ATTRIBUTE.COUNT)
> names( product.vector ) <- PRODUCT.ATTRIBUTE.COLS
>
> The vector will then be populated with attribute values like this. The
> values will be retrieved from whatever DB we're using:
>
> product.vector["H"] <- height.from.db
> product.vector["W"] <- width.from.db
> product.vector["L"] <- length.from.db
>
> Is this a reasonable way to do this? If so, one thing I'd like to add
> is error checking that validates that the attribute name exists, so if
> the code attempted to do:
>
> product.vector["WEIGHT"] <- weight.from.db
>
> it would throw some sort of error. What's the best way for handling
> that? Can I set the length of the vector to a fixed size?
Hi DG,

You can define your own class whose assignment method throws an error
when the name being assigned to doesn't exist, e.g.,

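# Tag an atomic object with the "strictvec" class.
as.strictvec <- function(x){
    stopifnot(is.atomic(x))
    class(x) <- c("strictvec", class(x))
    x
}

# Assignment method: refuse column names that don't already exist.
`[<-.strictvec` <- function(x, i, j, value){
    stopifnot(j %in% colnames(x))
    NextMethod()
}

z <- matrix(1:3, ncol = 3); colnames(z) <- letters[1:3]
z.strict <- as.strictvec(z)

# A plain matrix already rejects an unknown column name (subscript out of
# bounds); it is a plain named vector, v["d"] <- 5, that silently grows.
z[, "d"] <- 5
z.strict[, "d"] <- 5 # Error from the stopifnot() check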
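
For the named-vector case in the original question, the index arrives as
i rather than j and there are no column names to check, so the method
needs a small change. A minimal sketch follows, under some assumptions:
the class name "strictnamed", the constructor new.product.vector and the
error text are hypothetical (only PRODUCT.ATTRIBUTE.COLS comes from the
question), and only character indices are allowed so that the vector
cannot grow by position either.

```r
PRODUCT.ATTRIBUTE.COLS <- c("H", "W", "L")

# Hypothetical constructor: an all-NaN template with a fixed set of names.
new.product.vector <- function(cols = PRODUCT.ATTRIBUTE.COLS) {
  x <- rep(NaN, length(cols))
  names(x) <- cols
  class(x) <- c("strictnamed", class(x))
  x
}

# Assignment method: accept only character indices naming existing elements.
`[<-.strictnamed` <- function(x, i, value) {
  if (!is.character(i))
    stop("attributes must be assigned by name")
  unknown <- setdiff(i, names(x))
  if (length(unknown) > 0L)
    stop("unknown attribute name(s): ", paste(unknown, collapse = ", "))
  NextMethod()
}

product.vector <- new.product.vector()
product.vector["H"] <- 10
product.vector["W"] <- 4

# Capture the error message instead of stopping the script.
result <- tryCatch({
  product.vector["WEIGHT"] <- 2.5
  "assigned"
}, error = function(e) conditionMessage(e))
print(result)
```

Note that `[[<-` and `names<-` are separate functions and are not
covered by this method; they would need their own handling if the code
base uses them.
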
Adapt as needed.
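
On the ordering question itself, one option is to let the names do the
work at scoring time: reorder the new product by the column names of the
cluster centers and refuse to score if the two sets of names differ. The
sketch below is illustrative only. The toy centers matrix stands in for
the centers component of a kmeans() fit, and score.product is a
hypothetical helper that uses squared Euclidean distance.

```r
# Toy stand-in for kmeans(...)$centers; columns carry the attribute names.
centers <- matrix(c(1, 10,
                    2, 20,
                    3, 30),
                  nrow = 2,
                  dimnames = list(NULL, c("H", "W", "L")))

# A new product whose attributes arrive in a different order.
product.vector <- c(L = 29, H = 11, W = 19)

# Hypothetical helper: align by name, then return the index of the
# closest center by squared Euclidean distance.
score.product <- function(centers, x) {
  cols <- colnames(centers)
  stopifnot(!is.null(cols), !is.null(names(x)),
            anyDuplicated(names(x)) == 0L,
            setequal(cols, names(x)))
  x <- x[cols]                              # reorder to the model's order
  d2 <- rowSums(sweep(centers, 2, x)^2)     # distance to each center
  which.min(d2)
}

closest <- score.product(centers, product.vector)
print(closest)
```

With this check in place, a product that is missing an attribute or
carries an extra one stops with an error instead of being silently
matched against the wrong columns.
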
Cheers,
Michael
>
> Thanks for any guidance,
> DG