[R] Assigning factors probabilistically based on the value of another variable.

Charles C. Berry cberry at tajo.ucsd.edu
Sun Mar 28 00:24:03 CET 2010


On Sat, 27 Mar 2010, Economics Guy wrote:

> I am revising a program that I wrote when I was very new at R
> (2007ish), and while I have been able to write very nice and fast code
> for almost all of it, there is one task that I cannot seem to do in
> fewer than 40 ugly and computationally expensive lines.
>
> I have a data frame that contains one variable:
>
> theFrame <- data.frame(theValues=runif(150,-10,10))
>
> I would like to write a function that would assign each of these
> values a factor, and I need it to meet several criteria:
>
> (1) There are 15 factors.
> (2) I need there to be exactly 10 elements assigned to each factor.
>
> Now here is the tricky part:
>
> (3) I would like to assign the factor probabilistically. The lower
> theValue is for a row, the lower factor I would like it to receive. So
> values close to -10 should have a really high probability of being
> assigned factor 1.
>
> If assigning factors is too tricky I would settle for placing theValues
> in a 10 x 15 matrix where the lower values would be more likely to end
> up in column 1 (again, values close to -10 should have a really high
> probability of being assigned to column 1).

It is really the same thing. One of many possibilities:

> theFrame <- data.frame(theValues=runif(150,-10,10))
> exact <- diag(15)[1+ (rank(theFrame$theValues)-1)%/%10,]
> not.so.exact <- diag(15)[1+ (rank(theFrame$theValues+runif(150,0,3))-1)%/%10,]

If what you actually wanted was one factor with fifteen levels, take the 
row subscript from the last assignment, 1+ (rank(...)-1)%/%10, and wrap 
it in factor() instead of using it to index diag(15).
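
A minimal sketch of that factor variant, spelled out step by step. The
set.seed() call, the ties.method argument, and the names noisy, group and
theFactor are assumptions added here for illustration; they are not part
of the code above. The noise range runif(150,0,3) is the one used above
and can be widened to make the assignment less deterministic.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
theFrame <- data.frame(theValues = runif(150, -10, 10))

# Perturb the values, rank them, and cut the ranks into 15 blocks of 10.
# ties.method = "first" guarantees distinct ranks even if values tie.
noisy <- theFrame$theValues + runif(150, 0, 3)
group <- 1 + (rank(noisy, ties.method = "first") - 1) %/% 10

# One factor with fifteen levels (hypothetical column name).
theFrame$theFactor <- factor(group, levels = 1:15)

# Every level should hold exactly 10 rows.
table(theFrame$theFactor)

# Lower values should tend to land in lower levels.
tapply(theFrame$theValues, theFrame$theFactor, mean)
```

Because the groups are cut from ranks, the "exactly 10 per level" 
requirement holds no matter how much noise is added; the noise only 
controls how often a value lands in a neighbouring level.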

HTH,

Chuck

>
> Any ideas? I have thought at times I was painfully close only to
> realize I was completely wrong.
>
> Thanks,
>
> That Economics Guy

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


