[R] help with mysql and R: partitioning by quintile
jim holtman
jholtman at gmail.com
Sun May 8 23:42:06 CEST 2011
try this:
> # create some data
> x <- data.frame(userid = paste('u', rep(1:20, each = 20), sep = '')
+ , track = rep(1:20, 20)
+ , freq = floor(runif(400, 10, 200))
+ , stringsAsFactors = FALSE
+ )
> # get the quantiles for each track
> tq <- tapply(x$freq, x$track, quantile, prob = c(.2, .4, .6, .8, 1))
> # create a matrix with the rownames as the tracks to use in the findInterval
> tqm <- do.call(rbind, tq)
> # now put the ratings
> require(data.table)
> x.dt <- data.table(x)
> x.new <- x.dt[,
+ list(userid = userid
+ , freq = freq
+ , rating = findInterval(freq
+ # use track as index into
quantile matrix
+ , tqm[as.character(track[1L]),]
+ , rightmost.closed = TRUE
+ ) + 1L
+ )
+ , by = track]
>
> head(x.new)
track userid freq rating
[1,] 1 u1 10 1
[2,] 1 u2 15 1
[3,] 1 u3 126 4
[4,] 1 u4 117 3
[5,] 1 u5 76 2
[6,] 1 u6 103 3
>
On Sun, May 8, 2011 at 2:48 PM, gj <gawesh at gmail.com> wrote:
> Hi,
>
> I have a mysql table with fields userid,track,frequency e.g
> u1,1,10
> u1,2,100
> u1,3,110
> u1,4,200
> u1,5,120
> u1,6,130
> .
> u2,1,23
> .
> .
> where "frequency" is the number of times a music track is played by a
> "userid"
>
> I need to turn my 'frequency' table into a rating table (it's for a
> recommender system). So, for each user, I need to categorise the frequency
> of tracks played by quintile so that each particular track can have 5
> ratings (1-5), with the ratings allocated as follows: inter-quintile range
> 100-80% = rating 5, inter-quintile range 80-60% = rating 4,
> ..., inter-quintile range 20-0% = rating 1)
>
> Hence, I need to create a table with fields userid,track,rating:
> u1,1,1
> u1,2, 3
> ...
>
> Can anybody help me to do this with R?
>
> Regards
> Gawesh
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
More information about the R-help
mailing list