[R] help with mysql and R: partitioning by quintile

jim holtman jholtman at gmail.com
Sun May 8 23:42:06 CEST 2011


try this:

> # create some data
> x <- data.frame(userid = paste('u', rep(1:20, each = 20), sep = '')
+               , track = rep(1:20, 20)
+               , freq = floor(runif(400, 10, 200))
+               , stringsAsFactors = FALSE
+               )
> # get the quantiles for each track
> tq <- tapply(x$freq, x$track, quantile, probs = c(.2, .4, .6, .8, 1))
> # create a matrix with the rownames as the tracks to use in the findInterval
> tqm <- do.call(rbind, tq)
> # now assign the ratings
> require(data.table)
> x.dt <- data.table(x)
> x.new <- x.dt[,
+               list(userid = userid
+                  , freq = freq
+                  , rating = findInterval(freq
+                                        # use track as index into quantile matrix
+                                        , tqm[as.character(track[1L]),]
+                                        , rightmost.closed = TRUE
+                                        ) + 1L
+                  )
+              , by = track]
>
> head(x.new)
     track userid freq rating
[1,]     1     u1   10      1
[2,]     1     u2   15      1
[3,]     1     u3  126      4
[4,]     1     u4  117      3
[5,]     1     u5   76      2
[6,]     1     u6  103      3
>
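
The same idea can probably also be written in base R with ave(), in case data.table is not available. This is only a sketch under assumptions: quintile_rating is a hypothetical helper name (it is not part of the code above), and set.seed() is added only so the run is repeatable. It rates within each track, as above, and also within each userid, since the question below says "for each user".

```r
# Hypothetical helper (not from the original post): rate each value 1..5
# by where it falls among the 20/40/60/80/100% quantiles of its own group.
quintile_rating <- function(freq) {
  breaks <- quantile(freq, probs = c(.2, .4, .6, .8, 1))
  # findInterval() gives 0..4 here; rightmost.closed keeps the maximum at 4
  findInterval(freq, breaks, rightmost.closed = TRUE) + 1L
}

set.seed(1)  # assumption: only here to make the example repeatable
x <- data.frame(userid = paste('u', rep(1:20, each = 20), sep = ''),
                track = rep(1:20, 20),
                freq = floor(runif(400, 10, 200)),
                stringsAsFactors = FALSE)

# rating relative to the other plays of the same track
x$rating_by_track <- ave(x$freq, x$track, FUN = quintile_rating)
# rating relative to the other tracks played by the same user
x$rating_by_user <- ave(x$freq, x$userid, FUN = quintile_rating)

head(x)
```

Which grouping is right depends on what the ratings should be relative to; only the second argument to ave() changes.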


On Sun, May 8, 2011 at 2:48 PM, gj <gawesh at gmail.com> wrote:
> Hi,
>
> I have a MySQL table with fields userid,track,frequency, e.g.
> u1,1,10
> u1,2,100
> u1,3,110
> u1,4,200
> u1,5,120
> u1,6,130
> .
> u2,1,23
> .
> .
> where "frequency" is the number of times a music track is played by a
> "userid"
>
> I need to turn my 'frequency' table into a rating table (it's for a
> recommender system). So, for each user, I need to categorise the frequency
> of tracks played by quintile so that each particular track can have 5
> ratings (1-5), with the ratings allocated as follows: inter-quintile range
> 100-80% = rating 5, inter-quintile range 80-60% = rating 4,
> ..., inter-quintile range 20-0% = rating 1.
>
> Hence, I need to create a table with fields userid,track,rating:
> u1,1,1
> u1,2,3
> ...
>
> Can anybody help me to do this with R?
>
> Regards
> Gawesh



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


