[R] Help me replace a for loop with an "apply" function

gd047 gd047 at mineknowledge.com
Thu Oct 1 20:57:44 CEST 2009


Congratulations!

Could you explain to me the reason you add an initial "TRUE" value in the
cumulative sum?
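
To make the question concrete, here is how I read that line, traced step by
step on the example data. The variable names (days, new_streak, flags,
streak_id, four_days) are mine and not from the original code, so treat this
as a sketch of my understanding rather than an authoritative explanation.

```r
# Days on which user 2003 played in the example data (no entry for 2008/11/02)
days <- as.Date(c("2008/11/01", "2008/11/03", "2008/11/04"), format = "%Y/%m/%d")

# diff() yields one value fewer than its input: the gap before days 2..n
new_streak <- diff(days) != 1        # TRUE FALSE

# The first day has no predecessor, so it never gets a flag from diff().
# Prepending TRUE marks it as the start of the first streak and makes the
# flag vector the same length as 'days'.
flags <- c(TRUE, new_streak)         # TRUE TRUE FALSE

# cumsum() counts the streak starts seen so far, giving each day a streak id
streak_id <- cumsum(flags)           # 1 2 2

# The longest streak is the size of the largest streak-id group
longest <- max(table(streak_id))     # 2

# Without the leading TRUE, a user with four consecutive days would get
# only three ids (0 0 0), and the streak would be counted as 3 instead of 4
four_days <- as.Date("2008-11-01") + 0:3
without_true <- max(table(cumsum(diff(four_days) != 1)))           # 3
with_true    <- max(table(cumsum(c(TRUE, diff(four_days) != 1))))  # 4
```

If this reading is right, the leading TRUE is there so that the first day of
each user is counted as part of a streak instead of being dropped by diff().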



jholtman wrote:
> 
> Will this work:
> 
>> x <- read.table(textConnection("   day         user_id
> + 2008/11/01    2001
> + 2008/11/01    2002
> + 2008/11/01    2003
> + 2008/11/01    2004
> + 2008/11/01    2005
> + 2008/11/02    2001
> + 2008/11/02    2005
> + 2008/11/03    2001
> + 2008/11/03    2003
> + 2008/11/03    2004
> + 2008/11/03    2005
> + 2008/11/04    2001
> + 2008/11/04    2003
> + 2008/11/04    2004
> + 2008/11/04    2005"), header=TRUE)
>> closeAllConnections()
>> # convert to Date
>> x$day <- as.Date(x$day, format="%Y/%m/%d")
>> # split by user and then look for contiguous days
>> contig <- sapply(split(x$day, x$user_id), function(.days){
> +     .diff <- cumsum(c(TRUE, diff(.days) != 1))
> +     max(table(.diff))
> + })
>> contig
> 2001 2002 2003 2004 2005
>    4    1    2    2    4
>>
>>
> 
> 
> On Thu, Oct 1, 2009 at 11:29 AM, gd047 <gd047 at mineknowledge.com> wrote:
>>
>> ...if that is possible
>>
>> My task is to find the longest streak of continuous days a user
>> participated in a game.
>>
>> Instead of writing an sql function, I chose to use R's rle function to
>> get the longest streaks and then update my db table with the results.
>>
>> The (attached) dataframe is something like this:
>>
>>    day         user_id
>> 2008/11/01    2001
>> 2008/11/01    2002
>> 2008/11/01    2003
>> 2008/11/01    2004
>> 2008/11/01    2005
>> 2008/11/02    2001
>> 2008/11/02    2005
>> 2008/11/03    2001
>> 2008/11/03    2003
>> 2008/11/03    2004
>> 2008/11/03    2005
>> 2008/11/04    2001
>> 2008/11/04    2003
>> 2008/11/04    2004
>> 2008/11/04    2005
>>
>>
>>
>> --- R code follows
>> ------------------------------------------------------
>>
>>
>> # turn it to a contingency table
>> my_table <- table(user_id, day)
>>
>> # get the streaks
>> rle_table <- apply(my_table,1,rle)
>>
>> # verify the longest streak of "1"s for user 2001
>> # as.vector(tapply(rle_table$'2001'$lengths, rle_table$'2001'$values, max)["1"])
>>
>> # loop to get the results
>> # initiate results matrix
>> res<-matrix(nrow=dim(my_table)[1], ncol=2)
>>
>> for (i in 1:dim(my_table)[1]) {
>> string <- paste("as.vector(tapply(rle_table$'", rownames(my_table)[i],
>> "'$lengths, rle_table$'", rownames(my_table)[i], "'$values, max)['1'])",
>> sep="")
>> res[i,]<-c(as.integer(rownames(my_table)[i]) , eval(parse(text=string)))
>> }
>>
>>
>> ----------------------------------------------------
>> --- end of R code
>>
>> Unfortunately this for loop takes too long and I'm wondering if there is a
>> way to produce the res matrix using a function from the "apply" family.
>>
>> Thank you in advance
>> --
>> View this message in context:
>> http://www.nabble.com/Help-me-replace-a-for-loop-with-an-%22apply%22-function-tp25696937p25696937.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem that you are trying to solve?
> 
> 
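
For comparison, here is a sketch of how my original rle approach could
probably be kept while dropping the for loop and the eval(parse(...)) call.
The data frame is rebuilt by hand from the example above, and the helper
name longest_run is my own invention. It assumes every calendar day in the
period appears as a column of the table; a day on which nobody played would
be missing from the table and could join two separate streaks.

```r
# Rebuild the example data (day kept as text; it sorts correctly as-is)
x <- data.frame(
  day = c(rep("2008/11/01", 5), rep("2008/11/02", 2),
          rep("2008/11/03", 4), rep("2008/11/04", 4)),
  user_id = c(2001:2005, 2001, 2005,
              2001, 2003, 2004, 2005,
              2001, 2003, 2004, 2005)
)

# Contingency table: one row per user, one column per day, entries 0/1
my_table <- table(x$user_id, x$day)

# Hypothetical helper: longest run of 1s in a single row of the table
longest_run <- function(row) {
  r <- rle(as.vector(row))
  max(r$lengths[r$values == 1])
}

# One call per row replaces the loop and the string-built expressions
longest <- apply(my_table, 1, longest_run)

# Same two-column layout as the res matrix in my first message
res <- cbind(user_id = as.integer(names(longest)), streak = longest)
```

On the example data this gives the same streaks as the contig vector above
(4, 1, 2, 2, 4), though I have not timed it on the full table.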

-- 
View this message in context: http://www.nabble.com/Help-me-replace-a-for-loop-with-an-%22apply%22-function-tp25696937p25704683.html
Sent from the R help mailing list archive at Nabble.com.



