[R] Create counter variable for subsets without a loop

Tue May 18 13:39:56 CEST 2010

Here are four solutions:

data <- cbind(state.region,as.data.frame(state.x77))[,1:2]

# ave
data2 <- data[order(data$state.region, -data$Population), ]
data2$rank <- ave(data2$Population, data2$state.region, FUN = seq_along)
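
If the rows should keep their original order, or if ties matter, ave can also be combined with rank directly, so no sorting is needed beforehand. The following is only a sketch of that variant: ties.method = "first" is an assumed choice (the original question does not say how ties should be handled), and the column name rank simply mirrors the one used in the question.

```r
# Sketch: rank states by population within each region, without pre-sorting.
data <- cbind(state.region, as.data.frame(state.x77))[, 1:2]

# rank(-x) assigns 1 to the largest population in each group.
# ties.method = "first" is an assumption; other methods give tied ranks.
data$rank <- ave(data$Population, data$state.region,
                 FUN = function(x) rank(-x, ties.method = "first"))

# Sort only for display; the rank column does not depend on row order.
data[order(data$state.region, data$rank), ]
```

Because the ranks are computed per group by ave, the final order() call is optional and only affects how the result is printed.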

# by
f <- function(x) cbind(x[order(-x$Population), ], rank = seq_len(nrow(x)))
do.call("rbind", by(data, data$state.region, f))

# ddply - same f as in by solution
library(plyr)
ddply(data, .(state.region), f)

# sqldf with PostgreSQL
library(RpgSQL)
library(sqldf)
sqldf('select
   *, rank() over (partition by "state.region" order by "Population" desc)
   from data
   order by "state.region", "Population" desc')

On Mon, May 17, 2010 at 5:32 PM, Thomas Brambor <tbrambor at stanford.edu> wrote:
> Hi all,
>
> I am looking to create a rank variable based on a continuous variable
> for subsets of the data. For example, for an R integrated data set
> about US states this is how a loop could create what I want:
>
> ### Example with loop
> # choosing a subset of the data
> data <- cbind(state.region,as.data.frame(state.x77))[,1:2]
> # ordering the data
> data <- data[order(data$state.region, 1/data$Population),]
> regions <- levels(data$state.region)
> temp <- NULL
> ranks <- NULL
> for (i in 1:length(regions)){
>    temp <- rev(rank(data[data$state.region==regions[i],"Population"]))
>    ranks <- c(ranks,temp)
>  }
> data$rank <- ranks
> data
>
> where data$rank is the rank of the state by population within a region.
>
> However, using loops is slow and cumbersome. I have a fairly large
> data set with many subgroups and the loop runs a long time. Can
> someone suggest a way to create such rank variable for subsets without
> using a loop?
>
> Thank you,
> Thomas