[R] Create counter variable for subsets without a loop
Gabor Grothendieck
ggrothendieck at gmail.com
Tue May 18 13:39:56 CEST 2010
Here are four solutions:
data <- cbind(state.region,as.data.frame(state.x77))[,1:2]
# ave
data2 <- data[order(data$state.region, -data$Population), ]
data2$rank <- ave(data2$Population, data2$state.region, FUN = seq_along)
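`ave` returns a vector aligned with its input rows, so sorting is only needed for the `seq_along` numbering trick. The following is a minimal sketch that keeps the original row order. It assumes rank 1 should mean "most populous in its region" and applies `rank` to the negated populations within each region. Note that `rank` averages ties by default.

```r
# Build the same two-column data frame from the built-in datasets
data <- cbind(state.region, as.data.frame(state.x77))[, 1:2]

# ave() applies FUN to each region's subset of -Population and writes the
# results back into the original row positions, so no reordering is needed.
# Negating Population makes the most populous state in each region rank 1.
data$rank <- ave(-data$Population, data$state.region, FUN = rank)

# Display sorted by region and rank, for readability only
data[order(data$state.region, data$rank), ]
```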
# by
f <- function(x) cbind(x[order(-x$Population), ], rank = 1:nrow(x))
do.call("rbind", by(data, data$state.region, f))
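`by` splits the data frame by region, applies `f` to each piece, and returns a list-like object that `rbind` can stack. Below is a minimal sketch of the same split-apply-combine pattern using only `split` and `lapply`. It assumes the same `f`, here written with `seq_len(nrow(x))` instead of `1:nrow(x)` so it stays safe for an empty group.

```r
data <- cbind(state.region, as.data.frame(state.x77))[, 1:2]

# Sort one region's rows by decreasing population and number them 1..n
f <- function(x) cbind(x[order(-x$Population), ], rank = seq_len(nrow(x)))

# split() gives a list of per-region data frames, lapply() applies f to each,
# and do.call(rbind, ...) stacks the results back into one data frame
pieces <- lapply(split(data, data$state.region), f)
result <- do.call(rbind, pieces)
head(result)
```

The row names of `result` may pick up a region prefix from `rbind`. The `state.region`, `Population` and `rank` columns are unaffected.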
# ddply - same f as in by solution
library(plyr)
ddply(data, .(state.region), f)
# sqldf with PostgreSQL
library(RpgSQL)
library(sqldf)
sqldf('select
*, rank() over (partition by "state.region" order by "Population" desc)
from data
order by "state.region", "Population" desc')
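The SQL `rank()` window function differs from the other three solutions when there are ties. SQL `rank()` gives tied rows the same rank and leaves a gap after them, while `seq_along` and `1:nrow(x)` number tied rows consecutively. A minimal base R sketch of the SQL behaviour is shown below. It uses `rank(..., ties.method = "min")` within groups, and the toy data frame is hypothetical, built only to contain a tie.

```r
# Hypothetical toy data with a population tie in region "A"
toy <- data.frame(region = c("A", "A", "A", "B", "B"),
                  Population = c(500, 900, 500, 300, 700))

# SQL-style rank(): descending order, ties share the lowest rank, gaps follow
toy$sql_rank <- ave(-toy$Population, toy$region,
                    FUN = function(p) rank(p, ties.method = "min"))

# Consecutive numbering as in the ave/seq_along solution (ties broken by row order)
toy <- toy[order(toy$region, -toy$Population), ]
toy$row_number <- ave(toy$Population, toy$region, FUN = seq_along)
toy
```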
On Mon, May 17, 2010 at 5:32 PM, Thomas Brambor <tbrambor at stanford.edu> wrote:
> Hi all,
>
> I am looking to create a rank variable based on a continuous variable
> for subsets of the data. For example, for an R integrated data set
> about US states this is how a loop could create what I want:
>
> ### Example with loop
> data <- cbind(state.region,as.data.frame(state.x77))[,1:2] # choosing a subset of the data
> data <- data[order(data$state.region, 1/data$Population),] # ordering the data
> regions <- levels(data$state.region)
> temp <- NULL
> ranks <- NULL
> for (i in 1:length(regions)){
> temp <- rev(rank(data[data$state.region==regions[i],"Population"]))
> ranks <- c(ranks,temp)
> }
> data$rank <- ranks
> data
>
> where data$rank is the rank of the state by population within a region.
>
> However, using loops is slow and cumbersome. I have a fairly large
> data set with many subgroups and the loop runs a long time. Can
> someone suggest a way to create such a rank variable for subsets without
> using a loop?
>
> Thank you,
> Thomas
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>