[R] by group problem
Petr PIKAL
petr.pikal at precheza.cz
Mon Sep 3 10:51:13 CEST 2007
Hi,
now I understand better what you want.
topN.2 <- function(data, n=5) data[order(data[,3], decreasing=TRUE), ][1:n, ]
# I presume data is a data frame with 3 columns and the third is the percentage
# (your sample below builds a list, so run data <- as.data.frame(data) first)
lapply(split(data, data$state), topN.2)
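To make this concrete, here is a minimal, self-contained sketch using the sample data from your message below. It assumes the data are first put into a data frame (the posted code builds a plain list), and the names dat, topRows and top5 are only illustrative, not anything from your code. head() is used instead of [1:n, ] so that a state with fewer than n counties does not produce NA rows.

```r
# Sample data from the quoted message, built directly as a data frame
# (assumption: the original list is meant to be a 3-column data frame).
dat <- data.frame(
  state = c(rep("Illinois", 10), rep("Wisconsin", 10)),
  county = c("Adams", "Brown", "Bureau", "Cass", "Champaign",
             "Christian", "Coles", "De Witt", "Douglas", "Edgar",
             "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
             "Burnett", "Chippewa", "Clark", "Columbia", "Crawford"),
  percentOld = c(17.554849, 16.826594, 18.196593, 17.139242, 8.743823,
                 17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                 19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                 20.337817, 14.293354, 17.252820, 15.647179, 16.825596),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort the rows of one group by percentOld,
# largest first, and keep the first n rows with all their columns.
topRows <- function(d, n = 5) {
  d <- d[order(d$percentOld, decreasing = TRUE), ]
  head(d, n)
}

# Split the data frame by state, then take the top rows within each state.
top5 <- lapply(split(dat, dat$state), topRows)
top5
```

With the sample data this should list Edgar, Bureau, Christian, Adams and Cass for Illinois, and Burnett, Adams, Ashland, Bayfield and Clark for Wisconsin, each with its percentOld value.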
Regards
Petr
petr.pikal at precheza.cz
"Cory Nissen" <cnissen at AkoyaInc.com> wrote on 31.08.2007 17:21:01:
> That didn't work for me...
>
> Here's some data to help with a solution.
>
> data <- NULL
> data$state <- c(rep("Illinois", 10), rep("Wisconsin", 10))
> data$county <- c("Adams", "Brown", "Bureau", "Cass", "Champaign",
> "Christian", "Coles", "De Witt", "Douglas", "Edgar",
> "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
> "Burnett", "Chippewa", "Clark", "Columbia", "Crawford")
> data$percentOld <- c(17.554849, 16.826594, 18.196593, 17.139242, 8.743823,
> 17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
> 19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
> 20.337817, 14.293354, 17.252820, 15.647179, 16.825596)
>
> I would like it to return something like this...
> $Illinois
> "Edgar"
> 18.984435
> "Bureau"
> 18.196593
> ...
> $Wisconsin
> "Burnett"
> 20.33782
> "Adams"
> 19.34702
> ...
>
> My Solution gives...
> topN <- function(column, n=5)
> {
> column <- sort(column, decreasing=T)
> return(column[1:n])
> }
> tapply(data$percentOld, data$state, topN)
>
> $Illinois
> [1] 18.98444 18.19659 17.86275 17.55485 17.13924
> $Wisconsin
> [1] 20.33782 19.34702 17.81444 17.63278 17.25282
>
> I get an error with this try...
> aggregate(data$percentOld, list(data$state, data$county), topN)
>
> Error in aggregate.data.frame(as.data.frame(x), ...) :
> 'FUN' must always return a scalar
>
> Thanks
>
> cn
>
>
>
> From: Petr PIKAL [mailto:petr.pikal at precheza.cz]
> Sent: Fri 8/31/2007 8:15 AM
> To: Cory Nissen
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] by group problem
> Hi
>
> > I am working with census data. My columns of interest are...
> >
> > PercentOld - the percentage of people in each county that are over 65
> > County - the county in each state
> > State - the state in the US
> >
> > There are about 3100 rows, with each row corresponding to a county within a state.
> >
> > I want to return the top five "PercentOld" by state. But I want the County
> > and the Value.
> >
> > I tried this...
> >
> > topN <- function(column, n=5)
> > {
> > column <- sort(column, decreasing=T)
> > return(column[1:n])
> > }
> > top5PerState <- tapply(data$percentOld, data$STATE, topN)
>
> Try
>
> aggregate(data$PercentOld, list(data$State, data$County), topN)
>
> Regards
> Petr
>
>
> >
> > But this only returns the value for "percentOld" per state, I also want the
> > corresponding County.
> >
> > I think I'm close, but I just can't get it...
> >
> > Thanks
> >
> > cn
> >