[R] Subsetting by number of observations in a factor
jim holtman
jholtman at gmail.com
Fri Aug 10 04:35:22 CEST 2007
Does this do what you want? It creates a new dataframe containing only
those 'mg' groups that have at least a given number of observations.
> set.seed(2)
> # create some test data
> x <- data.frame(mg=sample(LETTERS[1:4], 20, TRUE), data=1:20)
> # split the data into subsets based on 'mg'
> x.split <- split(x, x$mg)
> str(x.split)
List of 4
$ A:'data.frame': 7 obs. of 2 variables:
..$ mg : Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1
..$ data: int [1:7] 1 4 7 12 14 18 20
$ B:'data.frame': 3 obs. of 2 variables:
..$ mg : Factor w/ 4 levels "A","B","C","D": 2 2 2
..$ data: int [1:3] 9 15 19
$ C:'data.frame': 4 obs. of 2 variables:
..$ mg : Factor w/ 4 levels "A","B","C","D": 3 3 3 3
..$ data: int [1:4] 2 3 10 11
$ D:'data.frame': 6 obs. of 2 variables:
..$ mg : Factor w/ 4 levels "A","B","C","D": 4 4 4 4 4 4
..$ data: int [1:6] 5 6 8 13 16 17
> # only choose subsets with at least 5 observations
> x.5 <- lapply(x.split, function(a) {
+ if (nrow(a) >= 5) return(a)
+ else return(NULL)
+ })
> # create new dataframe with these observations
> x.new <- do.call('rbind', x.5)
> x.new
mg data
A.1 A 1
A.4 A 4
A.7 A 7
A.12 A 12
A.14 A 14
A.18 A 18
A.20 A 20
D.5 D 5
D.6 D 6
D.8 D 8
D.13 D 13
D.16 D 16
D.17 D 17
>
>
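A shorter route, closer to what Ron describes below (a per-row count column, then a plain subset), might look like the following. This is only a sketch and was not part of the session above: the names 'n', 'nmg', 'x.tab' and 'x.ave' are chosen here for illustration, and the test data are regenerated so the block runs on its own.

```r
# Illustrative sketch; 'n', 'nmg', 'x.tab' and 'x.ave' are names assumed here
set.seed(2)
x <- data.frame(mg = factor(sample(LETTERS[1:4], 20, TRUE)), data = 1:20)
n <- 5  # minimum number of observations required per group

# Approach 1: count rows per group with table(), then look up each row's count
counts <- table(x$mg)
x$nmg <- as.vector(counts[as.character(x$mg)])
x.tab <- x[x$nmg >= n, ]

# Approach 2: ave() returns the group size aligned with each row,
# so no intermediate column is needed
keep <- ave(seq_len(nrow(x)), x$mg, FUN = length) >= n
x.ave <- x[keep, c("mg", "data")]
```

Both versions keep the original row order and row names, whereas the split/rbind approach groups the rows by 'mg'. The dropped groups remain as unused factor levels in 'mg'; droplevels() can remove them if that matters.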
On 8/9/07, Ron Crump <ron.crump at une.edu.au> wrote:
> Hi,
>
> I generally do my data preparation externally to R, so
> this is a bit unfamiliar to me, but a colleague has asked
> me how to do certain data manipulations within R.
>
> Anyway, basically I can get his large file into a dataframe.
> One of the columns is a management group code (mg). There may be
> varying numbers of observations per management group, and
> he would like to subset the dataframe such that there are
> always at least n per management group.
>
> I presume I can get to this using table or tapply, then
> (and I'm not sure how on this bit) creating a column nmg
> containing the number of observations that corresponds to
> mg for that row, then simply subsetting.
>
> So, am I on the right track? If so, how do I actually do it, and
> is there an easier method than the one I am considering?
>
> Thanks for your help,
> Ron
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?