# [R] select a subset from a sample

Ista Zahn izahn at psych.rochester.edu
Sun Jan 23 17:12:17 CET 2011

```
I think there are multiple solutions that match your criteria. Here is one:

dat <- structure(list(Id = 1:20, v1 = c(1L, 2L, 4L, 1L, 3L, 3L, 3L,
4L, 1L, 4L, 2L, 1L, 2L, 4L, 3L, 2L, 1L, 2L, 4L, 3L), v2 = c(2L,
1L, 2L, 1L, 2L, 1L, 4L, 4L, 2L, 1L, 4L, 4L, 3L, 3L, 2L, 3L, 4L,
3L, 1L, 3L), v3 = c(4L, 3L, 4L, 2L, 3L, 1L, 3L, 4L, 2L, 1L, 3L,
2L, 3L, 1L, 1L, 2L, 1L, 4L, 4L, 2L), v4 = c(3L, 4L, 2L, 3L, 4L,
1L, 1L, 4L, 1L, 2L, NA, 3L, 4L, NA, 2L, 3L, 4L, 3L, 1L, 1L)),
.Names = c("Id", "v1", "v2", "v3", "v4"), class = "data.frame",
row.names = c(NA, -20L))
# For each variable, flag the first row in which each non-missing level
# appears, then count the flags per row. NA is excluded so that a row is
# not kept merely for holding the first missing value.
keep <- rowSums(apply(dat[,-1], 2, function(x) !duplicated(x) & !is.na(x)))
# Keep every row that introduces at least one new level.
dat.sub <- dat[keep > 0 ,]
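
# The subset above covers every level, but nothing forces it to be as small
# as possible. What follows is a minimal sketch of one way to shrink it,
# assuming a greedy set-cover heuristic is acceptable: repeatedly take the
# row that covers the most still-missing variable/level pairs. Greedy
# selection is not guaranteed to find the true minimum either.
# `greedy_cover` and its `vars` argument are hypothetical names used only
# for illustration.
greedy_cover <- function(df, vars = setdiff(names(df), "Id")) {
  # One "variable=level" label per cell; NA cells are not levels to cover.
  labels <- do.call(cbind, lapply(vars, function(v) {
    lab <- paste(v, df[[v]], sep = "=")
    lab[is.na(df[[v]])] <- NA
    lab
  }))
  needed <- unique(as.vector(labels[!is.na(labels)]))
  chosen <- integer(0)
  while (length(needed) > 0) {
    # How many still-needed labels would each row contribute?
    gain <- apply(labels, 1, function(r) sum(r %in% needed))
    best <- which.max(gain)  # ties go to the earliest row
    chosen <- c(chosen, best)
    needed <- setdiff(needed, labels[best, ])
  }
  df[sort(chosen), , drop = FALSE]
}
# Usage with the data above:
# dat.small <- greedy_cover(dat)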

Best,
Ista

On Sun, Jan 23, 2011 at 12:43 PM, Wei Yang <peterwyang1 at gmail.com> wrote:
> Dear all,
>
> I would like to ask whether anyone has experience with the problem below.
>
>
> I want to select a subset of the sample (see data below) so that each level
> (1,2,3,4 in the example) for every variable (v1,v2,v3,v4 in the example) is
> shown at least once in the subset.  I also want the sample size of the
> subset to be as small as possible.  Any help on it is greatly appreciated.
>
>       Id v1 v2 v3 v4
>  [1,]  1  1  2  4  3
>  [2,]  2  2  1  3  4
>  [3,]  3  4  2  4  2
>  [4,]  4  1  1  2  3
>  [5,]  5  3  2  3  4
>  [6,]  6  3  1  1  1
>  [7,]  7  3  4  3  1
>  [8,]  8  4  4  4  4
>  [9,]  9  1  2  2  1
> [10,] 10  4  1  1  2
> [11,] 11  2  4  3  2
> [12,] 12  1  4  2  3
> [13,] 13  2  3  3  4
> [14,] 14  4  3  1  2
> [15,] 15  3  2  1  2
> [16,] 16  2  3  2  3
> [17,] 17  1  4  1  4
> [18,] 18  2  3  4  3
> [19,] 19  4  1  4  1
> [20,] 20  3  3  2  1
>
> Thanks,
>
> Peter

--
Ista Zahn