# [R] expand.grid

Berwin A Turlach berwin at maths.uwa.edu.au
Wed Jan 19 11:04:09 CET 2011

```
G'day Nick,

On Wed, 19 Jan 2011 09:43:56 +0100
"Nick Sabbe" <nick.sabbe at ugent.be> wrote:

> Given a dataframe
>
> dfr<-data.frame(c1=c("a", "b", NA, "a", "a"), c2=c("d", NA, "d", "e",
> "e"), c3=c("g", "h", "i", "j", "k"))
>
> I would like to have a dataframe with all (unique) combinations of
> all the factors present.

Easy:

R> expand.grid(lapply(dfr, levels))
   c1 c2 c3
1   a  d  g
2   b  d  g
3   a  e  g
4   b  e  g
5   a  d  h
6   b  d  h
7   a  e  h
8   b  e  h
9   a  d  i
10  b  d  i
11  a  e  i
12  b  e  i
13  a  d  j
14  b  d  j
15  a  e  j
16  b  e  j
17  a  d  k
18  b  d  k
19  a  e  k
20  b  e  k
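
# Aside, not part of the original reply: a minimal sketch for newer R versions.
# Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so the
# columns of dfr are character vectors and levels() returns NULL for them.
# Wrapping each column in factor() first works for both character and
# factor columns:
R> expand.grid(lapply(dfr, function(x) levels(factor(x))))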

> In fact, I would like a simple solution for these two cases: given
> the three factor columns above, I would like both all _possible_
> combinations of the factor levels, and all _present_ combinations of
> the factor levels (e.g. if I would do this for the first 4 rows of
> dfr, it would contain no combinations with c3="k").

R> dfrpart <- lapply(dfr[1:4,], factor)
R> expand.grid(lapply(dfrpart, levels))
   c1 c2 c3
1   a  d  g
2   b  d  g
3   a  e  g
4   b  e  g
5   a  d  h
6   b  d  h
7   a  e  h
8   b  e  h
9   a  d  i
10  b  d  i
11  a  e  i
12  b  e  i
13  a  d  j
14  b  d  j
15  a  e  j
16  b  e  j
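
# Aside, not part of the original reply: the grid above crosses the levels
# that occur in the first 4 rows. If "present" instead means only the
# combinations that actually occur together in the data, one possible
# sketch is to take the distinct rows:
R> unique(dfr[1:4, ])
# and, to leave out any row that contains an NA:
R> unique(na.omit(dfr[1:4, ]))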

> It would also be nice to be able to choose whether or not NA's are
> included.

R> expand.grid(lapply(dfrpart, function(x) c(levels(x),
+   if(any(is.na(x))) NA else NULL)))
     c1   c2 c3
1     a    d  g
2     b    d  g
3  <NA>    d  g
4     a    e  g
5     b    e  g
6  <NA>    e  g
7     a <NA>  g
8     b <NA>  g
9  <NA> <NA>  g
10    a    d  h
11    b    d  h
....

HTH.

Cheers,

Berwin