[R] split data, but ensure each level of the factor is represented

Jay wilcoxjay at gmail.com
Mon Oct 13 19:06:41 CEST 2008


Hello,

I'll use part of the iris dataset as an example of what I want to
do.

> data(iris)
> iris<-iris[1:10,1:4]
> iris
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
4           4.6         3.1          1.5         0.2
5           5.0         3.6          1.4         0.2
6           5.4         3.9          1.7         0.4
7           4.6         3.4          1.4         0.3
8           5.0         3.4          1.5         0.2
9           4.4         2.9          1.4         0.2
10          4.9         3.1          1.5         0.1

If I split this data using the vector
> a<-c(3, 3, 3, 2, 3, 1, 2, 3, 2, 3)
> a
 [1] 3 3 3 2 3 1 2 3 2 3

then split() works fine:
> split(iris,a)
$`1`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
6          5.4         3.9          1.7         0.4

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4          4.6         3.1          1.5         0.2
7          4.6         3.4          1.4         0.3
9          4.4         2.9          1.4         0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
5           5.0         3.6          1.4         0.2
8           5.0         3.4          1.5         0.2
10          4.9         3.1          1.5         0.1
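split() works here because each of 1, 2 and 3 occurs in a. split() groups by the levels of a factor, and a grouping vector that is not already a factor is converted with as.factor(), so only values that actually occur become groups. A small illustration on made-up data (x and g are illustrative only, not from the question) shows the difference that explicit levels make:

```r
# Toy data (illustrative only): the value 1 never occurs in g
x <- 1:4
g <- c(3, 2, 3, 2)

# A plain numeric vector becomes a factor whose levels are the observed
# values, so the result only has elements named "2" and "3"
by_value <- split(x, g)
names(by_value)

# Supplying the levels explicitly keeps an (empty) element for "1"
by_level <- split(x, factor(g, levels = 1:3))
by_level
```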


My problem is when the vector lacks one of the values in 1:n (here
n = 3). For example, if the vector is
> a<-c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
> a
 [1] 3 3 3 2 3 2 2 3 2 3

then split returns a list with no $`1` element. I would like $`1` to
be a vector of 0s with the same length as the number of columns in
the dataset. In other words, I want to write a function that returns

> mysplit(iris,a)
$`1`
[1] 0 0 0 0

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4          4.6         3.1          1.5         0.2
6          5.4         3.9          1.7         0.4
7          4.6         3.4          1.4         0.3
9          4.4         2.9          1.4         0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
5           5.0         3.6          1.4         0.2
8           5.0         3.4          1.5         0.2
10          4.9         3.1          1.5         0.1
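A minimal sketch of such a mysplit, assuming the group labels are always meant to be 1:n. The function name comes from the question; the extra argument n (defaulting to max(g)) is a hypothetical addition so that a missing top value can also be requested. It relies on split() keeping unused factor levels, then swaps each empty group for a vector of zeros:

```r
# Hypothetical helper (not part of base R): split `df` by grouping vector
# `g`, keeping one element for every value in 1:n even if it never occurs.
mysplit <- function(df, g, n = max(g)) {
  # Explicit levels 1..n make split() return an element for each level;
  # levels with no rows come back as 0-row data frames.
  parts <- split(df, factor(g, levels = seq_len(n)))
  # Replace each empty group with a numeric vector of zeros, one per column.
  lapply(parts, function(p) if (nrow(p) == 0) numeric(ncol(df)) else p)
}

# Usage with the data from the question
data(iris)
iris <- iris[1:10, 1:4]
a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
mysplit(iris, a)
```

Note that the result mixes data frames and numeric vectors, which may complicate later code; if that becomes a problem, keeping the 0-row data frames returned by split(df, factor(g, levels = seq_len(n))) is a simpler alternative.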

Thank you for your time,

Jay


