[R] split data, but ensure each level of the factor is represented
Jay
wilcoxjay at gmail.com
Mon Oct 13 19:06:41 CEST 2008
Hello,
I'll use part of the iris dataset to illustrate what I want to
do.
> data(iris)
> iris<-iris[1:10,1:4]
> iris
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5.0 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
7 4.6 3.4 1.4 0.3
8 5.0 3.4 1.5 0.2
9 4.4 2.9 1.4 0.2
10 4.9 3.1 1.5 0.1
Now if I want to split this data using the vector
> a<-c(3, 3, 3, 2, 3, 1, 2, 3, 2, 3)
> a
[1] 3 3 3 2 3 1 2 3 2 3
then split() works fine:
> split(iris,a)
$`1`
Sepal.Length Sepal.Width Petal.Length Petal.Width
6 5.4 3.9 1.7 0.4
$`2`
Sepal.Length Sepal.Width Petal.Length Petal.Width
4 4.6 3.1 1.5 0.2
7 4.6 3.4 1.4 0.3
9 4.4 2.9 1.4 0.2
$`3`
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
5 5.0 3.6 1.4 0.2
8 5.0 3.4 1.5 0.2
10 4.9 3.1 1.5 0.1
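For reference, split() turns a numeric grouping vector into a factor and creates one list element per factor level, so only values that actually occur in `a` become groups. A minimal check on the same ten-row subset (the names `x` and `s` are just for illustration):

```r
# Same subset as above, stored under a new name so the built-in iris is untouched
x <- iris[1:10, 1:4]
a <- c(3, 3, 3, 2, 3, 1, 2, 3, 2, 3)

# The groups split() builds are the levels of factor(a), i.e. the observed values
levels(factor(a))   # [1] "1" "2" "3"

s <- split(x, a)
names(s)            # [1] "1" "2" "3"
sapply(s, nrow)     # rows per group: 1, 3 and 6
```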
My problem arises when the vector lacks one of the values in 1:n. For
example, if the vector is
> a<-c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
> a
[1] 3 3 3 2 3 2 2 3 2 3
then split will return a list without a $`1`. I would like $`1` to
be a vector of 0's with the same length as the number of columns
in the dataset. In other words, I want to write a function that returns
> mysplit(iris,a)
$`1`
[1] 0 0 0 0
$`2`
Sepal.Length Sepal.Width Petal.Length Petal.Width
4 4.6 3.1 1.5 0.2
6 5.4 3.9 1.7 0.4
7 4.6 3.4 1.4 0.3
9 4.4 2.9 1.4 0.2
$`3`
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
5 5.0 3.6 1.4 0.2
8 5.0 3.4 1.5 0.2
10 4.9 3.1 1.5 0.1
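One possible sketch of such a function, assuming the groups are the integers 1:n with n = max(a) by default. The name `mysplit` follows the question; the `lev` argument is a hypothetical addition for supplying the full set of groups explicitly. The idea is to give split() a factor whose levels are fixed in advance: with the default drop = FALSE, an unused level still yields a list element (a data frame with zero rows), which is then replaced by a zero vector of length ncol(x).

```r
# Hypothetical helper: split data frame `x` by `g`, keeping one element per
# value in `lev` (default 1:max(g)). Groups with no rows become numeric zeros.
mysplit <- function(x, g, lev = seq_len(max(g))) {
  # Fixed levels make split() return a 0-row data frame for absent groups
  parts <- split(x, factor(g, levels = lev))
  # Swap each empty group for a zero vector, one entry per column
  lapply(parts, function(p) if (nrow(p) == 0) numeric(ncol(x)) else p)
}

x <- iris[1:10, 1:4]
a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
res <- mysplit(x, a)
res$`1`   # [1] 0 0 0 0
```

As a design note, keeping the empty data frame (that is, returning `parts` unchanged) may be easier to work with downstream, since every list element then has the same columns and type.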
Thank you for your time,
Jay