[R] splitting dataset based on variable and re-combining

Brian Feeny bfeeny at mac.com
Mon Dec 10 23:41:05 CET 2012


I have a dataset and I wish to make predictions with two different models, both SVMs.  Which model applies depends
on the sex of the observation.  I wish to make the predictions and have the results come back in the same order as my original dataset.  To
illustrate, I will use iris:

# Take iris and keep just two Species, setosa and versicolor, then shuffle the rows
data(iris)
iris <- iris[iris$Species %in% c("setosa", "versicolor"), ]
irisindex <- sample(nrow(iris))
iris <- iris[irisindex, ]

# Make predictions on setosa using the mySetosaModel model, and on versicolor using the myVersicolorModel:

predict(mySetosaModel, iris[iris$Species=="setosa",])
predict(myVersicolorModel, iris[iris$Species=="versicolor",])

The problem is that this gives me one vector of just the setosa results and a second vector of just the versicolor results.

I wish to take the results and have them be in the same order as the original dataset.  So if the original dataset had:


Species
setosa
setosa
versicolor
setosa
versicolor
setosa

I wish for my results to have:
<prediction for setosa>
<prediction for setosa>
<prediction for versicolor>
<prediction for setosa>
<prediction for versicolor>
<prediction for setosa>

But instead, what I am ending up with is two result sets, and no way I can think of to combine them.  I am sure this comes up a lot: you have a factor you wish to split your models on, say sex (male vs. female), and you need to present the results back so that they match the order of the original dataset.

I have tried to think of ways to use an index, to try to keep things in order, but I can't figure it out.
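
One way the index idea might be sketched: pre-allocate a result vector with one slot per row, then fill each group's slots using the same logical index that selected the rows for predict().  The SVM models are not shown in this post, so the sketch below assumes plain lm() fits as stand-ins for mySetosaModel and myVersicolorModel; the name dat, the seed, and the regression formula are likewise assumptions made only for illustration.

```r
# Stand-in models: the real mySetosaModel / myVersicolorModel are SVMs that
# are not shown, so lm() fits are used here purely as an assumption.
data(iris)
set.seed(1)  # arbitrary seed so the shuffle is reproducible
dat <- droplevels(iris[iris$Species %in% c("setosa", "versicolor"), ])
dat <- dat[sample(nrow(dat)), ]

mySetosaModel     <- lm(Sepal.Length ~ Sepal.Width,
                        data = dat[dat$Species == "setosa", ])
myVersicolorModel <- lm(Sepal.Length ~ Sepal.Width,
                        data = dat[dat$Species == "versicolor", ])

# One logical index per group, in the row order of the shuffled data
isSetosa     <- dat$Species == "setosa"
isVersicolor <- dat$Species == "versicolor"

# Pre-allocate one slot per row, then fill each group's slots in place
pred <- rep(NA_real_, nrow(dat))
pred[isSetosa]     <- predict(mySetosaModel,     dat[isSetosa, ])
pred[isVersicolor] <- predict(myVersicolorModel, dat[isVersicolor, ])

# pred[i] now belongs to row i of dat
head(data.frame(Species = dat$Species, prediction = pred))
```

If the models return class labels (factors) rather than numbers, the result vector would presumably need to be pre-allocated as character, e.g. rep(NA_character_, nrow(dat)), with each predict() result wrapped in as.character().  Base R's split() and unsplit() can do similar bookkeeping when the per-group predictions are held in a list ordered by factor level.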

Any help is greatly appreciated.

Brian


