[R] algorithm that iteratively drops columns of a data-frame
Martin Batholdy
batholdy at googlemail.com
Wed Nov 9 16:36:45 CET 2011
Dear R-Users,
I have a problem with an algorithm that iteratively goes over a data.frame and excludes n columns at each step, based on a statistical criterion.
The 'column space' therefore gets smaller with each iteration (as in stepwise regression).
The problem is that in every round I work on a new subset of my data.frame.
However, as soon as I generate this subset by indexing the data.frame, the column numbers of course differ from those of the original data.frame.
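A small illustration of this index shift, using a hypothetical four-column data.frame 'df' that is not part of the original example: after removing column 2, position 2 of the subset refers to what was column 3 of the original.

```r
# Hypothetical toy data.frame, used only to show how positions shift
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

sub <- df[, -2]    # drop original column 2 ("b")
names(sub)         # "a" "c" "d"

# Position 2 of the subset is column "c", which is position 3 in 'df'
match(names(sub)[2], names(df))   # 3
```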
How can I solve this?
I prepared a small example to make my problem easier to understand:
Here I generate a data.frame containing 6 vectors with different means.
The loop should then exclude the vector with the smallest mean in each round.
At the end I want a vector ('drop') containing the column numbers that I can apply to the original data.frame to get the subset with the highest means.
But this does not work: every time I generate a subset (data[,-drop]), its column numbers differ from those of the original data.frame.
So in the end I can't use my drop vector on the original data.frame, because the dimensions of the data.frame being tested change in every round.
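One possible workaround, sketched under the assumption that the criterion is simply "smallest column mean" as in the example: keep a vector of the original column numbers still in play (a hypothetical 'keep'), and map each subset position back through it before appending to 'drop'. The fixed seed and the five rounds (one initial drop plus four loop iterations, as in the sample code) are illustrative assumptions.

```r
# Sketch: remember the ORIGINAL positions still in play ('keep') and
# translate each subset position back before storing it in 'drop'.
set.seed(1)  # assumption: fixed seed only for reproducibility
data <- data.frame(x1 = rnorm(200, 5, 2), x2 = rnorm(200, 6, 2),
                   x3 = rnorm(200, 1, 2), x4 = rnorm(200, 12, 2),
                   x5 = rnorm(200, 8, 2), x6 = rnorm(200, 9, 2))

keep <- seq_len(ncol(data))  # original column numbers not yet dropped
drop <- integer(0)           # original column numbers dropped so far

for (i in 1:5) {
  # drop = FALSE keeps a data.frame even when only one column is left
  col_means <- colMeans(data[, keep, drop = FALSE])
  # which.min() gives a position within the subset; keep[...] maps it back
  worst <- keep[which.min(col_means)]
  drop <- c(drop, worst)
  keep <- setdiff(keep, worst)
}

drop                # column numbers of the original data.frame, in drop order
names(data)[-drop]  # name(s) of the column(s) left over
```

Because 'drop' now always holds positions in the original data.frame, data[,-drop] can be applied to it directly.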
How can I deal with this kind of problem?
Any suggestions are highly appreciated!
(Of course, for this example there are much easier methods of finding the columns with the smallest means; it is a deliberately generic example.)
Here is the sample code:
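# six columns with different means
x1 <- rnorm(200, 5, 2)
x2 <- rnorm(200, 6, 2)
x3 <- rnorm(200, 1, 2)
x4 <- rnorm(200, 12, 2)
x5 <- rnorm(200, 8, 2)
x6 <- rnorm(200, 9, 2)
data <- data.frame(x1, x2, x3, x4, x5, x6)
# first round: position of the column with the smallest mean
col_means <- colMeans(data)
drop <- match(min(col_means), col_means)
for(i in 1:4) {
  # positions in col_means now refer to the subset data[,-drop] ...
  col_means <- colMeans(data[,-drop])
  # ... so the index appended here is not a column number of 'data'
  drop <- c(drop, match(min(col_means), col_means))
}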
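An alternative sketch, again assuming the smallest-mean criterion from the sample code: since column names survive subsetting while positions do not, record the names of the dropped columns (a hypothetical 'dropped' vector) and convert them to original column numbers with match() only at the end.

```r
# Sketch: track dropped columns by NAME, which does not change when the
# data.frame is subset, and convert to original numbers only at the end.
set.seed(1)  # assumption: fixed seed only for reproducibility
data <- data.frame(x1 = rnorm(200, 5, 2), x2 = rnorm(200, 6, 2),
                   x3 = rnorm(200, 1, 2), x4 = rnorm(200, 12, 2),
                   x5 = rnorm(200, 8, 2), x6 = rnorm(200, 9, 2))

dropped <- character(0)
for (i in 1:5) {
  remaining <- setdiff(names(data), dropped)
  col_means <- colMeans(data[, remaining, drop = FALSE])
  # which.min() on a named vector returns a named index; names() extracts it
  dropped <- c(dropped, names(which.min(col_means)))
}

drop <- match(dropped, names(data))  # original column numbers
```

With named columns this avoids index arithmetic entirely; the resulting 'drop' refers to the original data.frame, so data[,-drop] selects the remaining column(s).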