[R] algorithm that iteratively drops columns of a data-frame

R. Michael Weylandt michael.weylandt at gmail.com
Wed Nov 9 16:47:42 CET 2011


Perhaps attach placeholder names to your columns and use those rather
than indices?
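
A minimal sketch of that idea, assuming the columns are named x1 to x6 as
in the quoted example (data.frame() assigns those names automatically).
The names `keep` and `dropped` and the fixed seed are illustrative
assumptions, not part of the original post:

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Same setup as in the quoted example
data <- data.frame(
  x1 = rnorm(200, 5, 2),
  x2 = rnorm(200, 6, 2),
  x3 = rnorm(200, 1, 2),
  x4 = rnorm(200, 12, 2),
  x5 = rnorm(200, 8, 2),
  x6 = rnorm(200, 9, 2)
)

keep    <- names(data)    # names of the columns still under consideration
dropped <- character(0)   # names of the columns excluded so far

for (i in 1:5) {
  # Column means of the current subset; the result is a named vector
  col_means <- colMeans(data[, keep, drop = FALSE])
  # Name (not position) of the column with the smallest mean
  worst   <- names(which.min(col_means))
  dropped <- c(dropped, worst)
  keep    <- setdiff(keep, worst)
}

# Names stay valid on the original data frame; convert to the original
# column numbers only if they are really needed
dropped_idx <- match(dropped, names(data))
result      <- data[, keep, drop = FALSE]
```

Because the names never change when the data frame is subset, `dropped`
can be applied to the original data frame directly, and `dropped_idx`
recovers the original column numbers.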

Michael

On Wed, Nov 9, 2011 at 10:36 AM, Martin Batholdy
<batholdy at googlemail.com> wrote:
> Dear R-Users,
>
>
> I have a problem with an algorithm that iteratively goes over a data.frame and excludes n columns at each step based on a statistical criterion,
> so that the 'column space' gets smaller and smaller with each iteration (as in stepwise regression).
>
> The problem is that in every round I use a new subset of my data.frame.
>
> However, as soon as I "generate" this subset by indexing the data.frame, I of course get different column numbers (compared to my original data.frame).
>
> How can I solve this?
>
>
>
> I prepared a small example to make my problem easier to understand:
>
>
> Here I generate a data.frame containing 6 vectors with different means.
>
> The loop now should exclude the vector with the smallest mean in each round.
>
> At the end I want to have a vector ('drop') containing the column numbers that I can apply to the original data.frame to get the subset with the highest means.
>
> But this does not work: every time I generate a subset ('data[,-drop]'), the column numbers of the subset differ from the column numbers of the original data.frame.
>
> So, in the end I can't use my drop-vector on my original data-frame – since the dimension of the testing data-frame changes in every loop-round.
>
>
> How can I deal with this kind of problem?
>
> Any suggestions are highly appreciated!
> (Of course, for the example code there are much easier methods to find the columns with the smallest means; it is just a generic example.)
>
>
> here is the sample code:
>
>
> x1 <- rnorm(200, 5, 2)
> x2 <- rnorm(200, 6, 2)
> x3 <- rnorm(200, 1, 2)
> x4 <- rnorm(200, 12, 2)
> x5 <- rnorm(200, 8, 2)
> x6 <- rnorm(200, 9, 2)
>
>
> data <- data.frame(x1, x2, x3, x4, x5,x6)
>
> col_means <- colMeans(data)
> drop <- match(min(col_means), col_means)
>
>
> for(i in 1:4) {
>
>        col_means <- colMeans(data[,-drop])
>        drop <- c(drop, match(min(col_means), col_means))
>
> }