[R] algorithm that iteratively drops columns of a data-frame

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Wed Nov 9 17:27:12 CET 2011


Try indexing by column name instead of by position:

data[,!names(data) %in% names(col_means)]
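
To spell the idea out, here is a minimal sketch of your loop that records column names rather than positions. It assumes you want to remove the smallest-mean column once per pass; the names drop_names, drop_idx and reduced are made up for illustration, and set.seed() is only there to make the run repeatable. Names stay valid no matter how the subset shrinks, and match() turns them back into positions in the original data frame at the end.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

data <- data.frame(
  x1 = rnorm(200, 5, 2),  x2 = rnorm(200, 6, 2), x3 = rnorm(200, 1, 2),
  x4 = rnorm(200, 12, 2), x5 = rnorm(200, 8, 2), x6 = rnorm(200, 9, 2)
)

drop_names <- character(0)  # hypothetical name: columns excluded so far

for (i in 1:5) {
  # Select the surviving columns by name; drop = FALSE keeps a data frame
  # even when only one column is left.
  keep <- !(names(data) %in% drop_names)
  col_means <- colMeans(data[, keep, drop = FALSE])
  # which.min() keeps the element name, so names() yields the column name.
  drop_names <- c(drop_names, names(which.min(col_means)))
}

# Translate the recorded names back to positions in the original data frame.
drop_idx <- match(drop_names, names(data))
reduced  <- data[, !(names(data) %in% drop_names), drop = FALSE]
```

If you really need numbers, data[, -drop_idx, drop = FALSE] gives the same result as reduced, since drop_idx refers to the original column order.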

On Wed, 9 Nov 2011, Martin Batholdy wrote:

> Dear R-Users,
>
>
> I have a problem with an algorithm that iteratively goes over a data.frame and excludes n columns at each step based on a statistical criterion,
> so that the 'column space' gets smaller and smaller with each iteration (as in stepwise regression).
>
> The problem is that in every round I use a new subset of my data.frame.
>
> However, as soon as I "generate" this subset by indexing the data.frame, I of course get different column numbers (compared to my original data.frame).
>
> How can I solve this?
>
>
>
> I prepared a small example to make my problem easier to understand:
>
>
> Here I generate a data.frame containing 6 vectors with different means.
>
> The loop now should exclude the vector with the smallest mean in each round.
>
> At the end I want to have a vector ('drop') which contains the column numbers that I can apply to the original data.frame to get a subset with the highest means.
>
> But this does not work: every time I generate a subset ('data[,-drop]'), the column numbers of the subset differ from those of the original data.frame.
>
> So, in the end, I can't use my drop vector on my original data.frame, since the dimension of the testing data.frame changes in every loop round.
>
>
> How can I deal with this kind of problem?
>
> Any suggestions are highly appreciated!
> (Of course, for the example code there are much easier methods to find the columns with the smallest means; it is just a generic example.)
>
>
> here is the sample code:
>
>
> x1 <- rnorm(200, 5, 2)
> x2 <- rnorm(200, 6, 2)
> x3 <- rnorm(200, 1, 2)
> x4 <- rnorm(200, 12, 2)
> x5 <- rnorm(200, 8, 2)
> x6 <- rnorm(200, 9, 2)
>
>
> data <- data.frame(x1, x2, x3, x4, x5,x6)
>
> col_means <- colMeans(data)
> drop <- match(min(col_means), col_means)
>
>
> for(i in 1:4) {
>
> 	col_means <- colMeans(data[,-drop])
> 	drop <- c(drop, match(min(col_means), col_means))
>
> }

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k