[R] algorithm that iteratively drops columns of a data-frame

Martin Batholdy batholdy at googlemail.com
Thu Nov 10 01:41:19 CET 2011


Great, thank you both!
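
For the archive, here is a minimal sketch of how the name-based idea could be applied to the example quoted below. It is an illustration under stated assumptions, not code posted in the thread: the fixed seed, the vector name `dropped`, and the use of `which.min()` and `setdiff()` are choices made here. The point is that column names, unlike column numbers, stay valid no matter how the data.frame is subset.

```r
set.seed(1)  # assumption: fixed seed so the illustration is reproducible

# Same shape as the quoted example: six columns with different means.
data <- data.frame(
  x1 = rnorm(200, 5, 2), x2 = rnorm(200, 6, 2), x3 = rnorm(200, 1, 2),
  x4 = rnorm(200, 12, 2), x5 = rnorm(200, 8, 2), x6 = rnorm(200, 9, 2)
)

dropped <- character(0)  # names of excluded columns, in order of exclusion

for (i in 1:5) {
  keep <- setdiff(names(data), dropped)
  # drop = FALSE keeps a data.frame even when only one column remains
  col_means <- colMeans(data[, keep, drop = FALSE])
  # which.min() keeps the name of the smallest element
  dropped <- c(dropped, names(which.min(col_means)))
}

# Names refer to the original data.frame, so they apply to it directly.
result <- data[, !names(data) %in% dropped, drop = FALSE]
```

With means as well separated as these, the only column left in `result` should be `x4`.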



On 09.11.2011, at 17:27, Jeff Newmiller wrote:

> Try
> 
> data[,!names(data) %in% names(col_means)]
> 
> On Wed, 9 Nov 2011, Martin Batholdy wrote:
> 
>> Dear R-Users,
>> 
>> 
>> I have a problem with an algorithm that iterates over a data.frame and excludes n columns at each step, based on a statistical criterion,
>> so that the 'column space' gets smaller with each iteration (as in stepwise regression).
>> 
>> The problem is that in every round I use a new subset of my data.frame.
>> 
>> However, as soon as I "generate" this subset by indexing the data.frame, I of course get column numbers that differ from those of my original data.frame.
>> 
>> How can I solve this?
>> 
>> 
>> 
>> I prepared a small example to make my problem easier to understand:
>> 
>> 
>> Here I generate a data.frame containing 6 vectors with different means.
>> 
>> The loop now should exclude the vector with the smallest mean in each round.
>> 
>> At the end I want to have a vector ('drop') containing the column numbers that I can apply to the original data.frame to get the subset with the highest means.
>> 
>> But this does not work: every time I generate a subset ('data[,-drop]'), its column numbers no longer match those of the original data.frame.
>> 
>> So, in the end, I can't use my drop vector on my original data.frame, since the dimension of the testing data.frame changes in every round of the loop.
>> 
>> 
>> How can I deal with this kind of problem?
>> 
>> Any suggestions are highly appreciated!
>> (Of course, for the example code there are much easier methods to find the columns with the smallest means. It is a pretty generic example.)
>> 
>> 
>> here is the sample code:
>> 
>> 
>> x1 <- rnorm(200, 5, 2)
>> x2 <- rnorm(200, 6, 2)
>> x3 <- rnorm(200, 1, 2)
>> x4 <- rnorm(200, 12, 2)
>> x5 <- rnorm(200, 8, 2)
>> x6 <- rnorm(200, 9, 2)
>> 
>> 
>> data <- data.frame(x1, x2, x3, x4, x5, x6)
>> 
>> col_means <- colMeans(data)
>> drop <- match(min(col_means), col_means)
>> 
>> 
>> for(i in 1:4) {
>> 
>> 	col_means <- colMeans(data[,-drop])
>> 	drop <- c(drop, match(min(col_means), col_means))
>> 
>> }
>> 
> 
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                      Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
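
If column numbers rather than names are needed, as the original question asks, one possible approach is to carry along a vector of the original column numbers that are still in play and translate each subset position back through it. The following is a sketch under assumptions, not code from the thread: `remaining`, `pos`, and the fixed seed are names and choices introduced here for illustration.

```r
set.seed(1)  # assumption: fixed seed so the illustration is reproducible

data <- data.frame(
  x1 = rnorm(200, 5, 2), x2 = rnorm(200, 6, 2), x3 = rnorm(200, 1, 2),
  x4 = rnorm(200, 12, 2), x5 = rnorm(200, 8, 2), x6 = rnorm(200, 9, 2)
)

remaining <- seq_along(data)  # original column numbers not yet excluded
drop <- integer(0)            # original column numbers, in order of exclusion

for (i in 1:5) {
  col_means <- colMeans(data[, remaining, drop = FALSE])
  pos <- which.min(col_means)      # position within the current subset
  drop <- c(drop, remaining[pos])  # translate to the original column number
  remaining <- remaining[-pos]
}

# 'drop' now indexes the original data.frame directly.
highest <- data[, -drop, drop = FALSE]
```

Because `remaining` always holds original column numbers, `remaining[pos]` converts a position in the shrinking subset into a position in `data`, which is the step missing from the loop in the question.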


