[R] algorithm that iteratively drops columns of a data-frame
Martin Batholdy
batholdy at googlemail.com
Thu Nov 10 01:41:19 CET 2011
great, thank you both!
On 09.11.2011, at 17:27, Jeff Newmiller wrote:
> Try
>
> data[,!names(data) %in% names(col_means)]
>
> On Wed, 9 Nov 2011, Martin Batholdy wrote:
>
>> Dear R-Users,
>>
>>
>> I have a problem with an algorithm that iterates over a data.frame and excludes n columns at each step, based on a statistical criterion,
>> so that the 'column space' gets smaller and smaller with each iteration (as in stepwise regression).
>>
>> The problem is that in every round I use a new subset of my data.frame.
>>
>> However, as soon as I "generate" this subset by indexing the data.frame, I of course get different column numbers (compared to my original data.frame).
>>
>> How can I solve this?
>>
>>
>>
>> I prepared a small example to make my problem easier to understand:
>>
>>
>> Here I generate a data.frame containing 6 vectors with different means.
>>
>> The loop should then exclude the vector with the smallest mean in each round.
>>
>> At the end I want to have a vector ('drop') containing the column numbers, which I can apply to the original data.frame to get a subset with the highest means.
>>
>> But this does not work: every time I generate a subset ('data[,-drop]'), its column numbers differ from the column numbers of the original data.frame.
>>
>> So, in the end, I can't use my drop vector on my original data.frame, since the dimension of the testing data.frame changes in every round of the loop.
>>
>>
>> How can I deal with this kind of problem?
>>
>> Any suggestions are highly appreciated!
>> (Of course, for the example code there are much easier methods to find the columns with the smallest means. It is a pretty generic example.)
>>
>>
>> here is the sample code:
>>
>>
>> x1 <- rnorm(200, 5, 2)
>> x2 <- rnorm(200, 6, 2)
>> x3 <- rnorm(200, 1, 2)
>> x4 <- rnorm(200, 12, 2)
>> x5 <- rnorm(200, 8, 2)
>> x6 <- rnorm(200, 9, 2)
>>
>>
>> data <- data.frame(x1, x2, x3, x4, x5, x6)
>>
>> col_means <- colMeans(data)
>> drop <- match(min(col_means), col_means)
>>
>>
>> for (i in 1:4) {
>>   # the position found here refers to the subset data[, -drop], not to 'data'
>>   col_means <- colMeans(data[, -drop])
>>   drop <- c(drop, match(min(col_means), col_means))
>> }
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller
> DCN: <jdnewmil at dcn.davis.ca.us>
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---------------------------------------------------------------------------
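
For readers of the archive, here is a minimal sketch of the name-based idea suggested in the thread: track column names instead of column numbers, so that every subset can be mapped back to the original data.frame. It is only one possible way to write the loop, not necessarily what either poster had in mind. The names `keep`, `dropped`, `n_drop` and `drop_idx`, as well as the fixed seed, are assumptions added for illustration.

```r
set.seed(1)  # assumption: fixed seed so the run is repeatable

# Same setup as in the question: six columns with different means.
data <- data.frame(
  x1 = rnorm(200, 5, 2),
  x2 = rnorm(200, 6, 2),
  x3 = rnorm(200, 1, 2),
  x4 = rnorm(200, 12, 2),
  x5 = rnorm(200, 8, 2),
  x6 = rnorm(200, 9, 2)
)

keep    <- names(data)    # names of the columns still in play
dropped <- character(0)   # names of excluded columns, in order of exclusion
n_drop  <- 5              # hypothetical: the question drops 1 + 4 columns

for (i in seq_len(n_drop)) {
  # Column means of the current subset; the result is a *named* vector.
  col_means <- colMeans(data[, keep, drop = FALSE])

  # Name (not position) of the column with the smallest mean.
  worst <- names(col_means)[which.min(col_means)]

  dropped <- c(dropped, worst)
  keep    <- setdiff(keep, worst)
}

# Names are stable across subsets, so they map back to the original
# data.frame, either as column numbers or as a logical index.
drop_idx <- match(dropped, names(data))
result   <- data[, !names(data) %in% dropped, drop = FALSE]
```

Because the bookkeeping is done on names, `drop_idx` refers to positions in the original `data`, and `data[, -drop_idx, drop = FALSE]` selects the same columns as `result`. The `drop = FALSE` argument keeps a data.frame even when only one column is left.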