[R] choosing multiple columns

David Winsemius dwinsemius at comcast.net
Sun Aug 12 19:37:15 CEST 2012


On Aug 11, 2012, at 6:01 AM, Ista Zahn wrote:

> On Sat, Aug 11, 2012 at 8:51 AM, Sachinthaka Abeywardana
> <sachin.abeywardana at gmail.com> wrote:
>> I should have mentioned that I do not know the number index of the
>> columns, but regardless, thanks for the responses
>
> Right, so use my first method. This does not depend on the position of
> the columns.

I would counsel giving more thought to the possible range of the
column names. Even a variation on Ista Zahn's method that is intended to
deliver only the first 8 columns will fail if the column numbers can
exceed 10 or if they do not start from 1.
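
To make this concrete, here is a small sketch on a made-up data frame df
(a hypothetical stand-in for initial_data, with columns OFB1 to OFB12 plus
an id column). The pattern "^OFB[1-8]" only checks the first digit, so it
also picks up OFB10, OFB11 and OFB12; adding a "$" anchor restricts it to
the single-digit names 1 through 8, whatever order the columns are in.

```r
# Hypothetical stand-in for initial_data: columns OFB1..OFB12 plus an id column
df <- as.data.frame(matrix(0, nrow = 2, ncol = 12,
                           dimnames = list(NULL, paste0("OFB", 1:12))))
df$id <- 1:2

# Unanchored: only the first digit is checked, so OFB10..OFB12 also match
loose <- grep("^OFB[1-8]", names(df), value = TRUE)
print(loose)           # 11 names: OFB1..OFB8, OFB10, OFB11, OFB12

# Anchored at both ends: exactly one digit from 1 to 8 after the prefix
strict <- df[grep("^OFB[1-8]$", names(df))]
print(names(strict))   # OFB1..OFB8
```

This shortcut only works because 1 to 8 are single digits; for an
arbitrary range, parsing the numeric suffix is more robust.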

If the column numbers do start from 1 and the columns appear in numeric
order, you could try this:

grep("^OFB[1-8]", paste0("OFB", 1:100) , value=TRUE )[1:8]

Otherwise, consider these attempts:

 > set.seed(123); test <- sample( paste0("OFB", 1:100), 20)
 > sort(test)[1:8]
[1] "OFB21" "OFB27" "OFB29" "OFB4"  "OFB41" "OFB42" "OFB5"  "OFB50"

 > grep("^OFB[1-8]", test , value=TRUE )[1:8]
[1] "OFB29" "OFB79" "OFB41" "OFB86" "OFB5"  "OFB50" "OFB83" "OFB51"


Note that neither of these gives what you want, which is:

 > test[order(as.numeric( sub("OFB", "", test)))][1:8]
[1] "OFB4"  "OFB5"  "OFB9"  "OFB21" "OFB27" "OFB29" "OFB41" "OFB42"
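
The numeric-suffix ordering above can be applied directly to the column
names of a data frame, which is what the original question needs. A
minimal sketch, assuming every relevant name is "OFB" followed only by
digits; the data frame df and its scrambled column order are made up for
illustration. Because the selection uses the parsed numbers rather than
positions, it does not depend on where the columns sit.

```r
# Hypothetical data frame: OFB columns in scrambled order, plus a non-OFB column
df <- data.frame(OFB10 = 1, OFB2 = 2, id = 3, OFB1 = 4, OFB9 = 5, OFB3 = 6)

# Keep only names of the form OFB<digits>, then parse the numeric suffix
ofb <- grep("^OFB[0-9]+$", names(df), value = TRUE)
num <- as.numeric(sub("^OFB", "", ofb))

# Option 1: columns numbered 1 to 8, in numeric order
keep     <- num %in% 1:8
wanted   <- ofb[keep][order(num[keep])]
selected <- df[wanted]
print(names(selected))   # "OFB1" "OFB2" "OFB3"

# Option 2: the first 4 OFB columns in numeric (not lexical) order
first4 <- df[ofb[order(num)][1:4]]
print(names(first4))     # "OFB1" "OFB2" "OFB3" "OFB9"
```

By contrast, sort(ofb)[1:4] would return OFB1, OFB10, OFB2, OFB3, the
same lexical-order trap as in the sort(test) output above.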

There is also a function named mixedsort in Greg Warnes's package gtools
which automatically splits the alphabetic and numeric components of an
alphanumeric vector and then orders by the two of them separately.

Something like this might achieve the same thing in base R:

 > test[ order( sub("[0-9]+", "", test),   # alphabetic sort, followed by numeric sort
                as.numeric(gsub("[[:alpha:]]*([[:digit:]]*)", '\\1', test) ) )]

  [1] "OFB4"  "OFB5"  "OFB9"  "OFB21" "OFB27" "OFB29" "OFB41" "OFB42" "OFB50" "OFB51" "OFB60" "OFB77" "OFB78"
[14] "OFB79" "OFB83" "OFB86" "OFB87" "OFB91" "OFB94" "OFB98"
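
To see why the alphabetic key matters, here is the same two-key order()
call on a made-up vector with two prefixes (ABC and OFB are arbitrary).
Ordering by the numeric part alone interleaves the prefixes; putting the
alphabetic part first keeps each prefix together and sorts numerically
within it. This is only a base-R sketch of the idea and does not
reproduce everything gtools::mixedsort does.

```r
# Made-up vector with two alphabetic prefixes
x <- c("OFB10", "ABC2", "OFB2", "ABC10", "OFB1")

alpha <- sub("[0-9]+", "", x)                                       # "OFB" "ABC" "OFB" "ABC" "OFB"
num   <- as.numeric(gsub("[[:alpha:]]*([[:digit:]]*)", "\\1", x))   # 10 2 2 10 1

# Numeric key only: the prefixes get interleaved (ties keep input order)
numeric_only <- x[order(num)]
print(numeric_only)   # "OFB1" "ABC2" "OFB2" "OFB10" "ABC10"

# Alphabetic key first, numeric key second: grouped by prefix, numeric within
two_key <- x[order(alpha, num)]
print(two_key)        # "ABC2" "ABC10" "OFB1" "OFB2" "OFB10"
```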


gtools::mixedsort is based on gtools::mixedorder and is more
sophisticated; for instance, it attempts to identify spaces and
delimiters.

-- 
David.
>
> Best,
> Ista
>
>>
>>
>> On Sat, Aug 11, 2012 at 10:46 PM, Ista Zahn <istazahn at gmail.com> wrote:
>>>
>>> Hi Sachin,
>>>
>>> There are at least two ways. The safer way is to use a regular
>>> expression to find the matching columns, like this:
>>>
>>> a <- initial_data[grep("^OFB[0-9]+", names(initial_data))]
>>>
>>> Alternatively, if you know that the columns you want are the first 8
>>> you can select them by position, like this:
>>>
>>> a <- initial_data[1:8]
>>>
>>> Best,
>>> Ista
>>>
>>> On Sat, Aug 11, 2012 at 7:59 AM, Sachinthaka Abeywardana
>>> <sachin.abeywardana at gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> I have a data frame that has the columns OFB1, OFB2, OFB3, ..., OFB10.
>>>>
>>>> How do I select the first 8 columns efficiently without typing each and
>>>> every one of them? I.e., I want something like:
>>>>
>>>> a<-data.frame(initial_data$OFB1-10) #i know this is wrong, what would be the correct syntax?
>>>>
>>>> Thanks,
>>>> Sachin


David Winsemius, MD
Alameda, CA, USA


