[R] Choosing columns by number
Marc Schwartz
marc_schwartz at me.com
Tue Aug 25 17:29:23 CEST 2015
> On Aug 25, 2015, at 10:17 AM, Sam Albers <tonightsthenight at gmail.com> wrote:
>
> Hi all,
>
> This is a process question. How do folks efficiently identify column
> numbers in a dataframe without manually counting them? For example, if I
> want to choose columns from the iris dataframe I know of two options. I can
> do this:
>
> > str(iris)
> 'data.frame': 150 obs. of 5 variables:
> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1
> 1 1 1 1 1 1 ...
>
> or this:
>
> > names(iris)
> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
>
> Neither option explicitly identifies the column number so that I can
> do something like this:
>
> iris[,c(2,4)]
>
> I feel like there must be a better way to do this so I wanted to ask
> the collective wisdom here what people do to accomplish this.
> Obviously this is a trivial example, but the issue really becomes
> problematic when you have a large dataframe.
>
> Thanks in advance!
>
> Sam
Just use ?subset:
NewDF <- subset(iris, select = c(Sepal.Width, Petal.Width))
which is the same as:
NewDF <- iris[, c(2, 4)]
You can also define sequential columns using “:”, thus:
NewDF <- subset(iris, select = c(Sepal.Width:Petal.Width))
is the same as:
NewDF <- iris[, 2:4]
and use combinations of the two approaches as well.
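As a rough sketch of such a combination (the object name `combo` is only illustrative, and the column choice is arbitrary), a single name and a name range can sit inside the same c():

```r
# iris comes with the datasets package that base R loads by default.
# Mix one column name with a name range inside select:
combo <- subset(iris, select = c(Sepal.Length, Petal.Length:Species))
names(combo)

# The integer-index form of the same selection, compared for equality;
# stopifnot() raises an error if the two data frames differ.
stopifnot(isTRUE(all.equal(combo, iris[, c(1, 3:5)])))
```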
You can also negate the selection by using:
select = -c(…)
That avoids having to worry about using integer indices.
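For instance, a minimal sketch of the negated form (the object name `dropped` and the two excluded columns are just an illustration) might look like this:

```r
# iris comes with the datasets package that base R loads by default.
# Keep every column except the two named inside -c():
dropped <- subset(iris, select = -c(Sepal.Length, Species))
names(dropped)

# The same result using negative integer indices, compared for equality;
# stopifnot() raises an error if the two data frames differ.
stopifnot(isTRUE(all.equal(dropped, iris[, -c(1, 5)])))
```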
Regards,
Marc Schwartz