[R] Referencing variable names rather than column numbers

Sat Dec 5 18:00:45 CET 2009

Alternatively, you can use subset(), which supports the ":" operator  
for the 'select' argument:

 > cor(subset(iris, select = Sepal.Length:Petal.Length))
              Sepal.Length Sepal.Width Petal.Length
Sepal.Length    1.0000000  -0.1175698    0.8717538
Sepal.Width    -0.1175698   1.0000000   -0.4284401
Petal.Length    0.8717538  -0.4284401    1.0000000

which is equivalent to:

 > cor(iris[, 1:3])
              Sepal.Length Sepal.Width Petal.Length
Sepal.Length    1.0000000  -0.1175698    0.8717538
Sepal.Width    -0.1175698   1.0000000   -0.4284401
Petal.Length    0.8717538  -0.4284401    1.0000000
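
As far as I understand it, the two calls agree because subset() evaluates
'select' in a context where each column name stands for its position, so
":" ends up operating on integers. Here is a rough sketch of that idea using
the built-in iris data. The names 'name_to_pos' and 'idx' are mine, for
illustration only, and this is a simplification rather than the actual
source of subset():

```r
# Build a lookup that maps each column name to its position
# (a simplified sketch of the idea, not the real subset() code).
name_to_pos <- as.list(seq_along(iris))
names(name_to_pos) <- names(iris)

# With that lookup in place, ":" sees integers rather than names.
idx <- eval(quote(Sepal.Length:Petal.Length), name_to_pos)
print(idx)  # 1 2 3

# The same columns are selected either way, so the correlations match.
by_name  <- cor(subset(iris, select = Sepal.Length:Petal.Length))
by_index <- cor(iris[, idx])
print(all.equal(by_name, by_index))
```

Outside of subset(), no such lookup exists, which is why a bare
Sepal.Length:Petal.Length inside "[" does not work.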

So for the pollute data:

   cor(subset(pollute, select = Pollution:Industry))

should work.
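
I do not have pollute.txt at hand, so here is a small sketch on a made-up
stand-in. 'pollute_demo' is hypothetical: it only borrows the seven column
names from the original post and is filled with random numbers, so the
correlations themselves mean nothing. It is only meant to show that both
selections from the original question can be written by name:

```r
# Hypothetical stand-in for pollute.txt: same seven column names,
# random values (NOT the real data).
set.seed(1)
col_names <- c("Pollution", "Temp", "Industry", "Population",
               "Wind", "Rain", "Wet.days")
pollute_demo <- as.data.frame(matrix(rnorm(7 * 20), ncol = 7,
                                     dimnames = list(NULL, col_names)))

# First three columns, selected as a range of names.
r1 <- cor(subset(pollute_demo, select = Pollution:Industry))
print(colnames(r1))

# Temp, Population and Rain, selected by name without counting columns.
r2 <- cor(subset(pollute_demo, select = c(Temp, Population, Rain)))
print(colnames(r2))
```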

Note also that the 'select' argument to subset can take non-contiguous  
column names:

# Skip 'Sepal.Width'
 > cor(subset(iris, select = c(Sepal.Length, Petal.Length:Petal.Width)))
              Sepal.Length Petal.Length Petal.Width
Sepal.Length    1.0000000    0.8717538   0.8179411
Petal.Length    0.8717538    1.0000000   0.9628654
Petal.Width     0.8179411    0.9628654   1.0000000

So you can specify, by name, any combination of contiguous ranges and
individual columns.

See ?subset
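
Because the names stand for column positions inside 'select', negative
selection by name should also work. A short sketch on iris (the object names
'num_only', 'petals' and 'mixed' are just for illustration):

```r
# Drop one column by name: everything except the factor column 'Species'.
num_only <- subset(iris, select = -Species)
print(names(num_only))

# Drop several columns by name.
petals <- subset(iris, select = -c(Sepal.Length, Sepal.Width, Species))
print(names(petals))

# Mix a range of names with a single name.
mixed <- subset(iris, select = c(Sepal.Length:Sepal.Width, Petal.Width))
print(names(mixed))
```

Note that ?subset describes it as a convenience function intended for
interactive use, so inside your own functions plain "[" indexing with a
character vector of names may be the safer choice.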

HTH,

Marc Schwartz

On Dec 5, 2009, at 10:43 AM, Ista Zahn wrote:

> As baptiste noted, you can do
>
> cor(pollute[ ,c("Pollution","Temp","Industry")]).
>
> But
>
> cor(pollute[,"Pollution":"Industry"])
>
> will not work. For that you can do
>
> cor(pollute[, which(names(pollute) == "Pollution"):which(names(pollute) == "Industry")])
>
> -Ista
>
> On Sat, Dec 5, 2009 at 11:22 AM, John-Paul Ferguson
> <ferguson_john-paul at gsb.stanford.edu> wrote:
>> I apologize for how basic a question this is. I am a Stata user who
>> has begun using R, and the syntax differences still trip me up. The
>> most basic questions, involving as they do general terms, can be the
>> hardest to find solutions for through search.
>>
>> Assume for the moment that I have a dataset that contains seven
>> variables: Pollution, Temp, Industry, Population, Wind, Rain and
>> Wet.days. (This actual dataset is taken from Michael Crawley's
>> "Statistics: An Introduction Using R" and is available as
>> "pollute.txt" in
>> http://www.bio.ic.ac.uk/research/crawley/statistics/data/zipped.zip.)
>> Assume I have attached pollute. Then
>>
>> cor(pollute)
>>
>> will give me the correlation table for these seven variables. If I
>> would prefer only to see the correlations between, say, Pollution,
>> Temp and Industry, I can get that with
>>
>> cor(pollute[,1:3])
>>
>> or with
>>
>> cor(pollute[1:3])
>>
>> Similarly, I can see the correlations between Temp, Population and  
>> Rain with
>>
>> cor(pollute[,c(2,4,6)])
>>
>> or with
>>
>> cor(pollute[c(2,4,6)])
>>
>> This is fine for a seven-variable dataset. When I have 250 variables,
>> though, I start to pale at looking up column indexes over and over. I
>> know from reading the list archives that I can extract the column
>> index of Industry, for example, by typing
>>
>> which("Industry"==names(pollute))
>>
>> but doing that before each command seems dire. Accustomed to Stata
>> as I am, I am inclined to check the correlation of the first three or
>> the second, fourth and sixth columns by substituting the column names
>> for the column indexes, something like the following:
>>
>> cor(pollute[Pollution:Industry])
>> cor(pollute[c(Temp,Population,Rain)])
>>
>> These however throw errors.
>>
>> I know that many commands in R are perfectly happy to take variable
>> names--the regression models, for example--but that some do not. And
>> so I ask you two general questions:
>>
>> 1. Is there a syntax for referring to variable names rather than
>> column indexes in situations like these?
>> 2. Is there something that I should look for in a command's help file
>> that often indicates whether it can take column names rather than
>> indexes?
>>
>> Again, apologies for asking something that has likely been asked
>> before. I would appreciate any suggestions that you have.
>>
>> Best,
>> John-Paul Ferguson
>> Assistant Professor of Organizational Behavior
>> Stanford University Graduate School of Business
>> 518 Memorial Way, K313
>> Stanford, CA 94305
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> -- 
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org