[R] Referencing variable names rather than column numbers
David Winsemius
dwinsemius at comcast.net
Sat Dec 5 18:07:21 CET 2009
On Dec 5, 2009, at 11:30 AM, baptiste auguie wrote:
> Hi,
>
> Try this,
>
> cor(pollute[ ,c("Pollution","Temp","Industry")])
>
> and ?"[" in particular,
> "Character vectors will be matched to the names of the object "
John-Paul;
In the time it took me to compose this, I see that others have already
pointed out everything I had written, so it only remains to offer yet
another R method for selecting ranges of column names.
You could define a "targets" vector of names if you know the
starting and ending positions:
?Extract # or equivalently ?"["
targets <- names(pollute)[1:3]  # colnames() is equivalent for data frame objects
targets
pollute[ , targets]
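A minimal sketch of the same idea without counting positions by hand, assuming the column order given in the original question. The data frame below is a hypothetical stand-in filled with random numbers, not the contents of pollute.txt. `match()` turns the two endpoint names into positions, and base R's `subset()` gets close to the Stata syntax through its `select` argument, where each column name stands for its position.

```r
# Hypothetical stand-in for pollute.txt: same seven column names,
# random values purely for illustration
set.seed(1)
vars <- c("Pollution", "Temp", "Industry", "Population",
          "Wind", "Rain", "Wet.days")
pollute <- as.data.frame(matrix(rnorm(70), ncol = 7,
                                dimnames = list(NULL, vars)))

# Convert the endpoint names to positions, then take the range between them
first <- match("Pollution", names(pollute))
last  <- match("Industry",  names(pollute))
targets <- names(pollute)[first:last]
cor(pollute[, targets])

# subset() evaluates 'select' with each column name bound to its position,
# so the Stata-like range Pollution:Industry works here
cor(subset(pollute, select = Pollution:Industry))
```

?subset describes it as a convenience function intended for interactive use, so inside your own functions the `match()` version is probably the safer choice.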
--
Best;
David.
>
> HTH,
>
> baptiste
>
> 2009/12/5 John-Paul Ferguson <ferguson_john-paul at gsb.stanford.edu>:
>> I apologize for how basic a question this is. I am a Stata user who
>> has begun using R, and the syntax differences still trip me up. The
>> most basic questions, involving as they do general terms, can be the
>> hardest to find solutions for through search.
>>
>> Assume for the moment that I have a dataset that contains seven
>> variables: Pollution, Temp, Industry, Population, Wind, Rain and
>> Wet.days. (This actual dataset is taken from Michael Crawley's
>> "Statistics: An Introduction Using R" and is available as
>> "pollute.txt" in
>> http://www.bio.ic.ac.uk/research/crawley/statistics/data/zipped.zip.)
>> Assume I have attached pollute. Then
>>
>> cor(pollute)
>>
>> will give me the correlation table for these seven variables. If I
>> would prefer only to see the correlations between, say, Pollution,
>> Temp and Industry, I can get that with
>>
>> cor(pollute[,1:3])
>>
>> or with
>>
>> cor(pollute[1:3])
>>
>> Similarly, I can see the correlations between Temp, Population and
>> Rain with
>>
>> cor(pollute[,c(2,4,6)])
>>
>> or with
>>
>> cor(pollute[c(2,4,6)])
>>
>> This is fine for a seven-variable dataset. When I have 250 variables,
>> though, I start to pale at looking up column indexes over and over. I
>> know from reading the list archives that I can extract the column
>> index of Industry, for example, by typing
>>
>> which("Industry"==names(pollute))
>>
>> but doing that before each command seems dire. Accustomed to Stata
>> as I am, I am inclined to check the correlations of the first three or
>> the second, fourth and sixth columns by substituting the column names
>> for the column indexes, something like the following:
>>
>> cor(pollute[Pollution:Industry])
>> cor(pollute[c(Temp,Population,Rain)])
>>
>> These however throw errors.
>>
>> I know that many commands in R are perfectly happy to take variable
>> names--the regression models, for example--but that some do not. And
>> so I ask you two general questions:
>>
>> 1. Is there a syntax for referring to variable names rather than
>> column indexes in situations like these?
>> 2. Is there something that I should look for in a command's help file
>> that often indicates whether it can take column names rather than
>> indexes?
>>
>> Again, apologies for asking something that has likely been asked
>> before. I would appreciate any suggestions that you have.
>>
>> Best,
>> John-Paul Ferguson
>> Assistant Professor of Organizational Behavior
>> Stanford University Graduate School of Business
>> 518 Memorial Way, K313
>> Stanford, CA 94305
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
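On the which("Industry"==names(pollute)) lookup from the original question: a minimal sketch, assuming the seven column names listed there. The data frame is again a hypothetical stand-in filled with random numbers, since pollute.txt is not reproduced here. `match()` is the vectorized relative of that `which()` idiom and returns several positions in one call, although indexing with the character vector itself avoids the lookup altogether.

```r
# Hypothetical stand-in for pollute.txt: same seven column names,
# random values purely for illustration
set.seed(1)
vars <- c("Pollution", "Temp", "Industry", "Population",
          "Wind", "Rain", "Wet.days")
pollute <- as.data.frame(matrix(rnorm(70), ncol = 7,
                                dimnames = list(NULL, vars)))

# which() looks up one name at a time
which("Industry" == names(pollute))
# [1] 3

# match() is vectorized, so several positions come back at once
idx <- match(c("Temp", "Population", "Rain"), names(pollute))
idx
# [1] 2 4 6
cor(pollute[, idx])

# Indexing with the names themselves gives the same result, no lookup needed
cor(pollute[, c("Temp", "Population", "Rain")])
```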
David Winsemius, MD
Heritage Laboratories
West Hartford, CT