[R] Dummy (factor) based on a pair of variables
David Winsemius
dwinsemius at comcast.net
Sun Apr 19 00:06:46 CEST 2009
> df <- read.table(textConnection("y,i,j
+ 1,AUT,BEL
+ 2,AUT,GER
+ 3,BEL,GER"), header=T,sep=",", as.is=T)
> df
  y   i   j
1 1 AUT BEL
2 2 AUT GER
3 3 BEL GER
> countries <- unique(c(df$i,df$j))
> countries
[1] "AUT" "BEL" "GER"
> df[countries] <- sapply(countries, function(x) df[x] <<- df$i == x
+                         | df$j == x)
> df
  y   i   j   AUT   BEL   GER
1 1 AUT BEL  TRUE  TRUE FALSE
2 2 AUT GER  TRUE FALSE  TRUE
3 3 BEL GER FALSE  TRUE  TRUE
Obviously three rows are too few to test this arrangement with lm(),
so I tried scaling it up and testing on:
dft <- data.frame(y=rnorm(100), i = sample(countries, 100,
replace=T), j= sample(countries, 100, replace=T))
# Removed all the duplicate pairs (rows where i equals j) with:
dft <- dft[dft$i != dft$j, ]
# and the approach above did not give proper answers.
This seems to give correct answers:
dft[countries] <- sapply(countries, function(y) apply(dft, 1,
function(x) x[2] == y | x[3] == y))
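For comparison, here is a minimal sketch of a vectorised variant that avoids both apply() and <<- by comparing whole columns at once. The helper name pair_dummies and its arguments are hypothetical, not something from the original thread; as.character() is used on the assumption that the pair columns may be factors.

```r
# Hypothetical helper (not from the original post): adds one logical column
# per country, TRUE when that country is either member of the pair.
pair_dummies <- function(data, first = "i", second = "j") {
  a <- as.character(data[[first]])    # as.character() guards against factors
  b <- as.character(data[[second]])
  countries <- sort(unique(c(a, b)))
  # Whole-column comparison; each list element becomes one new column
  data[countries] <- lapply(countries, function(cc) a == cc | b == cc)
  data
}

df <- data.frame(y = 1:3,
                 i = c("AUT", "AUT", "BEL"),
                 j = c("BEL", "GER", "GER"))
res <- pair_dummies(df)
print(res)
```

Referring to the columns by name rather than by position (x[2], x[3]) should also keep the result stable if the column order of the data frame changes.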
And the R formula parser handles those logical variables in a
reasonable manner:
> lm(y ~ AUT + BEL + GER, data=dft)
Call:
lm(formula = y ~ AUT + BEL + GER, data = dft)
Coefficients:
(Intercept)      AUTTRUE      BELTRUE      GERTRUE
    0.09192      0.15130     -0.29274           NA
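The NA for GERTRUE is what one would expect once rows with i equal to j are removed: every remaining row names exactly two countries, so the three dummies always sum to 2 and the last one is a linear combination of the intercept and the other two. The sketch below checks this on simulated data of the same shape; the seed is arbitrary and the numbers will not match the output above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
countries <- c("AUT", "BEL", "GER")
dft <- data.frame(y = rnorm(100),
                  i = sample(countries, 100, replace = TRUE),
                  j = sample(countries, 100, replace = TRUE),
                  stringsAsFactors = FALSE)
dft <- dft[dft$i != dft$j, ]   # drop rows where both members are the same
dft[countries] <- lapply(countries, function(cc) dft$i == cc | dft$j == cc)

# Each remaining row has exactly two TRUE dummies
pairs_per_row <- rowSums(dft[countries])
print(table(pairs_per_row))

# lm() reports NA for the coefficient it cannot estimate separately
fit <- lm(y ~ AUT + BEL + GER, data = dft)
print(coef(fit))
```

Dropping one of the dummies, or the intercept, is the usual way to get a full set of estimable coefficients in a design like this.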
-
David Winsemius
On Apr 18, 2009, at 4:09 PM, Jason Morgan wrote:
> On 2009.04.18 15:58:30, Jason Morgan wrote:
>> On 2009.04.18 13:52:35, Serguei Kaniovski wrote:
>>> I can generate the above dummies, but can this design be entered
>>> into a regression model directly?
>
> Oops, I apologize for not reading the whole question. Can you do the
> following:
>
> lm(y ~ I(ifelse(df$i=="AUT"|df$j=="AUT", 1, 0)) +
> I(ifelse(df$i=="BEL"|df$j=="BEL", 1, 0)) +
> I(ifelse(df$i=="GER"|df$j=="GER", 1, 0)), data=df)
>
> If you exclude the ifelse(), you will get a vector of TRUE/FALSE,
> which may or may not work.
>
> ~Jason
>
>> Hello Serguei,
>>
>> I am sure there is a better way to do this, but the following seems
>> to work:
>>
>> # Create sample data.frame()
>> i <- c("AUT", "AUT", "BEL")
>> j <- c("BEL", "GER", "GER")
>> df <- data.frame(i=i, j=j)
>>
>> # Create dummy vectors
>> df$d.aut <- ifelse(df$i=="AUT"|df$j=="AUT", 1, 0)
>> df$d.bel <- ifelse(df$i=="BEL"|df$j=="BEL", 1, 0)
>> df$d.ger <- ifelse(df$i=="GER"|df$j=="GER", 1, 0)
>>
>> # Print results
>> df
>>
>> HTH,
>>
>> ~Jason
>>
>>
>
> --
> Jason W. Morgan
> Graduate Student, Political Science
> *The Ohio State University*
>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT