# [R] Questions about Probit Analysis

Lorenzo Isella lorenzo.isella at gmail.com
Sun Oct 31 20:14:00 CET 2010

```
Dear All,
I have some questions about probit regressions.
I saw a nice introduction at

http://bit.ly/bU9xL5

and I mainly have two questions.

(1) The first is mostly about data manipulation. Consider the following
snippet:

##################################################

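# mydata is assumed to be already loaded: a binary outcome plus three regressors
names(mydata) <- c("outcome", "x1", "x2", "x3")

# Probit regression on all three regressors (x3 treated as categorical)
myprobit <- glm(outcome ~ x1 + x2 + as.factor(x3),
                family = binomial(link = "probit"), data = mydata)

print(summary(myprobit))

# Now assume I can make a regression only on x1

myprobit2 <- glm(outcome ~ x1, family = binomial(link = "probit"), data = mydata)

print(summary(myprobit2))

# Express in terms of counts: one row per x1 value, one column per outcome (0, 1)

md <- t(table(mydata$outcome, mydata$x1))

# Create new data frame

mydatanew <- data.frame(as.numeric(row.names(md)))

names(mydatanew) <- c("x1")

mydatanew$successes <- as.numeric(md[, 2])  # number of outcomes equal to 1

mydatanew$failures <- as.numeric(md[, 1])   # number of outcomes equal to 0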

########################################################################

where I first carry out a probit regression of the binary outcome (i.e.
taking only 0/1 as values) on three regressors, and then regress the
outcome on x1 alone.

Finally, I generate the data frame mydatanew (some of its entries are shown below):

> mydatanew
    x1 successes failures
1  220         0        1
2  300         1        2
3  340         1        3
4  360         0        4
5  380         0        8
[...................]

where, for every value of x1, I count the number of 0 and 1 outcomes
(i.e. the number of failures and successes). Since I have simply counted
them, this is equivalent to the full list of x1 values with their
associated 0/1 outcomes, so it holds all the information I need to
perform the probit regression of the binary outcome on x1 again; only the
data format is different. How can I feed mydatanew to R to perform the
probit regression on x1 alone?

(2) This is a bit more conceptual. Suppose you have a set of
products A, B, C, D, E, F. Each product has a list of features: x_A for
product A, x_B for B, etc.
Each customer has their own set of attributes (age, sex, income, etc.), which
I call x_cust. Finally, each customer is confronted with two products (e.g.
A and D; combinations may vary, and I call each combination of two products
a scenario). The data are in the format

1 x_A x_cust
0 x_D x_cust

meaning that a certain customer chose product A over product D. Similarly,

1 x_C x_cust
0 x_B x_cust

would mean that the customer choosing between C and B finally selected
C. Every customer has to choose a product in a variety of different
scenarios. How would you analyze this kind of data? Is there any way I
can express, in my probit analysis, the fact that my binary outcome (buy
this product or not) always arises from a comparison of only two products
(customers are never given a choice among more than two products
in a given scenario)? Or should I simply run my probit regression on
my 0/1 outcome without any extra worry (as in the snippet above)?
Many thanks

Lorenzo

```
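A minimal sketch for question (1). It uses simulated data because mydata itself is not available here. The grid of x1 values, the sample size and the coefficients used to generate the outcomes are all hypothetical. `glm()` with a binomial family accepts a two-column response `cbind(successes, failures)`, so the collapsed table can be fitted directly. The coefficient estimates match the fit on the individual 0/1 outcomes.

```r
# Hypothetical simulated data standing in for mydata (the real data are not available here)
set.seed(1)
mydata <- data.frame(x1 = sample(seq(220, 800, by = 20), 400, replace = TRUE))
mydata$outcome <- rbinom(nrow(mydata), 1, pnorm(-3 + 0.005 * mydata$x1))

# Fit on the individual 0/1 outcomes
fit_binary <- glm(outcome ~ x1, family = binomial(link = "probit"), data = mydata)

# Collapse to one row per x1 value (same layout as mydatanew in the question)
md <- table(mydata$x1, mydata$outcome)
mydatanew <- data.frame(x1        = as.numeric(rownames(md)),
                        successes = as.numeric(md[, "1"]),
                        failures  = as.numeric(md[, "0"]))

# Fit on the counts: two-column response (successes, failures)
fit_grouped <- glm(cbind(successes, failures) ~ x1,
                   family = binomial(link = "probit"), data = mydatanew)

# Compare the coefficient estimates from both layouts side by side
print(cbind(binary = coef(fit_binary), grouped = coef(fit_grouped)))
```

An equivalent form passes the proportion of successes as the response and the group sizes as prior weights: `glm(successes / (successes + failures) ~ x1, weights = successes + failures, family = binomial(link = "probit"), data = mydatanew)`. The residual and null deviances (and the AIC) differ between the grouped and ungrouped fits because the saturated model differs. Their difference, the likelihood-ratio statistic for x1, is the same in both.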
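A minimal sketch for question (2), under stated assumptions. It swaps in one standard way of modelling two-alternative choices: a random-utility (Thurstone-type) paired-comparison probit. In this model the probability of choosing the first product of a scenario is Φ((x_first − x_second)'β). The two-row-per-scenario layout in the question becomes one row per scenario holding feature differences. Everything below is hypothetical and simulated: six products with two features f1 and f2, one customer attribute age, and the utility weights. Customer attributes are the same for both alternatives in a scenario, so they cancel in the difference. They can enter only through interactions with product features, here d_f1_age.

```r
# Hypothetical product features for products A-F (made-up values)
products <- data.frame(id = LETTERS[1:6],
                       f1 = c(0.2, 1.5, -0.7, 0.9, -1.3, 0.4),
                       f2 = c(1.1, -0.4, 0.3, -1.2, 0.8, 0.0))

set.seed(2)
n_scen <- 2000
pair  <- t(replicate(n_scen, sample(6, 2)))  # two distinct products per scenario
age   <- runif(n_scen, 20, 70)               # hypothetical customer attribute (x_cust)
age_c <- age - mean(age)                     # centred to reduce collinearity

first  <- products[pair[, 1], ]
second <- products[pair[, 2], ]

# One row per scenario: differences in features between the two offered products
paired <- data.frame(
  d_f1     = first$f1 - second$f1,
  d_f2     = first$f2 - second$f2,
  d_f1_age = (first$f1 - second$f1) * age_c  # age acts only through an interaction
)

# Hypothetical "true" utility weights used to simulate the choices
lin <- 1.0 * paired$d_f1 - 0.5 * paired$d_f2 + 0.02 * paired$d_f1_age
paired$chose_first <- rbinom(n_scen, 1, pnorm(lin))

# Paired-comparison probit: P(first chosen) = pnorm(linear predictor in the differences)
fit_pair <- glm(chose_first ~ 0 + d_f1 + d_f2 + d_f1_age,
                family = binomial(link = "probit"), data = paired)
print(summary(fit_pair))
```

The model has no intercept, so swapping the order in which a pair is listed simply flips the sign of the linear predictor. Add an intercept if presentation order might matter. Repeated scenarios from the same customer are treated as independent here; a customer-level random effect would be the natural extension. For the logit counterpart, `survival::clogit()` with `strata()` on a scenario identifier is one option that fits the conditional logit directly on the two-row-per-scenario layout.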