[R] Logistic regression problem: propensity score matching
Paul
paul_bivand at blueyonder.co.uk
Fri Jun 6 01:22:25 CEST 2003
Thank you all.
I made a pretty basic error in using multinom rather than glm with
family = binomial, which needed rapid correction.
I have now rewritten the relevant part using glm.
After importing, I convert all categorical variables into factors,
assigning each result back into the data frame:
library(RODBC)  # sqlFetch(), sqlSave(); 'channel' is an open connection
londonpsm <- sqlFetch(channel, "London_NW_london_pilots_elig",
                      rownames=TRUE)
# factor() returns a new vector, so it must be assigned back to take effect
for (v in c("InSample", "GENDER", "DISABLED", "ETHCODE", "LOPTYPE"))
  londonpsm[[v]] <- factor(londonpsm[[v]])
# data = londonpsm (instead of attach()) so glm() uses the converted factors
LonOutput <- glm(InSample ~ AGE + DISABLED + GENDER + ETHCODE + NDYPTOT
                 + NDLTUTOT + LOPTYPE, family = binomial, data = londonpsm)
lonoutput <- data.frame(fitted.values(LonOutput))
sqlSave(channel, lonoutput, tablename="lonoutput", safer=FALSE)
From the comments, this looks better, but there may be some
further switch I should use.
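As John Fox notes in the reply quoted below, multinom() and a binomial glm() fitted to a two-category response estimate the same logistic model. A minimal sketch that checks this on simulated data (all variable names are hypothetical), assuming the nnet package, which provides multinom(), is installed:

```r
library(nnet)  # provides multinom(); a recommended package shipped with R

set.seed(1)
n <- 500
x <- rnorm(n)
g <- factor(sample(c("a", "b"), n, replace = TRUE))
# Simulated binary response, stored as a two-level factor ("0", "1")
y <- factor(rbinom(n, 1, plogis(-0.5 + 1.2 * x + 0.8 * (g == "b"))))

fit_glm <- glm(y ~ x + g, family = binomial)
fit_mn  <- multinom(y ~ x + g, trace = FALSE)

# Coefficients agree up to multinom()'s optimiser tolerance
print(cbind(glm = coef(fit_glm), multinom = coef(fit_mn)))

# Both return the fitted probability of the second level, "1"
p_glm <- fitted(fit_glm)
p_mn  <- predict(fit_mn, type = "probs")
print(max(abs(p_glm - p_mn)))
```

The differences are numerical only: glm() uses iteratively reweighted least squares, while multinom() fits by general-purpose optimisation, so tiny discrepancies in the last digits are expected.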
Apologies for the variables in capitals - my data comes in SPSS format,
but to manipulate it I use Access, and the only way I can see to get
data from SPSS to Access is to export it in a format such as dBase,
which capitalises all variable names.
While sqlFetch, sqlQuery and sqlSave seem to work amazingly well, and
fast, I am still having a problem with my rownames. I would like the
imported data to have the database unique ID as the rownames, and to
protect these through the analysis, so that the saved output has two
columns: the unique ID and the fitted value. So far this does not
work.
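One way to keep the IDs attached, sketched here on simulated data rather than the real table: put the ID in the data frame's row names before fitting. glm() names its fitted values by the row names of the model frame, so the IDs travel with the scores, and stay aligned even when rows with missing values are dropped by the default na.action. The column name ORCID is borrowed from the earlier message quoted below and the data are made up; whether sqlFetch()'s rownames argument accepts that column name directly on import is an assumption worth checking in ?sqlFetch.

```r
set.seed(2)
n <- 200
# Hypothetical stand-in for the imported table
dat <- data.frame(
  ORCID    = sprintf("ID%04d", 1:n),   # hypothetical unique ID column
  AGE      = round(runif(n, 18, 60)),
  GENDER   = factor(sample(c("F", "M"), n, replace = TRUE)),
  InSample = rbinom(n, 1, 0.3),
  stringsAsFactors = FALSE
)
# Move the ID column into the row names
rownames(dat) <- dat$ORCID
dat$ORCID <- NULL

fit <- glm(InSample ~ AGE + GENDER, family = binomial, data = dat)

# fitted() is named by row name, i.e. by ID; make the ID an explicit column
ps <- fitted(fit)
scores <- data.frame(ORCID = names(ps), pscore = unname(ps),
                     stringsAsFactors = FALSE)
head(scores)
```

Turning the names into an ordinary column, as in scores, avoids relying on how row names are handled when the table is written back to the database.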
Then, once the result has been saved with sqlSave, the unique ID lets
me identify the closest control match for each action sample member
and then join both the action and control samples to their personal
details for fieldwork.
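The matching step described above, choosing for each action-sample member the control-area case with the closest fitted probability, could look roughly like this greedy 1-to-1 nearest-neighbour sketch without replacement. The function name nearest_match() and the example scores are hypothetical; real applications often add a caliper or match on the logit of the score, neither of which is shown here.

```r
nearest_match <- function(treat_ps, control_ps) {
  # treat_ps, control_ps: named numeric vectors of scores; names are IDs
  stopifnot(length(control_ps) >= length(treat_ps))
  available <- rep(TRUE, length(control_ps))
  matched <- character(length(treat_ps))
  for (i in seq_along(treat_ps)) {
    d <- abs(control_ps - treat_ps[[i]])
    d[!available] <- Inf          # each control can be used only once
    j <- which.min(d)             # closest remaining control
    matched[i] <- names(control_ps)[j]
    available[j] <- FALSE
  }
  data.frame(sample_id = names(treat_ps), control_id = matched,
             stringsAsFactors = FALSE)
}

# Hypothetical scores for three sample members and five controls
treat_ps   <- c(A1 = 0.62, A2 = 0.35, A3 = 0.80)
control_ps <- c(C1 = 0.30, C2 = 0.60, C3 = 0.78, C4 = 0.10, C5 = 0.66)
nearest_match(treat_ps, control_ps)
# A1 -> C2, A2 -> C1, A3 -> C3
```

Because matching is greedy and without replacement, the result can depend on the order in which sample members are processed; matching with replacement (dropping the available bookkeeping) removes that dependence at the cost of reusing some controls.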
John Fox wrote:
> Dear Paul,
>
>
> At 08:41 PM 6/4/2003 +0100, Paul wrote:
>
>> Thanks for your reply.
>>
>> I am using logistic regression because my response variable is
>> categorical - and this seems to be recommended in the literature (by
>> Heckman, Smith and others).
>
>
> I think that Prof. Ripley's point here is that although one can use
> multinom in the nnet package to fit a binary (or binomial) logistic
> regression, it is more common to do so using the glm (generalized
> linear model) function. One normally would use multinomial logistic
> regression only for a polytomous (several-category) response variable.
> Applied to a dichotomous response, it will give the same results as a
> binary logistic regression.
>
>> . . .
>>
>> I have MASS but was unable to locate logistic regression, which I was
>> advised was the standard method for my problem.
>
>
> In MASS (4th edition), logit models are discussed in chapter 7 on
> generalized linear models (see, in particular, section 7.2). In my R
> and S-PLUS Companion, to which you referred in your original message,
> these models are discussed in chapter 5 on generalized linear models
> (see, in particular, section 5.2.1).
>
> I hope that this helps,
> John
>
>> Thanks again.
>>
>> Prof Brian Ripley wrote:
>>
>>> 1) Why are you using multinom when this is not a multinomial
>>> logistic regression? You could just use a binomial glm.
>>>
>>> 2) The second argument to predict() is `newdata'. `sample' is an R
>>> function, so what did you mean to have there? I think the
>>> predictions should be a named vector if `sample' is a data frame.
>>>
>>> 3) There are many more examples of such things (and more
>>> explanation) in Venables & Ripley's MASS (the book).
>>>
>>> On Wed, 4 Jun 2003, Paul Bivand wrote:
>>>
>>>
>>>
>>>> I am doing one part of an evaluation of a mandatory welfare-to-work
>>>> programme in the UK.
>>>> As with all evaluations, the problem is to determine what would
>>>> have happened if the initiative had not taken place.
>>>> In our case, we have a number of pilot areas and no possibility of
>>>> random assignment.
>>>> Therefore we have been given control areas.
>>>> My problem is to select for survey individuals in the control areas
>>>> who match as closely as possible the randomly selected sample of
>>>> action area participants.
>>>> As I understand the methodology, the procedure is to run a logistic
>>>> regression to determine the odds of a case being in the sample,
>>>> across both action and control areas, and then choose for control
>>>> sample the control area individual whose odds of being in the
>>>> sample are closest to an actual sample member.
>>>>
>>>> So far, I have followed the multinomial logistic regression
>>>> example in Fox's Companion to Applied Regression.
>>>> Firstly, I would like to know if the predict() is producing odds
>>>> ratios (or probabilities) for being in the sample, which is what I
>>>> am aiming for.
>>>
>>>
>>> You asked for `probs', so you got probabilities.
>>>
>>>
>>>
>>>> Secondly, how do I get rownames (my unique identifier) into the
>>>> output from predict() - my input may be faulty somehow and the
>>>> wrong rownames being picked up - as I need to export back to
>>>> database to sort and match in names, addresses and phone numbers
>>>> for my selected samples.
>>>>
>>>> My code is as follows:
>>>> londonpsm <- sqlFetch(channel, "London_NW_london_pilots_elig",
>>>> rownames=ORCID)
>>>> attach(londonpsm)
>>>> mod.multinom <- multinom(sample ~ AGE + DISABLED + GENDER + ETHCODE
>>>> + NDYPTOT + NDLTUTOT + LOPTYPE)
>>>> lonoutput <- predict(mod.multinom, sample, type='probs')
>>>> london2 <- data.frame(lonoutput)
>>>>
>>>> The logistic regression seems to work, although summary() says
>>>> that it is not a matrix.
>>>>
>>>
>>> what is `it'?
>>>
>>>
>>>
>>>> The output looks like odds ratios, but I would like to know whether
>>>> this is so.
>>>>
>>>
>>> No.
>>>
>>>
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
>
> -----------------------------------------------------
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada L8S 4M4
> email: jfox at mcmaster.ca
> phone: 905-525-9140x23604
> web: www.socsci.mcmaster.ca/jfox
> -----------------------------------------------------
>
>
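Prof. Ripley's answers in the quoted thread (type='probs' gives probabilities, and the output is not odds ratios) can be checked with a binomial glm(). A minimal sketch on simulated data (variable names are hypothetical) showing how predicted probabilities, log-odds, odds and an odds ratio relate:

```r
set.seed(3)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 0.7 * x))
fit <- glm(y ~ x, family = binomial)

p    <- predict(fit, type = "response")  # probabilities; same as fitted(fit)
eta  <- predict(fit, type = "link")      # log-odds, log(p / (1 - p))
odds <- p / (1 - p)                      # odds of y = 1 for each case
or_x <- exp(coef(fit)[["x"]])            # odds ratio for a one-unit rise in x

# An odds ratio compares two odds; it is not a per-case prediction.
# Raising x by 1 multiplies the predicted odds by or_x:
o <- exp(predict(fit, newdata = data.frame(x = c(0, 1)), type = "link"))
print(c(odds_ratio = or_x, ratio_of_odds = unname(o[2] / o[1])))
```

For propensity score matching it is the per-case probabilities (or their log-odds) that are compared, not the odds ratios attached to the coefficients.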