[R] How to effectively remove Outliers from a binary logistic regression in R
Jim Lemon
jim at bitwrit.com.au
Wed Sep 5 12:15:09 CEST 2012
On 09/05/2012 05:40 PM, Marcus Tullius wrote:
> Hallo there,
>
> greetings from Germany.
>
> I have a simple question for you.
>
> I have run a binary logistic model, but there are lots of outliers distorting the real results.
>
> I have tried to get rid of the outliers using the following commands:
>
> remove = -c(56, 303, 365, 391, 512, 746, 859, 940, 1037, 1042, 1138, 1355)
> MIGRATION.rebuild<- glm(MIGRATION, subset=remove)
> influence(MIGRATION.rebuild)
> influence.measures(MIGRATION.rebuild)
>
> BUT it did not work.
>
>
> My question is:
>
> *Do you know a simple R-command which erases outliers and rebuilds the model without them?*
>
> I am including my model below so that you may have an idea of how I am trying to do it.
>
Hi Francisco,
Your model didn't make it to the help list, but I think that the problem
is in your attempt to use the "subset" argument in glm. The vector is
supposed to include the indices of the values that you _want_ in the
analysis, and it looks like you are trying to remove the values that you
_don't_ want. Say you have 2000 rows in your data frame in the model.
The "subset" argument should look something like this:
glm(MIGRATION,
 subset=!(1:2000 %in% c(56,303,365,391,512,746,859,940,1037,1042,1138,
 1355)))
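
Since the original model did not come through, here is a minimal, self-contained sketch of the same idea on simulated data. The data frame "dat", the variables "migrated", "age" and "income", and the object names are all made up for illustration; only the row numbers are taken from the question. It assumes the model is specified with an explicit formula, family and data frame, which may not match how MIGRATION was built.

```r
# Hypothetical data: 'dat', 'migrated', 'age' and 'income' are invented names,
# not the poster's actual variables.
set.seed(1)
n <- 2000
dat <- data.frame(age = rnorm(n, 40, 10), income = rnorm(n, 30, 8))
dat$migrated <- rbinom(n, 1, plogis(-2 + 0.03 * dat$age + 0.02 * dat$income))

# Fit on all rows.
fit_all <- glm(migrated ~ age + income, family = binomial, data = dat)

# Row numbers to leave out (those quoted in the question).
drop_rows <- c(56, 303, 365, 391, 512, 746, 859, 940, 1037, 1042, 1138, 1355)

# Logical vector that is TRUE for the rows to keep.
keep <- !(seq_len(nrow(dat)) %in% drop_rows)

# Refit on the kept rows only.
fit_kept <- glm(migrated ~ age + income, family = binomial, data = dat,
                subset = keep)

# update() reuses the formula, family and data of the first fit.
fit_upd <- update(fit_all, subset = keep)

# Compare the number of observations used by each fit.
nobs(fit_all)
nobs(fit_kept)
```

Using seq_len(nrow(dat)) instead of a hard-coded 1:2000 keeps the logical vector the same length as the data frame if the number of rows changes.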
Jim