[R] How to effectively remove Outliers from a binary logistic regression in R

Jim Lemon jim at bitwrit.com.au
Wed Sep 5 12:15:09 CEST 2012


On 09/05/2012 05:40 PM, Marcus Tullius wrote:
> Hello there,
>
>   greetings from Germany.
>
>   I have a simple question for you.
>
>   I have run a binary logistic model, but there are lots of outliers distorting the real results.
>
>   I have tried to get rid of the outliers using the following commands:
>
>   remove = -c(56, 303, 365, 391, 512, 746, 859, 940, 1037, 1042, 1138, 1355)
>   MIGRATION.rebuild<- glm(MIGRATION, subset=remove)
>   influence(MIGRATION.rebuild)
>   influence.measures(MIGRATION.rebuild)
>
>   BUT it did not work.
>
>
>   My question is:
>
>   *Do you know a simple R-command which erases outliers and rebuilds the model without them?*
>
>   I am including my model below so that you may have an idea of how I am trying to do it.
>
Hi Francisco,
Your model didn't make it to the help list, but I think that the problem 
is in your attempt to use the "subset" argument in glm. The vector is 
supposed to include the indices of the values that you _want_ in the 
analysis, and it looks like you are trying to remove the values that you 
_don't_ want. Suppose the data frame used in the model has 2000 rows;
then the "subset" argument should look something like this:

# keep every row whose index is NOT in the list of outliers
glm(MIGRATION,
  subset = !(1:2000 %in% c(56, 303, 365, 391, 512, 746, 859, 940,
                           1037, 1042, 1138, 1355)))

Jim
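The "keep" vector idea above can be checked on a small example. This is a minimal sketch, not the poster's model: the data frame `dat` and the variables `x` and `y` are hypothetical simulated stand-ins, and `family = binomial` is assumed because the question describes a binary logistic model. The row numbers are the ones from the question. `update()` from base R refits an existing model with a changed argument, which is one way to "rebuild the model" without retyping the formula.

```r
# Hypothetical simulated data standing in for the poster's data frame
set.seed(1)
n <- 2000
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x))

# Row numbers to leave out, taken from the original question
drop_rows <- c(56, 303, 365, 391, 512, 746, 859, 940,
               1037, 1042, 1138, 1355)

# Logical vector: TRUE for every row to keep.
# seq_len(nrow(dat)) avoids hard-coding the number of rows (2000).
keep <- !(seq_len(nrow(dat)) %in% drop_rows)

# Full fit, then a refit on the kept rows only
fit_all  <- glm(y ~ x, family = binomial, data = dat)
fit_kept <- glm(y ~ x, family = binomial, data = dat, subset = keep)

# Same refit via update(), reusing the formula, family and data of fit_all
fit_kept2 <- update(fit_all, subset = keep)

nobs(fit_all)   # 2000 observations
nobs(fit_kept)  # 1988 observations (12 rows dropped)
```

Comparing `nobs()` before and after is a quick way to confirm that exactly the intended rows were excluded.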




More information about the R-help mailing list