[R] Outliers and overdispersion

Bert Gunter gunter.berton at gene.com
Tue Aug 13 18:07:41 CEST 2013


The central question is: What caused the 3 unusual values? What is
their scientific relevance? Only you can answer that, not us.

-- Bert

On Tue, Aug 13, 2013 at 8:51 AM, Marta Lomas <lomasvega at hotmail.com> wrote:
> Thanks for your interest and prompt answer!
>
> What I am trying to estimate is the correlation of one bird species' counts with a set of environmental parameters. The count data are zero-inflated and overdispersed. I am fitting a hurdle negative binomial mixed-effects model.
> The results are very difficult to interpret, and interpretation gets easier when I drop 3 outliers. But I do not know whether I should do this.
> Thanks!
> Marta
>
>
>> Subject: Re: [R] Outliers and overdispersion
>> From: szehnder at uni-bonn.de
>> Date: Tue, 13 Aug 2013 17:41:10 +0200
>> CC: r-help at r-project.org
>> To: lomasvega at hotmail.com
>>
>> I do not know exactly what you are estimating, but if this is about count models and the model fit gets better when you drop the outliers, that does not mean the model is now more correct. It only means that, if the data did not contain the outliers, this model would fit well.
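
A minimal sketch of why a lower AIC after dropping observations says little about model correctness. It assumes simulated negative binomial counts and a plain glm.nb() fit from MASS, not the hurdle mixed-effects model discussed in this thread; the variable names and the choice of dropping the 3 largest counts are illustrative assumptions. The log-likelihood of a count model is a sum of non-positive terms, one per observation, so removing observations can only raise it, and AICs computed on different data sets are not comparable.

```r
library(MASS)  # provides glm.nb(); MASS ships with standard R installations

set.seed(1)
n <- 200
x <- rnorm(n)
# Hypothetical data: negative binomial counts whose mean depends on x
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 1)
d <- data.frame(x = x, y = y)

# Fit the (correctly specified) model to all observations
full <- glm.nb(y ~ x, data = d)

# Drop the 3 largest counts, as if they were "outliers", and refit
keep <- rank(-d$y, ties.method = "first") > 3
trimmed <- glm.nb(y ~ x, data = d[keep, ])

# AIC = -2 * logLik + 2 * k. Both fits have the same k, but the trimmed fit
# sums the log-likelihood over fewer (and less extreme) observations.
aic_full <- AIC(full)
aic_trimmed <- AIC(trimmed)
cat("AIC, all data:     ", aic_full, "\n")
cat("AIC, 3 obs dropped:", aic_trimmed, "\n")
```

Here aic_trimmed comes out lower than aic_full even though both fits use the same model and the dropped counts were generated by that very model.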
>>
>> Overdispersion in count data is sometimes a cue that the generating process is a mixture distribution: for example, instead of one, K different (sub)species of birds that were aggregated in the count data. In that case a mixture of K negative binomial components could fit the data better.
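
A small simulation sketch of how pooling sub-populations produces overdispersion. It assumes K = 2 hypothetical groups and, for simplicity, Poisson rather than negative binomial components; the group proportion and the means are made-up values. Within each group the variance-to-mean ratio is close to 1, but the pooled counts are strongly overdispersed.

```r
set.seed(42)
n <- 5000

# Hypothetical: two sub-populations with different mean counts, pooled together
group <- rbinom(n, size = 1, prob = 0.3)   # 1 = the more abundant group
lambda <- ifelse(group == 1, 12, 2)        # assumed group means
counts <- rpois(n, lambda)

# Variance-to-mean ratio of the pooled counts (1 for a single Poisson)
dispersion <- var(counts) / mean(counts)

# The same ratio computed separately within each group
within <- tapply(counts, group, function(z) var(z) / mean(z))

cat("Pooled variance/mean ratio:", dispersion, "\n")
print(within)
```

If the grouping were observed, it could enter the model as a covariate; a mixture model is the option when it is not.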
>>
>>
>> Best
>>
>> Simon
>>
>> On Aug 13, 2013, at 5:28 PM, Marta Lomas <lomasvega at hotmail.com> wrote:
>>
>> >
>> > Hi again,
>> >
>> > I have a question about some outliers in my response variable (which is bird counts). At first I did not drop them
>> > because they are part of the normal counts and I considered them "ecologically" correct.
>> >
>> > However, I tried some of the same models without the outliers, and the AICs are better. I
>> > also get nice significances this way...
>> >
>> > So would you say that, even though the outliers are valid
>> > observations, and considering that the negative binomial
>> > distribution I am using already accounts for some of the overdispersion due to the outliers, it is
>> > better to drop them because the models fit better this way?
>> >
>> >
>> > Thanks for your patience!
>> >
>> > :)
>> >
>> >     [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


