[R] How to ignore data

Bert Gunter gunter.berton at gene.com
Mon Dec 13 19:17:40 CET 2010


Inline below. -- Bert

On Mon, Dec 13, 2010 at 9:42 AM, Steve Sidney <sbsidney at mweb.co.za> wrote:
> Thanks for the questions.
>
> 1) The data represent micro-organism counts, and a count of zero in this
> case is highly unlikely given the info we have, including that from the
> other participants.

?? Censoring or an experimental failure? Big difference.

> 2) The data is submitted in duplicate and then a standardised sum and
> difference is established and is used to calculate a Z-score which is used
> as a measure of performance.

Z scores are usually inappropriate for count data, which are discrete
and tend to be skewed.
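
As a rough illustration of the point about discreteness and skew, the
following R sketch simulates counts from a Poisson distribution. The mean
of 2, the sample size and the seed are arbitrary assumptions, not values
from the data discussed in this thread.

```r
# Simulated counts; lambda = 2 and n = 10000 are arbitrary assumptions.
set.seed(1)
counts <- rpois(10000, lambda = 2)

# Counts are discrete: every value is a non-negative whole number.
all_integer <- all(counts == round(counts)) && all(counts >= 0)

# Sample skewness (third standardised moment).
# A positive value indicates a long right tail.
skewness <- mean((counts - mean(counts))^3) / sd(counts)^3

all_integer
skewness
```

A Z-score treats the data as roughly symmetric and continuous, which is
why a clearly positive skewness on small counts is a warning sign.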
>
> Given both 1) and 2) it is necessary to exclude a raw count of zero (since
> the log of 0 is meaningless) and a count of one (since the log of 1 of
> course is zero).

False. Correct statement is: "Because I do not know the statistical
methodology necessary to handle such discrete data with 0 counts, I
exclude them."  You are confusing your ignorance of statistical
methodology with the need for spurious ad hoc treatments. 0 counts can
and should be handled by appropriate statistical methods (e.g. Poisson
models via glm(), or possibly zero-inflated Poisson models otherwise).
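
A minimal sketch of the glm() route, assuming simulated data: the variable
names, the two-group design and the Poisson means are hypothetical, chosen
only so that zero counts occur. This is an ordinary Poisson regression, not
a zero-inflated one; a zero-inflated fit needs more than glm() alone (for
example, zeroinfl() in the pscl package), and which model is appropriate
depends on how the zeros arose.

```r
# Hypothetical example data: two groups of 50 simulated counts each.
set.seed(42)
lab <- factor(rep(c("A", "B"), each = 50))
count <- rpois(100, lambda = ifelse(lab == "A", 1.5, 4))

# How many zero counts are in the data; none of them are dropped.
n_zero <- sum(count == 0)

# Poisson regression with the default log link.
fit <- glm(count ~ lab, family = poisson)

# Back-transform: estimated mean for group A and the B/A rate ratio.
exp(coef(fit))
```

The zeros stay in the model as ordinary observations, so no log of the raw
counts is ever taken and nothing has to be excluded by hand.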

>
> I guess one can think of these values as outliers and that is what I am
> trying to exclude.

This is a wholly unscientific statement, I'm afraid.

>
> There is ample evidence that such an approach is acceptable.

What evidence, pray tell? -- a prior culture of inappropriate
analyses, perhaps? I do not wish to engage in a debate about this,
but, again, all I can say is that the above statement is not
scientific. If I were consulting with you, I would say "Please show me
your 'evidence.' " But, of course, I am not, and won't.

None of this is to say that you aren't correct in all respects. It is
just that you have raised all my usual warning flags, so that I am
somewhat skeptical. But that's MY problem. This is the last I will say
on the matter, so feel free to get in the final word, as I will not
respond.

And I wish you success in your efforts.

-- Bert
>
> Thanks for the interest
> Steve
>
> On 2010/12/13 06:47 PM, Stavros Macrakis wrote:
>>
>> If you need to take the log of the values for your calculation, then
>> what does it mean that you have 0 values in the input?
>>
>> And why do you need to exclude the 1 values?
>>
>> Are you sure that a) you are doing the correct kind of analysis and b)
>> the analysis is correct if you exclude 0 and 1?
>>
>>             -s
>>
>> On Mon, Dec 13, 2010 at 10:38, Steve Sidney <sbsidney at mweb.co.za> wrote:
>>>
>>> Dear list
>>>
>>> I have quite a small data set in which I need to have the following
>>> values ignored - not used when performing an analysis, but they need
>>> to be included later in the report that I write.
>>>
>>> Can anyone help with a suggestion as to how this can be accomplished?
>>>
>>> Values to be ignored
>>>
>>> 0 (zero) and 1; this is in addition to NA (null)
>>>
>>> The reason is that I need to use the log10 of the values when
>>> performing the calculation.
>>>
>>> Currently I hand massage the data set, about 100 values, of which
>>> fewer than 5 to 10 are in this category.
>>>
>>> The NA values are NOT the problem
>>>
>>> What I was hoping was that I did not have to use a series of if and
>>> ifelse statements. Perhaps there is a more elegant solution.
>>>
>>> Any ideas would be welcomed.
>>>
>>> Regards
>>> Steve
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
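
For the original question quoted above, one way to avoid chained if and
ifelse statements is a logical mask: the full vector stays intact for the
report and only the analysis uses the subset. This is a sketch on a made-up
vector (the values and the names counts, keep and masked are hypothetical),
and it takes no position on whether excluding 0 and 1 is statistically
justified.

```r
# Hypothetical raw data, including 0, 1 and NA values.
counts <- c(0, 1, 12, NA, 350, 1, 47, 0, 2900)

# TRUE only for values that are present and greater than 1.
keep <- !is.na(counts) & counts > 1

# Analysis uses the subset; 'counts' itself is left untouched.
log_counts <- log10(counts[keep])

# Alternative: a working copy of the same length, with the excluded
# values set to NA so that positions still line up with the original.
masked <- replace(counts, !keep, NA)
log_masked <- log10(masked)

# Functions with an na.rm argument then skip the excluded values.
mean(log_masked, na.rm = TRUE)
```

The second form is convenient when the duplicates have to stay paired by
position, since the masked vector keeps the original length.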



-- 
Bert Gunter
Genentech Nonclinical Biostatistics


