[R] Log transformation and -Inf values for use in glm()

Stephen Weigand weigand.stephen at gmail.com
Sun Feb 8 01:59:42 CET 2009


Paul,

On Fri, Feb 6, 2009 at 3:25 PM, Paul Warren Simonin
<Paul.Simonin at uvm.edu> wrote:

> Hello,
>  I am writing regarding log transformation of data in a single matrix
> column, and subsequent use of these data in a glm model fit. I have a data
> matrix in which I am using the log function to transform the values. This
> transformation results in -Inf values in some places, though. I then receive
> an error when this matrix is used in the glm function, and would like to
> know how this can be avoided.
>  I have attempted several methods already including the use of na.exclude
> commands in the glm statement:
>
>> DistributionT<-glm(EarlyLn$yoyras~EarlyLn$temp,family=gaussian(link =
>> "identity"),na.exclude)
>
> I have also attempted to use the is.finite command:
>
> EarlyLn$yoyras<-EarlyLn[is.finite(EarlyLn$yoyras)==T,]
>
> I know another option would be to use a type of find and replace command to
> remove entire rows of the matrix that contain 0's (before log
> transformation) or -Inf (after transformation), but I do not know how this
> is done.
>
> Thank you for any advice or tips regarding conducting this transformation
> and feeding the data matrix into glm.
>
> Sincerely,
> Paul S.

In general, use syntax like this:

glm(yoyras ~ log(temp), data = EarlyLn, subset = temp > 0)
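
As a minimal sketch of what that does, here is the same call on a small
made-up data frame (the values below are invented for illustration and
are not your data; only the names EarlyLn, temp and yoyras come from
your post):

```r
# Hypothetical toy data standing in for EarlyLn; all values are invented.
EarlyLn <- data.frame(
  temp   = c(0, 2, 4, 8, 16, 0, 32),
  yoyras = c(1.0, 1.9, 3.1, 4.2, 4.8, 0.7, 6.1)
)

# log(0) is -Inf, which is the value glm() refuses to fit.
log(0)

# 'subset' is evaluated inside 'data', so rows with temp == 0 are left
# out of the model frame and the -Inf values never reach the fitting code.
fit <- glm(yoyras ~ log(temp), data = EarlyLn, subset = temp > 0)

# Number of rows actually used in the fit (the two zero rows are gone).
nobs(fit)
```

If the zeros are in the response rather than the predictor, the same
pattern applies with the condition written on that variable, e.g.
subset = yoyras > 0.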

However, it's bad statistical practice to use a transformation that
causes you to lose data. One approach is to add a constant to temp
via:

glm(yoyras ~ log(temp + 1), data = EarlyLn)

with the disadvantage being that the constant you choose is arbitrary
but affects your inferences.
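
To get a feel for how much the constant matters, one option is to refit
with two different constants and compare the estimates. This is only a
sketch on made-up data (the values are invented, and the constants 1 and
0.1 are arbitrary choices for illustration):

```r
# Hypothetical toy data standing in for EarlyLn; all values are invented.
EarlyLn <- data.frame(
  temp   = c(0, 2, 4, 8, 16, 0, 32),
  yoyras = c(1.0, 1.9, 3.1, 4.2, 4.8, 0.7, 6.1)
)

# Same model, two arbitrary shift constants. No subset is needed here
# because temp + constant is strictly positive for every row.
fit_1  <- glm(yoyras ~ log(temp + 1),   data = EarlyLn)
fit_01 <- glm(yoyras ~ log(temp + 0.1), data = EarlyLn)

# All rows are kept in both fits.
nobs(fit_1)
nobs(fit_01)

# The intercept and slope differ between the two fits.
coef(fit_1)
coef(fit_01)
```

If the conclusions change noticeably between constants, that is a sign
the result depends on the arbitrary choice.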

Stephen
Rochester, MN USA



