[R] {car} outlierTest looses p/q values
Phil
chobophil at gmail.com
Tue Oct 21 09:34:07 CEST 2014
Hi all,
as John pointed out, there is a way to create settings where the
studentized residuals are undefined. However, after cross-checking it
seems that the residuals are getting calculated without any error. The
problem comes up when I use outlierTest to assign a p,q value,respectively.
Below is the code and some printouts:
non.zero.orga.vector <-
c(1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,13.98) #targervector
print(meta.Matrix[,3])
x1 x28 x2 x3 x4 x51 x52 x5
x6 x7 x82
4.6 35.8 4.1 5.4 5.2 27.9 48.2 4.5
5.7 4.2 100.5
print(meta.Matrix[,10])
x1 x28 x2 x3
0.2990466 1.0497156 0.2805028 0.3517993
x4 x51 x52 x5
0.3543678 1.0178604 0.4933182 0.2810865
x6 x7 x82
0.4349550 0.3269192 3.0889747
glModel <- glm(non.zero.orga.vector ~ meta.Matrix[,3]+meta.Matrix[,10])
#create glm for testing
print(glModel$residuals) #check if residuals are calculated for every
entry in the target vector
x1 x28 x2 x3
x4 x51 x52
0.6600696 -2.5816334 0.7438969 0.4129873 0.3976498 -2.5350861
0.3128240
x5 x6 x7 x82
0.7465766 0.0102020 0.5181337 1.3143796
test.Res <- outlierTest(glModel,digits=4,cutoff=Inf,n.max=Inf) #test
the glm for outlier, cutoff=Inf/n.max=Inf == report everything
print(test.Res$bonf.p) #check if q-value exists
x28 x51 x52 x5 x2
x1 x7 x3
0.1915995 0.2240759 2.1637438 6.0211575 6.0306110 6.4618792 7.1932613
7.7595619
x4 x6
7.8394477 9.9437848
As you see the glm calculates residuals for x82 (which is in fact 1.314)
but the outlierTest does not assign a p/q value to it. Does anyone know why?
Thanks in advance,
Phil
On 10/17/2014 08:54 PM, John Fox wrote:
> Dear Phil,
>
> Yes, that's a bit clearer. One can invent data configurations where certain studentized residuals are undefined. For example, try the following:
>
> y <- c(0, 0, 0, 0, 0, 1)
> x <- 1:6
> xx <- (1:6 - 3.5)^2
> rstudent(lm(y ~ x))
> rstudent(lm(y ~ xx))
> plot(x, y)
> plot(xx, y)
>
> The plots should clarify what's going on.
>
> I'm copying to r-help since the discussion began there.
>
> I hope this helps,
> John
>
> On Fri, 17 Oct 2014 18:35:59 +0200
> Philip Stevens <chobophil at gmail.com> wrote:
>> Dear John,
>>
>> Thank you for your fast reply. Unfortunately I am out of office right now
>> but I will get back to you on Monday asap with a toy example and some code.
>> Meanwhile let me try to explain further.
>>
>> Basically not the glm but the outlierTest afterwards fails. And it only
>> fails if all values, used to set up the glm, are exactly 1 except for one
>> value which can be arbitrary large. I construct the glm from a vector of
>> doubles (the target values) and a vector of other numeric values(metadata).
>> (So this should be fit <- glm(targetVector ~ metadata) ) And want to check
>> if one of those doubles(in the target vector) is an outlier using
>> outlierTest(fit). The residuals are calculated by the glm but the
>> outlierTest does not report a p nor a q value for the one value in the
>> target vector which is not 1. And I can't figure out why...
>>
>> I hope this makes it a bit clearer.
>> Anyways I will come back to you on Monday.
>>
>> Best,
>>
>> Phil
>> Am 17.10.2014 17:43 schrieb "John Fox" <jfox at mcmaster.ca>:
>>
>>> Dear Phil,
>>>
>>> After reading your posting several times, I still don't understand what
>>> you did. As usual, having a reproducible example illustrating the error
>>> would be a great help. I do have a guess about the source of the error:
>>> glm() failed in some way for the problematic case.
>>>
>>> Best,
>>> John
>>>
>>> ------------------------------------------------
>>> John Fox, Professor
>>> McMaster University
>>> Hamilton, Ontario, Canada
>>> http://socserv.mcmaster.ca/jfox/
>>>
>>>
>>> On Fri, 17 Oct 2014 12:53:30 +0200
>>> Phil <chobophil at gmail.com> wrote:
>>>> Hi guys,
>>>>
>>>> I came across a strange phenomena and can't figure out why it happens by
>>> myself so here we go.
>>>> I got a dataframe which consists of double numbers which I want to
>>> check, row-wise if there are outliers in the rows.
>>>> So I iterate over the rows and create a glm using the numbers of that
>>> particular row. Which might look like this:
>>>> case1)
>>>> x1 x2 x3 x4 x5 x6 x7
>>>> x8 x9 x10 x11
>>>> 0.00 3.91 0.00 0.00 0.00 68.03 40.39 0.00
>>>> 0.00 0.00 4.11
>>>>
>>>> or like this:
>>>> case2)
>>>> x1 x2 x3 x4 x5 x6 x7
>>>> x8 x9 x10 x11
>>>> 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
>>>> 1.00 1.00 5.34
>>>>
>>>> or any other combination of double numbers...
>>>>
>>>> however, using a glm like this:
>>>>
>>>> glModel <- glm(vector ~ some_other_meta_data_which_is_double_numbers)
>>>>
>>>> and testing it with:
>>>>
>>>> test.Res <- outlierTest(glModel,digits=4,cutoff=Inf,n.max=Inf)
>>>>
>>>> I always get a result consisting of the desired p and q values but not
>>> if the vector I use looks like case2. There is no error message and the
>>> computation does not stop either.
>>>> However, all p and q values are produced except for the last value x11.
>>>>
>>>> Any idea why this particular value gets dropped from the output of the
>>> outlierTest Method in the car package.
>>>> Here is the sessioninfo:
>>>>
>>>> sessionInfo()
>>>> R version 3.1.1 (2014-07-10)
>>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
>>>> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
>>>> [7] LC_PAPER=en_US.utf8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] ggplot2_1.0.0 car_2.0-21 RColorBrewer_1.0-5 iNEXT_1.0
>>>> [5] vegan_2.0-10 lattice_0.20-29 permute_0.8-3
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] colorspace_1.2-4 compiler_3.1.1 digest_0.6.4 grid_3.1.1
>>>> [5] gtable_0.1.2 labeling_0.3 MASS_7.3-33 munsell_0.4.2
>>>> [9] nnet_7.3-8 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.2
>>>> [13] reshape2_1.4 scales_0.2.4 stringr_0.6.2 tools_3.1.1
>>>>
>>>> Any help is highly appreciated.
>>>>
>>>> Thanks
>>>>
>>>> Phil
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>>
> ------------------------------------------------
> John Fox, Professor
> McMaster University
> Hamilton, Ontario, Canada
> http://socserv.mcmaster.ca/jfox/
>
>
>
More information about the R-help
mailing list