[R] nmax parameter in factor function

Bert Gunter bgunter.4567 at gmail.com
Sun Jun 4 06:35:15 CEST 2017


I'll go just a bit "fer-er." It appears the anomaly -- I hesitate to
call it a bug -- is in the C code for duplicated.default():

> duplicated(letters[1:10],nmax=10)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

> duplicated(letters[1:10],nmax=9)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

> duplicated(letters[1:10],nmax=8) ## same error for nmax = 2, ..., 8 (nmax = 1 is silently ignored, per ?duplicated)
Error in duplicated.default(letters[1:10], nmax = 8) : hash table is full
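
The cutoff can also be probed in one pass instead of retyping each call. This is a minimal sketch: nmax_ok is a hypothetical helper (not part of base R) that only reports whether duplicated() accepts a given nmax for a given vector. The output is not shown because it may differ between R versions.

```r
# Hypothetical helper, not part of base R: returns TRUE if duplicated()
# runs without error for this nmax, FALSE if it signals an error.
nmax_ok <- function(x, nmax) {
  tryCatch({
    duplicated(x, nmax = nmax)
    TRUE
  }, error = function(e) FALSE)
}

x <- letters[1:10]   # 10 distinct values, as in the calls above

# Start at 2 because ?duplicated says nmax = 1 is silently ignored.
ok <- vapply(2:12, function(k) nmax_ok(x, k), logical(1))
names(ok) <- 2:12
print(ok)
```

The first TRUE in the printed vector marks the smallest nmax that this R build accepts for 10 distinct values.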

Cleverer folks than I must now explain (and possibly correct me).

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Sat, Jun 3, 2017 at 9:11 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> Well, you won't like this, but it is kind of wimpily (is that a word?)
> documented:
>
> If you check the code of factor(), you will see that nmax appears as
> an argument in a call to unique(). ?unique says for nmax, "... see
> duplicated" . And ?duplicated says:
>
> "If nmax is set too small there is liable to be an error: nmax = 1 is
> silently ignored."
>
> So sometimes a too-small nmax gives you the "hash table is full"
> error, and sometimes the nmax argument is apparently just ignored:
>
>> identical(factor(letters,nmax = 25), factor(letters,nmax=26))
> [1] TRUE
>
> and that, to paraphrase what Rodgers and Hammerstein said about Kansas City,
> is about "as fer as I can go."
>
> (http://lyricsplayground.com/alpha/songs/e/everythingsuptodateinkansascity.shtml)
>
> Cheers,
> Bert
>
> On Sat, Jun 3, 2017 at 6:14 PM, Ramnik Bansal <ramnik.bansal at gmail.com> wrote:
>> I have been trying to understand how the argument 'nmax' works in the
>> 'factor' function. The R documentation states: "Since factors typically
>> have quite a small number of levels, for large vectors x it is helpful
>> to supply nmax as an upper bound on the number of unique values."
>>
>> In the code below, why is there an error when nmax is 24? Why does
>> the same error not occur with nmax = 25, and how come there are
>> 26 levels when nmax = 25?
>>
>>> factor(x = letters, nmax = 26)
>>  [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
>> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
>>
>>> factor(x = letters, nmax = 25)
>>  [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
>> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
>>
>>> factor(x = letters, nmax = 24)
>> Error in unique.default(x, nmax = nmax) : hash table is full