[R] Question concerning side effects of treating invalid factor levels
tibor@kiss m@iii@g oii rub@de
Mon Sep 19 13:38:33 CEST 2022
Dear Eric,
thank you very much. It would not have occurred to me to look up the help page for _c()_, which of course explains the coercion to the highest type in the hierarchy.
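As a minimal sketch of that coercion (the values below are made up and only stand in for the data from the original post), c() flattens mixed inputs into a single character vector, whereas a one-row data frame keeps each column's type:

```r
# c() returns an atomic vector, so mixed inputs are coerced to one common type.
new_row <- c("in", "V>N", 12345)
class(new_row)    # "character"

# Small stand-in for the data frame from the original post (values are made up).
df <- data.frame(
  P      = factor(c("mittels", "mit", "ueber")),
  ANSWER = factor(c("PP>OBJ", "PP>OBJ", "OBJ>PP")),
  RT     = c(11157, 13719, 14388)
)

# Binding a one-row data frame keeps the columns separate, so RT stays numeric
# and the factor columns gain the new levels.
df2 <- rbind(df, data.frame(P = "in", ANSWER = "V>N", RT = 12345))
class(df2$RT)     # "numeric"
levels(df2$P)     # now includes "in"
```

This assumes R 4.0 or later, where data.frame() leaves strings as character by default; rbind() then extends the factor levels itself.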
Best
T.
> On 19.09.2022 at 13:31, Eric Berger <ericjberger using gmail.com> wrote:
>
> You are misinterpreting what is going on.
> The rbind command includes c(char, char, num), which produces a
> character vector of length 3.
> This is what you are rbind-ing, and it changes the type of the RT column.
>
> If you do
>   rbind(df, data.frame(P="in", ANSWER="V>N",
>                        RT=round(runif(1,7000,16000),0)))
> you will see that everything is fine. (New factor levels are created.)
>
> HTH,
> Eric
>
> On Mon, Sep 19, 2022 at 2:14 PM Tibor Kiss via R-help
> <r-help using r-project.org> wrote:
>>
>> Dear List members,
>>
>> I have tried several times to find out about a side effect of handling invalid factor levels, but have not found an answer. Various answers on Stack Exchange etc. produce the behaviour that puzzles me without even mentioning it.
>> So I am asking the list (apologies if this has been discussed in the past).
>>
>> If you add an invalid factor level to a column in a data frame, this has the side effect of turning a numerical column into a column with character strings. Here is a simple example:
>>
>> > df <- data.frame(
>>     P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
>>     ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
>>     RT = round(runif(6, 7000, 16000), 0))
>>
>> > str(df)
>> 'data.frame': 6 obs. of 3 variables:
>>  $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1
>>  $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1
>>  $ RT    : num 11157 13719 14388 14527 14686 ...
>>
>> > df <- rbind(df, c("in", "V>N", round(runif(1, 7000, 16000), 0)))
>>
>> > str(df)
>> 'data.frame': 7 obs. of 3 variables:
>>  $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1 NA
>>  $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1 NA
>>  $ RT    : chr "11478" "15819" "8305" "8852" ...
>>
>> You see that RT has changed from _num_ to _chr_ as a side effect of adding the invalid factor level as NA. I would appreciate understanding what the purpose of the type coercion is.
>>
>> Thanks in advance
>>
>>
>> Tibor