[R] convert columns of dataframe to same factor levels
Duncan Murdoch
murdoch.duncan at gmail.com
Wed Dec 19 14:01:47 CET 2018
On 19/12/2018 6:48 AM, Luigi Marongiu wrote:
> Thank you,
> that worked fine for me.
> Best wishes for a merry Christmas and a happy new year,
> Luigi
>
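Actually, my suggestion was wrong! Sorry about that.
If you look at my.data.new$column_2, you'll see that the values themselves
have changed, because assigning to levels() renames the existing levels by
position instead of matching them by label: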
> my.data
  column_1 column_2 column_3
1        A        B        A
2        B        B        A
3        C        C        B
4        D        E        B
5        E        E        A
> my.data.new
  column_1 column_2 column_3
1        A        A        A
2        B        A        A
3        C        B        B
4        D        C        B
5        E        C        A
What you want is this instead:
my.data.new <- as.data.frame(lapply(my.data, function(x) {
  factor(x, levels = thelevels)
}))
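To make the difference concrete, here is a minimal sketch that rebuilds the data from the original question and runs both versions side by side. The name `relabelled` is only an illustrative name for the result of the first (wrong) attempt; everything else comes from the thread.

```r
# Same data as in the original question
my.data <- data.frame(
  column_1 = c("A", "B", "C", "D", "E"),
  column_2 = c("B", "B", "C", "E", "E"),
  column_3 = c("C", "C", "D", "D", "C"),
  stringsAsFactors = TRUE
)

# Union of the levels found in any column
thelevels <- unique(unlist(lapply(my.data, levels)))

# First attempt: `levels<-` renames the existing levels by position,
# so the values stored in the column change
relabelled <- as.data.frame(lapply(my.data, function(x) {
  levels(x) <- thelevels
  x
}))

# Corrected version: factor() matches on the labels, so values are kept
my.data.new <- as.data.frame(lapply(my.data, function(x) {
  factor(x, levels = thelevels)
}))

as.character(relabelled$column_2)   # "A" "A" "B" "C" "C"
as.character(my.data.new$column_2)  # "B" "B" "C" "E" "E"
str(my.data.new)
```

With the corrected version every column reports the same five levels, and the values match the input.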
The last example in the ?levels help page relabels its values in the same
way (there "a" and "b" become "c" and "a"). I wonder if that is intentional?
levels> ## we can add levels this way:
levels> f <- factor(c("a","b"))
levels> levels(f) <- c("c", "a", "b")
levels> f
[1] c a
Levels: c a b
levels> f <- factor(c("a","b"))
levels> levels(f) <- list(C = "C", A = "a", B = "b")
levels> f
[1] A B
Levels: C A B
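For the read.table() part of the original question, one possible approach is to read all columns as character and convert them afterwards. This is only a sketch: the inline `raw` string is a hypothetical stand-in for a real file, and computing `thelevels` from the data is an assumption that applies only when the full set of levels is not known in advance.

```r
# Hypothetical input; `text = raw` stands in for a real file path
raw <- "column_1 column_2 column_3
A B C
B B C
C C D
D E D
E E C"

# Read everything as character, so no per-column levels are created
my.data <- read.table(text = raw, header = TRUE,
                      colClasses = "character")

# Either assign the known levels here, or compute them from the data
thelevels <- sort(unique(unlist(my.data, use.names = FALSE)))

# Convert every column to a factor with the common set of levels;
# assigning into my.data[] keeps the data frame structure
my.data[] <- lapply(my.data, factor, levels = thelevels)

str(my.data)
```

Reading as character first avoids the relabelling problem altogether, since factor() then matches each value against the common levels by label.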
Duncan Murdoch
> On Wed, Dec 19, 2018 at 12:19 PM Duncan Murdoch
> <murdoch.duncan using gmail.com> wrote:
>>
>> On 19/12/2018 5:58 AM, Luigi Marongiu wrote:
>>> Dear all,
>>> I have a data frame with character values where each character is a
>>> level; however, not all columns of the data frame contain the same
>>> characters, so when I generate the data frame with stringsAsFactors
>>> = TRUE, the levels are different for each column.
>>> Is there a way to provide a single vector of levels so that every
>>> column is encoded against that same vector?
>>> Is there a way to do that not only when creating the data frame but
>>> also when reading data from a file with read.table()?
>>>
>>> For instance, I have:
>>> column_1 = c("A", "B", "C", "D", "E")
>>> column_2 = c("B", "B", "C", "E", "E")
>>> column_3 = c("C", "C", "D", "D", "C")
>>> my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)
>>> > str(my.data)
>>> 'data.frame': 5 obs. of 3 variables:
>>> $ column_1: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
>>> $ column_2: Factor w/ 3 levels "B","C","E": 1 1 2 3 3
>>> $ column_3: Factor w/ 2 levels "C","D": 1 1 2 2 1
>>>
>>> Thank you
>>>
>>
>> I don't think read.table() can do it for you automatically. To do it
>> yourself, you need to get a vector of the levels. If you know this,
>> just assign it to a variable; if you don't know it, compute it as
>>
>> thelevels <- unique(unlist(lapply(my.data, levels)))
>>
>> Then set the levels of each column to thelevels:
>>
>> my.data.new <- as.data.frame(lapply(my.data, function(x) {
>>   levels(x) <- thelevels; x}))
>>
>> Duncan Murdoch
>
>
>