[R] Changing entries of column of type "factor"/Adding a new level to a factor

Bert Gunter gunter.berton at gene.com
Mon Aug 27 23:54:58 CEST 2012


Perfectly sensible, and indeed what I originally wrote. But it only
works for my trivial example, not the general situation where it might
not be the first level that needs changing.
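
For the general case, one possible sketch (an illustration, not code
posted by anyone in this thread) is to match the level by its label
instead of its position. The helper name rename_level and the small
data frame dat are hypothetical, made up here to mirror the shape of
the original poster's table.

```r
# Hypothetical helper (not from the thread): rename one level by label,
# wherever it sits in levels(f). Other levels keep their position.
rename_level <- function(f, old, new) {
  stopifnot(is.factor(f))
  lv <- levels(f)
  lv[lv == old] <- new   # no-op if `old` is not among the levels
  levels(f) <- lv        # if `new` already exists, the two levels are merged
  f
}

# Made-up data in the shape of the original poster's table
dat <- data.frame(
  Ind = 1:4,
  M1  = factor(c("96/98", "102/108", "0/0", "98/98")),
  M3  = factor(c("0/0", "305/305", "0/0", "300/305"))
)

# Apply the rename to every factor column, leaving other columns alone
is_fac <- vapply(dat, is.factor, logical(1))
dat[is_fac] <- lapply(dat[is_fac], rename_level, old = "0/0", new = "999/999")

# Works when the level to change is not the first one
x <- rename_level(factor(letters[1:3]), old = "b", new = "d")
```

Because only the label changes, "999/999" stays in the slot that "0/0"
occupied. If a different level order is wanted, the result can be passed
through factor(f, levels = ...) afterwards.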

-- Bert

On Mon, Aug 27, 2012 at 1:52 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Aug 27, 2012, at 12:18 PM, Bert Gunter wrote:
>
>> Well ... see below.
>>
>> -- Cheers, Bert
>>
>> On Mon, Aug 27, 2012 at 9:19 AM, David Winsemius <dwinsemius at comcast.net>
>> wrote:
>>>
>>>
>>> On Aug 27, 2012, at 3:09 AM, Fridolin wrote:
>>>
>>>> What is a smart way to change an entry inside a column of a dataframe or
>>>> matrix which is of type "factor"?
>>>>
>>>> Here is my script incl. input data:
>>>>>
>>>>>
>>>>> #set working directory:
>>>>> setwd("K:/R")
>>>>>
>>>>> #read in data:
>>>>> input<-read.table("Exampleinput.txt", sep="\t", header=TRUE)
>>>>>
>>>>> #check data:
>>>>> input
>>>>
>>>>
>>>>    Ind      M1      M2      M3
>>>> 1    1   96/98 120/120     0/0
>>>> 2    2 102/108 120/124 305/305
>>>> 3    3  96/108 120/120     0/0
>>>> 4    4     0/0 116/120 300/305
>>>> 5    5  96/108 120/130 300/305
>>>> 6    6   98/98 116/120 300/305
>>>> 7    7  98/108 120/120 305/305
>>>> 8    8  98/108 120/120 305/305
>>>> 9    9  98/102 120/124 300/300
>>>> 10  10 108/108 120/120 305/305
>>>>>
>>>>>
>>>>> str(input)
>>>>
>>>>
>>>> 'data.frame':   10 obs. of  4 variables:
>>>> $ Ind: int  1 2 3 4 5 6 7 8 9 10
>>>> $ M1 : Factor w/ 8 levels "0/0","102/108",..: 5 2 4 1 4 8 7 7 6 3
>>>> $ M2 : Factor w/ 4 levels "116/120","120/120",..: 2 3 2 1 4 1 2 2 3 2
>>>> $ M3 : Factor w/ 4 levels "0/0","300/300",..: 1 4 1 3 3 3 4 4 2 4
>>>>>
>>>>>
>>>>>
>>>>> #replace 0/0 by 999/999:
>>>>> for (r in 1:10)
>>>> +   for (c in 2:4)
>>>> +     if (input[r,c]=="0/0") input[r,c]<-"999/999"
>>>> Warning messages:
>>>> 1: In `[<-.factor`(`*tmp*`, iseq, value = "999/999") :
>>>> invalid factor level, NAs generated
>>>> 2: In `[<-.factor`(`*tmp*`, iseq, value = "999/999") :
>>>> invalid factor level, NAs generated
>>>> 3: In `[<-.factor`(`*tmp*`, iseq, value = "999/999") :
>>>> invalid factor level, NAs generated
>>>>>
>>>>>
>>>>> input
>>>>
>>>>
>>>>    Ind      M1      M2      M3
>>>> 1    1   96/98 120/120    <NA>
>>>> 2    2 102/108 120/124 305/305
>>>> 3    3  96/108 120/120    <NA>
>>>> 4    4    <NA> 116/120 300/305
>>>> 5    5  96/108 120/130 300/305
>>>> 6    6   98/98 116/120 300/305
>>>> 7    7  98/108 120/120 305/305
>>>> 8    8  98/108 120/120 305/305
>>>> 9    9  98/102 120/124 300/300
>>>> 10  10 108/108 120/120 305/305
>>>>
>>>>
>>>> I want to replace all "0/0" with "999/999". My code should work for
>>>> columns of type "character" and "integer". But to make it work for a
>>>> "factor" column, I guess I would first need to add the new level
>>>> "999/999". How do I add a new level?
>>>
>>>
>>>
>>> ?levels
>>>
>>> levels(input$M1) <- c(levels(input$M1), "999/999")
>>
>>
>> This adds an additional level; then you have to replace the "0/0"
>> level with this one; then you have to call levels again to remove the
>> "0/0" level.
>
>
> Then do it this way (different from what I thought was originally desired):
>
>> x <- factor(letters[1:3])
>> levels(x) <- c("d", levels(x)[2:3])
>> x
> [1] d b c
> Levels: d b c
>
>>
>> I think the following slight tweak may be preferred, as illustrated
>> with a little example (opinions?):
>>
>>> x <- factor(letters[1:3])
>>> x
>>
>> [1] a b c
>> Levels: a b c
>>
>> ## create a new levels vector
>>>
>>> newlvl <- levels(x)
>>> newlvl[newlvl == "a"] <- "d"
>>
>>
>> ## Create the new factor and replace the old with it
>>
>>> x <- factor(newlvl[x])
>>> x
>>
>> [1] d b c
>> Levels: b c d
>>
>> Note, however, as Bill D. said, in either case your level ordering --
>> which will be used, e.g. in printing and displaying -- will be weird.
>
>
> So the above method might be what you expect. Several options are now
> available to the questioner.
>
> --
> David.
>>
>>
>>
>>
>>>
>>> --
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>>
>>
>>
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>>
>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
>
> David Winsemius, MD
> Alameda, CA, USA
>






