[R] Building factors across two columns, is this possible?

Rui Barradas ruipbarradas at sapo.pt
Sat Nov 24 17:15:57 CET 2012


Hello,

If you want the levels sorted, you have to sort them yourself, because unique() keeps the values in order of first appearance:

levs <- sort(unique(as.character(unlist(dat))))
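For completeness, here is a minimal sketch that applies the sorted levels to all columns at once with lapply(). It rebuilds the example data with data.frame() so that it runs on its own, and it assumes character columns (stringsAsFactors = FALSE). If the columns are already factors, the as.character() call in the levs line handles the conversion.

```r
# Example data from the thread, with character (not factor) columns
dat <- data.frame(
  V1 = c("sun", "stars", "cat", "dog", "bird", "1000"),
  V2 = c("moon", "moon", "dog", "moon", "plane", "dog"),
  V3 = c("stars", "sun", "catdog", "sun", "superman", "2000"),
  stringsAsFactors = FALSE
)

# One sorted level set built from the values of all columns
levs <- sort(unique(as.character(unlist(dat))))

# Recode every column against that shared level set;
# assigning to dat[] keeps dat a data.frame
dat[] <- lapply(dat, factor, levels = levs)

str(dat)

# Integer codes now agree across columns, e.g. "dog" has the same code in V1 and V2
sapply(dat, as.integer)
```

Note that sort() orders strings according to the current locale's collation, so with other data the level order (and therefore the codes) may differ between locales.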

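Because factor codes start at 1, 0-based codes like the ones in Brian's table can be obtained by subtracting 1. A minimal sketch, assuming the goal is a plain integer matrix rather than factors. It swaps factor() for match() against the first-appearance level order (the unique(unlist(dat)) line in the quoted reply below), and "codes" is just an illustrative name. With factors, as.integer(f) - 1L gives the same result.

```r
# Example data from the thread, with character columns
dat <- data.frame(
  V1 = c("sun", "stars", "cat", "dog", "bird", "1000"),
  V2 = c("moon", "moon", "dog", "moon", "plane", "dog"),
  V3 = c("stars", "sun", "catdog", "sun", "superman", "2000"),
  stringsAsFactors = FALSE
)

# Levels in order of first appearance, reading column by column
levs <- unique(unlist(dat))

# match() returns 1-based positions in levs; subtract 1 for 0-based codes
codes <- sapply(dat, function(col) match(col, levs) - 1L)
rownames(codes) <- rownames(dat)
codes
```

With this data, the resulting matrix matches the 0-based table in Brian's message (sun -> 0, ..., 2000 -> 10).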
Rui Barradas
On 24-11-2012 12:57, Rui Barradas wrote:
> Hello,
>
> You can do what you want, but factor codes start at 1, not at 0.
>
>
> dat <- read.table(text="
> V1    V2       V3
> 1   sun  moon    stars
> 2 stars  moon      sun
> 3   cat   dog   catdog
> 4   dog  moon      sun
> 5  bird plane superman
> 6  1000   dog     2000
> ", header = TRUE)
>
> levs <- unique(unlist(dat))
>
> dat$V1 <- factor(dat$V1, levels = levs)
> dat$V2 <- factor(dat$V2, levels = levs)
> dat$V3 <- factor(dat$V3, levels = levs)
>
> str(dat)
> 'data.frame':   6 obs. of  3 variables:
>  $ V1: Factor w/ 11 levels "sun","stars",..: 1 2 3 4 5 6
>  $ V2: Factor w/ 11 levels "sun","stars",..: 7 7 4 7 8 4
>  $ V3: Factor w/ 11 levels "sun","stars",..: 2 1 9 1 10 11
>
>
> Hope this helps,
>
> Rui Barradas
> On 24-11-2012 07:33, Brian Feeny wrote:
>> To clarify my previous post, here is a representation of what I am 
>> trying to accomplish:
>>
>> I would like every unique value in any of the columns to be assigned a 
>> number, like so:
>>
>>      V1    V2       V3
>> 1   sun  moon    stars
>> 2 stars  moon      sun
>> 3   cat   dog   catdog
>> 4   dog  moon      sun
>> 5  bird plane superman
>> 6  1000   dog     2000
>>
>> Level    Value
>> sun      ->  0
>> stars    ->  1
>> cat      ->  2
>> dog      ->  3
>> bird     ->  4
>> 1000     ->  5
>> moon     ->  6
>> plane    ->  7
>> catdog   ->  8
>> superman ->  9
>> 2000     -> 10
>> etc.
>>
>> so that internally it is represented as:
>>
>>    V1 V2 V3
>> 1   0  6  1
>> 2   1  6  0
>> 3   2  3  8
>> 4   3  6  0
>> 5   4  7  9
>> 6   5  3 10
>>
>> Does this make sense? I am hoping there is a way to accomplish this.
>>
>> Brian
>>
>> On Nov 23, 2012, at 11:42 PM, Brian Feeny <bfeeny at mac.com> wrote:
>>
>>> I am trying to make two columns with similar data use the same 
>>> internal codes for the same values. Here is an example:
>>>
>>>> read.csv("test.csv",header =FALSE,sep=",")
>>>      V1    V2       V3
>>> 1   sun  moon    stars
>>> 2 stars  moon      sun
>>> 3   cat   dog   catdog
>>> 4   dog  moon      sun
>>> 5  bird plane superman
>>> 6  1000   dog     2000
>>>> data <- read.csv("test.csv",header =FALSE,sep=",")
>>>> str(data)
>>> 'data.frame':    6 obs. of  3 variables:
>>> $ V1: Factor w/ 6 levels "1000","bird",..: 6 5 3 4 2 1
>>> $ V2: Factor w/ 3 levels "dog","moon","plane": 2 2 1 2 3 1
>>> $ V3: Factor w/ 5 levels "2000","catdog",..: 3 4 2 4 5 1
>>>
>>>> as.numeric(data$V1)
>>> [1] 6 5 3 4 2 1
>>>> as.numeric(data$V2)
>>> [1] 2 2 1 2 3 1
>>>> as.factor(data$V1)
>>> [1] sun   stars cat   dog   bird  1000
>>> Levels: 1000 bird cat dog stars sun
>>>> as.factor(data$V2)
>>> [1] moon  moon  dog   moon  plane dog
>>> Levels: dog moon plane
>>>
>>>
>>> Notice that "dog" is 4 in V1, yet it is 1 in V2. Is there a way, either 
>>> on import or afterwards, to build the factors from both columns so that 
>>> the same value is assigned the same internal code?
>>>
>>> Brian
>>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
