[R] Building factors across two columns, is this possible?
Rui Barradas
ruipbarradas at sapo.pt
Sat Nov 24 17:15:57 CET 2012
Hello,
If you want the factor levels sorted, you'll have to sort them yourself before passing them to factor().
levs <- sort(unique(as.character(unlist(dat))))
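As a minimal sketch of how the sorted levels could be applied to every column at once, and how the 0-based numbering asked for further down the thread might be recovered: the `lapply` call, the `stringsAsFactors = FALSE` argument and the `codes` name are illustrative assumptions, not part of the original code.

```r
# Same example data as in the thread. stringsAsFactors = FALSE (an assumption
# added here) keeps the columns as character regardless of the R version.
dat <- read.table(text = "
V1 V2 V3
1 sun moon stars
2 stars moon sun
3 cat dog catdog
4 dog moon sun
5 bird plane superman
6 1000 dog 2000
", header = TRUE, stringsAsFactors = FALSE)

# One sorted level set built from the values of all columns.
levs <- sort(unique(as.character(unlist(dat))))

# Give every column the same levels, so a value gets the same code everywhere.
dat[] <- lapply(dat, factor, levels = levs)

# Factor codes are 1-based; subtract 1 if 0-based numbers are wanted.
# 'codes' is a hypothetical name for the resulting integer matrix.
codes <- sapply(dat, as.integer) - 1L
```

With this, "dog" has the same code in V1 and V2, because both columns share `levs`. The exact sort order depends on the locale's collation rules.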
Rui Barradas
On 24-11-2012 12:57, Rui Barradas wrote:
> Hello,
>
> You can do what you want, but the coding of factors starts at 1, not at 0.
>
>
> dat <- read.table(text="
> V1 V2 V3
> 1 sun moon stars
> 2 stars moon sun
> 3 cat dog catdog
> 4 dog moon sun
> 5 bird plane superman
> 6 1000 dog 2000
> ", header = TRUE)
>
> levs <- unique(unlist(dat))
>
> dat$V1 <- factor(dat$V1, levels = levs)
> dat$V2 <- factor(dat$V2, levels = levs)
> dat$V3 <- factor(dat$V3, levels = levs)
>
> str(dat)
> 'data.frame': 6 obs. of 3 variables:
> $ V1: Factor w/ 11 levels "sun","stars",..: 1 2 3 4 5 6
> $ V2: Factor w/ 11 levels "sun","stars",..: 7 7 4 7 8 4
> $ V3: Factor w/ 11 levels "sun","stars",..: 2 1 9 1 10 11
>
>
> Hope this helps,
>
> Rui Barradas
> On 24-11-2012 07:33, Brian Feeny wrote:
>> To clarify on my previous post, here is a representation of what I am
>> trying to accomplish:
>>
>> I would like every unique value in either column to be assigned a
>> number, like so:
>>
>> V1 V2 V3
>> 1 sun moon stars
>> 2 stars moon sun
>> 3 cat dog catdog
>> 4 dog moon sun
>> 5 bird plane superman
>> 6 1000 dog 2000
>>
>> Level Value
>> sun -> 0
>> stars -> 1
>> cat -> 2
>> dog -> 3
>> bird -> 4
>> 1000 -> 5
>> moon -> 6
>> plane -> 7
>> catdog -> 8
>> superman -> 9
>> 2000 -> 10
>> etc
>> etc
>>
>> so internally it's represented as:
>>
>> V1 V2 V3
>> 1 0 6 1
>> 2 1 6 0
>> 3 2 3 8
>> 4 3 6 0
>> 5 4 7 9
>> 6 5 3 10
>>
>> does this make sense? I am hoping there is a way to accomplish this.
>>
>> Brian
>>
>> On Nov 23, 2012, at 11:42 PM, Brian Feeny <bfeeny at mac.com> wrote:
>>
>>> I am trying to make it so two columns with similar data use the same
>>> internal numbers for same factors, here is the example:
>>>
>>>> read.csv("test.csv",header =FALSE,sep=",")
>>> V1 V2 V3
>>> 1 sun moon stars
>>> 2 stars moon sun
>>> 3 cat dog catdog
>>> 4 dog moon sun
>>> 5 bird plane superman
>>> 6 1000 dog 2000
>>>> data <- read.csv("test.csv",header =FALSE,sep=",")
>>>> str(data)
>>> 'data.frame': 6 obs. of 3 variables:
>>> $ V1: Factor w/ 6 levels "1000","bird",..: 6 5 3 4 2 1
>>> $ V2: Factor w/ 3 levels "dog","moon","plane": 2 2 1 2 3 1
>>> $ V3: Factor w/ 5 levels "2000","catdog",..: 3 4 2 4 5 1
>>>
>>>> as.numeric(data$V1)
>>> [1] 6 5 3 4 2 1
>>>> as.numeric(data$V2)
>>> [1] 2 2 1 2 3 1
>>>> as.factor(data$V1)
>>> [1] sun stars cat dog bird 1000
>>> Levels: 1000 bird cat dog stars sun
>>>> as.factor(data$V2)
>>> [1] moon moon dog moon plane dog
>>> Levels: dog moon plane
>>>
>>>
>>> So notice "dog" is 4 in V1, yet it's 1 in V2. Is there a way, either
>>> on import or after, to have factors computed for both columns and
>>> assigned the same internal values?
>>>
>>> Brian
>>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>