[R] Having trouble converting a dataframe of character vectors to factors
Lopez, Dan
lopez235 at llnl.gov
Thu Feb 21 16:55:28 CET 2013
Hi Bert,
Thanks for drawing my attention to the "simplify" argument and for the examples. I understand now.
Thanks.
Dan
-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com]
Sent: Wednesday, February 20, 2013 4:25 PM
To: Lopez, Dan
Cc: R help (r-help at r-project.org)
Subject: Re: [R] Having trouble converting a dataframe of character vectors to factors
Please re-read ?sapply and pay particular attention to the "simplify" argument.
The following should help explain the issues:
> z <- data.frame(a=letters[1:3],b=letters[4:6],stringsAsFactors=FALSE)
> sapply(z,class)
a b
"character" "character"
> z1 <- sapply(z,as.factor)
> sapply(z1,class)
a b c d e f
"character" "character" "character" "character" "character" "character"
> z2 <- sapply(z,factor, simplify = FALSE)
> sapply(z2,class)
a b
"factor" "factor"
> z3 <- lapply(z,factor)
> sapply(z3,class)
a b
"factor" "factor"
> z3
$a
[1] a b c
Levels: a b c
$b
[1] d e f
Levels: d e f
## Note that both z2 and z3 are lists, and would have to be converted back to data frames.
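## A minimal sketch of that conversion back to a data frame, assuming a small stand-in data frame like z above. The names z_df and z_inplace are hypothetical and used only for illustration. Both forms rely only on base R (lapply, as.data.frame, and assignment into z[]).

```r
# Small stand-in data frame of character columns (same shape as z above)
z <- data.frame(a = letters[1:3], b = letters[4:6], stringsAsFactors = FALSE)

# Option 1: lapply() returns a list of factors; wrap it back into a data frame
z3 <- lapply(z, factor)
z_df <- as.data.frame(z3)

# Option 2: assign into z[] so the result stays a data frame throughout
z_inplace <- z
z_inplace[] <- lapply(z_inplace, factor)

# Both should now report "factor" for every column
sapply(z_df, class)
sapply(z_inplace, class)
```

## Assigning into z_inplace[] replaces the columns without rebuilding the object, so it should also keep the original row names, which may matter for a data frame like scs2 whose row names are not 1:n.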
-- Bert
On Wed, Feb 20, 2013 at 4:09 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
> R Experts,
>
> I have a dataframe made up of character vectors--these are results from survey questions. I need to convert them to factors.
>
> I tried the following which did not work:
> scs2<-sapply(scs2,as.factor)
> also this didn't work:
> scs2<-sapply(scs2,function(x) as.factor(x))
>
> After doing either of the above I end up with
>>str(scs2)
>  chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : NULL
>   ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
>>class(scs2)
> "matrix"
>
> But when I do it one at a time it works:
> scs2$Q1_1<-as.factor(scs2$Q1_1)
> scs2$Q1_2<- as.factor(scs2$Q1_2)
>
> What am I doing wrong? How do I accomplish this with sapply or similar function?
>
> Data for reproducibility:
>
>
> scs2<-structure(list(
>   Q1_1 = c("very important", "very important", "very important",
>     "very important", "very important", "very important", "very important",
>     "somewhat important", "important", "very important"),
>   Q1_2 = c("important", "somewhat important", "very important",
>     "important", "important", "very important", "somewhat important",
>     "somewhat important", "very important", "very important"),
>   Q1_3 = c("very important", "important", "very important",
>     "very important", "important", "very important", "very important",
>     "somewhat important", "not important", "important"),
>   Q1_4 = c("very important", "important", "very important",
>     "very important", "important", "important", "important",
>     "very important", "somewhat important", "important"),
>   Q1_5 = c("very important", "not important", "important",
>     "very important", "not important", "important", "somewhat important",
>     "important", "somewhat important", "not important"),
>   Q1_6 = c("very important", "not important", "important",
>     "very important", "somewhat important", "very important",
>     "very important", "very important", "important", "important"),
>   Q1_7 = c("very important", "somewhat important", "important",
>     "somewhat important", "important", "important", "very important",
>     "very important", "somewhat important", "not important"),
>   Q2 = c("Somewhat", "Very Much", "Somewhat", "Very Much",
>     "Very Much", "Very Much", "Very Much", "Very Much", "Very Much",
>     "Very Much"),
>   Q3 = c("yes", "yes", "yes", "yes", "yes", "yes",
>     "yes", "yes", "yes", "yes"),
>   Q4 = c("None", "None", "None", "None",
>     "Confirmed Field of Study", "Confirmed Field of Study",
>     "Confirmed Field of Study", "None", "None", "None")),
>   .Names = c("Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7",
>     "Q2", "Q3", "Q4"),
>   row.names = c(78L, 46L, 80L, 196L, 188L, 197L, 39L, 195L, 172L, 110L),
>   class = "data.frame")
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm