[R] many chr2factors ?

christian schulz ozric at web.de
Wed Jun 1 20:59:26 CEST 2005


...many thanks for clarifying some things for me!
christian

>Dear Christian
>
>If you create your data frame with data.frame(), all character
>columns are automatically converted to factors unless you force
>them to stay character. Maybe that solves your problem easily.
>
>dat <- data.frame(a=1:10, b=letters[1:10])
>str(dat)
>  `data.frame':	10 obs. of  2 variables:
>  $ a: int  1 2 3 4 5 6 7 8 9 10
>  $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
> 
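
A minimal sketch of how a column can be forced to stay character, assuming the stringsAsFactors argument of data.frame() and the I() wrapper. The names dat_fac, dat_chr and dat_asis are made up for illustration. Note that since R 4.0.0 the default is stringsAsFactors = FALSE, so in current versions the automatic conversion has to be requested explicitly.

```r
# Request the conversion explicitly (this was the default before R 4.0.0).
dat_fac <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = TRUE)

# Keep strings as character for the whole data frame.
dat_chr <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = FALSE)

# Protect a single column with I(); it keeps its character type
# (and gains the extra class "AsIs").
dat_asis <- data.frame(a = 1:10, b = I(letters[1:10]), stringsAsFactors = TRUE)

str(dat_fac)
str(dat_chr)
str(dat_asis)
```
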
>If that doesn't solve your problem because of the way your
>data frame is created, you can do the conversion afterwards.
>
>There are two problems with your code. 
>
>First (and this causes the error): in your repeat loop you use
>
>if(!is.character(df[,i]))
>  next
>
>Imagine that the last column of your data frame is not a
>character column: you jump to the next cycle and are then outside
>the range of your data frame. Your break condition is ignored.
>
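
The skipped break test can be reproduced with a small sketch. It assumes a made-up two-column data frame whose last column is not character; the variables visited and result are only there to record what happens and are not part of the original function.

```r
# Hypothetical data: the LAST column is not character.
df <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE)

visited <- c()  # records every index the loop tries
result <- tryCatch({
  i <- 0
  repeat {
    i <- i + 1
    visited <- c(visited, i)
    if (!is.character(df[, i]))
      next                      # jumps past the break test below
    if (i == length(df))
      break
  }
  "finished"
}, error = function(e) conditionMessage(e))

print(visited)  # the loop runs one index past the last column
print(result)   # the error message instead of "finished"
```

Moving the break test in front of the next statement would also cure the error, but a for loop over the columns is simpler.
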
>Second: You change your data frame inside a function.
>Variables that are created or changed within a function are
>local. Their life ends with the end of the function. Therefore
>any changes you make will have no effect on the global data
>frame you want to change. See the example:
>
>dat1 <- structure(list(a = 1:10, b = letters[1:10]), .Names = c("a", "b"),
>                  row.names = as.character(1:10), class = "data.frame")
>str(data.frame(dat1))
>  `data.frame':	10 obs. of  2 variables:
>  $ a: int  1 2 3 4 5 6 7 8 9 10
>  $ b: chr  "a" "b" "c" "d" ...
>tofac(dat1)
>  [1] 2
>str(data.frame(dat1))
>  `data.frame':	10 obs. of  2 variables:
>  $ a: int  1 2 3 4 5 6 7 8 9 10
>  $ b: chr  "a" "b" "c" "d" ...
>
>You can use the following code instead
>
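>tofac <- function(x){
>  # seq_along() is safe even when x has no columns
>  for(i in seq_along(x)) {
>    # convert only character columns; leave the others untouched
>    if(is.character(x[[i]]))
>      x[[i]] <- factor(x[[i]])
>  }
>  x  # return the modified copy so the caller can assign it
>}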
>
>dat1 <- tofac(dat1)
>str(dat1)
>  `data.frame':	10 obs. of  2 variables:
>  $ a: int  1 2 3 4 5 6 7 8 9 10
>  $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
>
>The for loop avoids the problem with the index. Therefore it
>also works in examples that have a non-character variable in the
>last column, and by returning x at the end you make sure that
>your modified object is not lost.
>
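
As an alternative to the explicit loop, the same conversion can be sketched with lapply(). This is not the code from the reply above; the function name tofac_lapply and the object dat2 are made up for illustration. lapply() is used rather than apply() because apply() first coerces the data frame to a matrix, which loses the per-column types.

```r
# Hypothetical helper name; an alternative sketch, not from the thread.
tofac_lapply <- function(x) {
  # Assigning into x[] replaces the columns but keeps x a data frame.
  x[] <- lapply(x, function(col) {
    if (is.character(col)) factor(col) else col
  })
  x
}

dat1 <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = FALSE)
dat2 <- tofac_lapply(dat1)  # assign the result; dat1 itself is unchanged
str(dat2)
```
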
>Regards,
>
>Christoph
>
>--------------------------------------------------------------
>Christoph Buser <buser at stat.math.ethz.ch>
>Seminar fuer Statistik, LEO C13
>ETH (Federal Inst. Technology)	8092 Zurich	 SWITZERLAND
>phone: x-41-44-632-4673		fax: 632-1228
>http://stat.ethz.ch/~buser/
>--------------------------------------------------------------
>
>christian schulz writes:
> > Hi,
> > 
> > I would like to transform
> > characters in a data.frame to factors automatically.
> > 
> >  > tofac <- function(df){
> > + i=0
> > + repeat{
> > + i <- i+1
> > + if(!is.character(df[,i]))
> > + next
> > + df[,i] <- as.factor(df[,i])
> > + print(i)
> > + if(i == length(df))
> > + break }
> > + }
> >  >
> >  > tofac(abrdat)
> > [1] 7
> > [1] 8
> > [1] 9
> > [1] 11
> > [1] 13
> > [1] 15
> > Error in "[.data.frame"(df, , i) : undefined columns selected
> > 
> > These are the correct columns, and I had the idea to put into the loop
> > an empty matrix with the same dimensions as df and return it!?
> > 
> > Another check?
> > abrdat2 <- apply(abrdat,2,function(x) 
> > ifelse(is.character(x),as.factor(x),x))
> > 
> > 
> > many thanks & regards,
> > christian
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>