[R] understanding how R determines numbers and characters when creating a data frame

Greg Snow Greg.Snow at imail.org
Wed Feb 18 22:27:15 CET 2009


The culprit is the cbind function.  When all of its arguments are plain vectors (none of them already a data frame), cbind creates a matrix, not a data frame.  A matrix can hold only one type, so the numbers get converted to character.  In your first example you never actually create a data frame; you just build a matrix (try str(results)), so fix cannot change a single column to numeric.  In the second example you do create a data frame, so fix will allow changing individual columns, but the cbind inside the call to data.frame still creates a matrix (and converts numeric to character) before it is turned into a data frame.  Remove the cbind and just do:

out1 <- data.frame(species=as.character(paste(s)),obsnum=obsnum)

and then out1 will be a data frame without ever converting the number obsnum to a character.
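
As a minimal sketch of the difference: the literal values "setosa" and 15L below are made up for illustration, and the loop is a reworked version of your example rather than something tested against your real data.

```r
# cbind() on plain vectors builds a matrix, and a matrix has a single type,
# so the number is coerced to character.
m <- cbind(species = "setosa", obsnum = 15L)   # illustrative values only
is.matrix(m)   # TRUE
typeof(m)      # "character"

# data.frame() stores each argument as its own column and keeps its type.
d <- data.frame(species = "setosa", obsnum = 15L)
is.numeric(d$obsnum)   # TRUE

# The loop from the question, with data.frame() in place of cbind().
results <- NULL
for (s in unique(as.character(iris$Species))) {
  temp1  <- iris[iris$Species == s, ]
  obsnum <- length(unique(temp1$Sepal.Length))      # a number
  out1   <- data.frame(species = s, obsnum = obsnum) # obsnum stays numeric
  results <- rbind(results, out1)
}
str(results)
```

Growing a data frame with rbind inside a loop is fine for a handful of rows like this; for many iterations it is usually better to collect the pieces in a list and combine them once with do.call(rbind, ...).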

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Alan Smith
> Sent: Wednesday, February 18, 2009 2:01 PM
> To: r-help at r-project.org
> Subject: [R] understanding how R determines numbers and characters when
> creating a data frame
> 
> Hello R Users and Developers,
> 
> I have a basic question about how R works.  Over the past few years I have
> struggled when I try to generate a new data frame that I believe should
> contain numeric data in some columns and character data in others, only to
> find everything converted to character data.  Is there a general method to
> create data frames that contain the data in the desired format: numbers as
> numeric, characters as factors, etc.?  I often have this problem, and in the
> worst case I have to export the file and read it back in.  I have
> emulated a simple example of the problem.  It often happens while using
> "for" loops.  Could someone explain how to avoid this problem by properly
> creating data frames in for loops that can contain both numeric and
> character data?
> 
> 
> 
> ********Question for example 1.
> 
> Why does the cbind command convert the numeric data to character data?  Why
> can't the character data be converted to numeric data using the fix
> command?
> 
> 
> ### Example 1  #############
> 
> data(iris)
> 
> obsnum<-NULL
> 
> results<-NULL
> 
> for(s in unique(as.character(iris$Species))){
> 
> temp1<-iris[iris$Species==s,]
> 
> obsnum<-length(unique(temp1$Sepal.Length))  # a number
> 
> out1<-cbind(species=as.character(paste(s)),obsnum)  # number converted to character
> 
> results<-rbind(out1,results)
> 
> }
> 
> results
> 
> #fix(results)  # cannot convert obsnum to numeric using fix
> 
> ####################################
> 
> 
> 
> ******Question for example 2
> 
> Why does adding the data.frame command allow the character data to be
> converted to numeric data using the fix command?
> 
> ### Example 2  #############
> 
> data(iris)
> 
> obsnum<-NULL
> 
> results<-NULL
> 
> for(s in unique(as.character(iris$Species))){
> 
> temp1<-iris[iris$Species==s,]
> 
> obsnum<-length(unique(temp1$Sepal.Length))
> 
> out1<-data.frame(cbind(species=as.character(paste(s)),obsnum)) # number converted to character
> 
> results<-rbind(out1,results)
> 
> }
> 
> results
> 
> #fix(results)  # can now convert obsnum to numeric using fix
> 
> 
> 
> ######
> 
> 
> 
> 
> 
> Thank you,
> 
> Alan Smith
> 



