[R] Odp: Problem using read.xls - Everything converted to factors

Petr PIKAL petr.pikal at precheza.cz
Fri Jun 3 17:34:49 CEST 2011


Hi

> 
> [R] Problem using read.xls - Everything converted to factors
> 
> Hello,
> 
> I would like to use the read.xls function from the gdata package to read 
> data from Microsoft Excel files, but I ran into a problem. For example, 
> I used the following code:
> 
> testfile<-read.xls("/home/.../wsjecon0603.xls", #file path
>             header=F,
>             dec=",",
>             na.strings="n.a.",
>             skip=5,
>             sheet=2,
>             col.names=c("Name", "Firm","GDP1","GDP2","GDP3","GDP4","CPI5",
>               "CPI11","UNEMP5","UNEMP11","PROF03","PROF04","STARTS03","STARTS04"),
>             nrows=54,
>             #colClasses=c(character,character,numeric,numeric,numeric,numeric,
>             #  numeric,numeric,numeric,numeric,numeric,numeric,numeric,numeric)
> 
> )
> print(testfile)
> 
> Although the xls file contains numeric values in all the columns except 
> the ones which I named "Name" and "Firm", everything in the data frame 
> has "factor" as class. I tried to use the colClasses option as above, and 
> also with quotes around each word, but this does not work and I will 

Hm. That should work. You have got some advice from Gabor, but when 
numeric columns come in as non-numeric I often find that the cause is the 
formatting of the original values. Numbers like 10 253,52 are treated as 
non-numeric because there is an extra space character between the 
thousands and the hundreds. Maybe the missing values are also not always 
marked as n.a. but are sometimes simply left empty, and I suppose this can 
lead to conversion of the whole column to a character vector.
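
As a rough sketch of that kind of clean-up (base R only; the sample values 
and the helper name clean_number are made up for illustration and do not 
come from the original file):

```r
# Hypothetical raw values as they might arrive from the spreadsheet
raw <- c("10 253,52", "3,5", "n.a.", "", "1 000")

# Illustrative helper: drop spaces used as thousands separators, turn the
# decimal comma into a point, and treat "n.a." and "" as missing values
clean_number <- function(x, na.strings = c("n.a.", "")) {
  x <- trimws(as.character(x))
  x[x %in% na.strings] <- NA
  x <- gsub("[[:space:]]", "", x)
  x <- sub(",", ".", x, fixed = TRUE)
  as.numeric(x)
}

values <- clean_number(raw)
print(values)
```

If the spreadsheet uses some other separator (for example a non-breaking 
space), the pattern passed to gsub would need to be adjusted.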

> always receive the following error:
> 
> Error in is(object, Class) :
>    trying to get slot "className" of an object of a basic class 
> ("list") with no slots
> Calls: read.xls -> read.csv -> read.table -> <Anonymous> -> is
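
One plausible reading of this error, offered as an assumption rather than 
a diagnosis: without quotes, character and numeric refer to R functions, 
so c() builds a list of functions instead of a vector of class names. A 
small sketch with base R's read.csv (which appears in the call chain 
above) on made-up inline data:

```r
# Unquoted names refer to the functions character() and numeric(),
# so c() returns a list of functions, not a vector of class names
unquoted <- c(character, numeric)
print(class(unquoted))

# Quoted names give the character vector that colClasses expects
quoted <- c("character", "numeric")
print(class(quoted))

# Made-up inline data standing in for the exported sheet
csv.text <- "Name;GDP1\nSmith;1,5\nJones;2,25"
df <- read.csv(text = csv.text, sep = ";", dec = ",", colClasses = quoted)
str(df)
```

Whether the quoted form also fixes the read.xls call depends on the actual 
cell contents, which cannot be checked from here.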
> 
> After some hours of research I figured out how I can manually change 
> the classes of the columns:
> 
> testfile$GDP2<-as.numeric(levels(testfile$GDP2))[testfile$GDP2]
> testfile$Name<-as.character(levels(testfile$Name))[testfile$Name] # and so on

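You can save some time by using sapply:

num.cols <- 3:14  # positions of the columns that should be numeric
testfile[, num.cols] <- sapply(testfile[, num.cols],
                               function(x) as.numeric(as.character(x)))

This should convert all of those columns to numeric at once (going through 
as.character so that it also works when they are factors), but you will get 
NAs for any values that cannot be converted for whatever reason.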
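
A small sketch of the same conversion on a made-up data frame (the values 
below are invented and only the column names follow the original post). It 
uses lapply rather than sapply so that the result stays a list of columns, 
and it shows why as.numeric should not be applied to a factor directly:

```r
# Made-up data frame in which every column was read as a factor
testfile <- data.frame(
  Name = factor(c("A", "B", "C")),
  GDP1 = factor(c("2.5", "10.1", "n.a.")),
  GDP2 = factor(c("3.0", "1.2", "4.4"))
)

num.cols <- c("GDP1", "GDP2")

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(testfile$GDP2)

# Going through as.character() first recovers the printed values;
# suppressWarnings() hides the "NAs introduced by coercion" warning
testfile[num.cols] <- lapply(testfile[num.cols], function(x)
  suppressWarnings(as.numeric(as.character(x))))
testfile$Name <- as.character(testfile$Name)

str(testfile)
```

Entries such as "n.a." end up as NA, which matches the caveat above.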

Regards
Petr


> 
> This works, but is a lot of work since I have to import many different 
> data sets. So I was wondering if there is another way to let the classes 
> be recognized correctly.
> 
> Additionally I would like to know if there is any way to import data 
> from different sheets with the same layout at once into one data frame.
> 
> I use Ubuntu 11.04 with Rkward if this is of any importance.
> 
> Thanks in advance for your answers,
> Sebastian
> 


