[R] Reading in csv data with ff package

Jan van der Laan rhelp at eoos.dds.nl
Tue Nov 19 10:12:24 CET 2013


The following seems to work:

data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
   next.rows = 1005,sep=",",colClasses = c("integer","factor","logical"))


'character' doesn't work because ff does not support character
vectors. Character vectors need to be stored as factors. The
disadvantage is that the factor levels are stored in memory, so if
the number of levels is very large (e.g. when every string is unique)
you might still run into memory problems.
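
To get a feel for how the level table grows, here is a rough sketch in base R only (ff itself is not loaded). The variable names and the "id_" strings are made up for illustration, and it assumes, as described above, that an ff factor keeps the same kind of level table in RAM:

```r
# Illustration only: compare the level table of a low-cardinality column
# with that of a column where every string is unique.
n <- 10000

# At most 26 distinct values, like the "Character" column in the example file.
few_levels <- factor(sample(LETTERS, n, replace = TRUE))

# Every value is different (hypothetical ID strings), so one level per row.
many_levels <- factor(sprintf("id_%05d", seq_len(n)))

# Number of levels in each factor.
print(nlevels(few_levels))
print(nlevels(many_levels))

# Memory taken by the level tables alone.
print(object.size(levels(few_levels)))
print(object.size(levels(many_levels)))
```

With a column of unique strings the level table is as long as the column itself, so storing it as a factor saves little or nothing.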

'integer' doesn't work because read.csv.ffdf passes the colClasses on
to read.table, which then tries to convert your second column to
integer. That fails because the column contains letters.
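
The same failure can be reproduced without ff, which suggests that it comes from the read.table step. A minimal sketch using base R's read.csv on a tiny inline CSV (the three data rows are invented for illustration):

```r
# A small CSV in the same shape as the example file: integer, letter, logical.
csv_text <- 'Integer,Character,Logical
8601,"J",TRUE
42,"Q",FALSE
7,"J",TRUE'

# Declaring the letter column as integer makes the read fail;
# tryCatch captures the error object instead of stopping the script.
bad <- tryCatch(
  read.csv(text = csv_text, colClasses = c("integer", "integer", "logical")),
  error = function(e) e
)
print(conditionMessage(bad))

# Declaring the letter column as a factor parses cleanly.
good <- read.csv(text = csv_text, colClasses = c("integer", "factor", "logical"))
str(good)
```

Since read.csv.ffdf hands colClasses on to read.table, the classes that work here are the ones to try there, with "factor" standing in for "character".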

Jan



Nick McClure <nfmcclure at gmail.com> wrote:

> I've spent some time trying to wrap my head around reading in large csv
> files with the ff-package.  I think I know how to do it, but am bumping
> into some problems.  I've tried to recreate the issues as best as I can
> with a smaller example and maybe someone can help explain the problems.
>
> The following code just creates a csv file with an integer column,
> character column and logical column.
> -------------------------------------------------
> library(ff)
> #Create data
> size = 2000
> fake.data = data.frame(
>   "Integer"=round(100000*runif(size)),
>   "Character"=sample(LETTERS,size,replace=T),
>   "Logical"=sample(c(T,F),size,replace=T))
>
> #Write to csv
> write.csv(fake.data,"data.csv",row.names=F)
> -------------------------------------------------
>
> Now to read it in as a 'ffdf' class, I can do the following:
>
> -------------------------------------------------
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
> next.rows = 1005,sep=",")
> -------------------------------------------------
>
> That works.  But with my current large data set, read.csv.ffdf is debating
> with me about the classes it's importing. I was also messing around with
> the first.rows/next.rows, but that's a question for another time. So I'll
> try to load the data in, specifying the column types (same exact command,
> except with specifying colClasses):
>
> -------------------------------------------------
> > data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
>     next.rows = 1005,sep=",",colClasses = c("integer","integer","logical"))
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   scan() expected 'an integer', got '"J"'
>
> > data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
>     next.rows = 1005,sep=",",colClasses = c("integer","character","logical"))
> Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  :
>   vmode 'character' not implemented
>
> > data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
>     next.rows = 1005,sep=",",colClasses = rep("character",3))
> Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  :
>   vmode 'character' not implemented
>
> > data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
>     next.rows = 1005,sep=",",colClasses = rep("raw",3))
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   scan() expected 'a raw', got '8601'
> -------------------------------------------------
> I just can't find a combination of classes that will result in this reading
> in.  I really don't understand why the classes 'character' won't work for
> all of them.  Any thoughts as to why?  I appreciate the help and time.


