[R] Reading in csv data with ff package
Jan van der Laan
rhelp at eoos.dds.nl
Tue Nov 19 10:12:24 CET 2013
The following seems to work:
data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
next.rows = 1005,sep=",",colClasses = c("integer","factor","logical"))
'character' doesn't work because ff does not support character
vectors. Character vectors need to be stored as factors. The
disadvantage of that is that the levels are stored in memory, so if
the number of levels is very large (e.g. when nearly every string is
unique) you might still run into memory problems.
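A small base-R sketch of why the number of levels matters (it does not use ff, and the vectors are made up for illustration): a factor keeps one integer code per element plus one copy of each distinct string, so the levels table stays tiny for a column of letters but grows with the data when every string is unique.

```r
# Hypothetical data, for illustration only.
set.seed(1)
n <- 2000

# Few distinct values: at most 26 levels, however long the vector is.
few <- factor(sample(LETTERS, n, replace = TRUE))

# All values unique (e.g. an ID column): one level per element.
many <- factor(paste0("id", seq_len(n)))

nlevels(few)    # at most 26
nlevels(many)   # equals n: as many levels as rows

# The per-element codes are plain integers; the levels are the part
# that has to be kept in memory.
typeof(unclass(few))
```

With a column like `many`, converting to a factor saves nothing, which is the case where memory can still run out.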
'integer' doesn't work for the second column because read.csv.ffdf
passes the colClasses on to read.table, which then tries to convert
that column to integer, which it can't do because it contains letters.
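Since the colClasses end up in read.table, the same behaviour can be reproduced with plain read.csv, without ff. The sketch below uses a three-row stand-in for data.csv (the values and the temporary file are assumptions for illustration, not the original data):

```r
# Small stand-in for the data.csv from the original post.
tmp <- tempfile(fileext = ".csv")
df <- data.frame(Integer = c(8601L, 12L, 99L),
                 Character = c("J", "A", "Q"),
                 Logical = c(TRUE, FALSE, TRUE))
write.csv(df, tmp, row.names = FALSE)

# "integer" for the letter column: scan() cannot convert "J",
# so the read fails; the error message is captured instead of stopping.
bad <- tryCatch(
  read.csv(tmp, colClasses = c("integer", "integer", "logical")),
  error = function(e) conditionMessage(e)
)
bad

# "factor" for the letter column reads cleanly.
ok <- read.csv(tmp, colClasses = c("integer", "factor", "logical"))
str(ok)

unlink(tmp)
```

If a colClasses vector works here on a sample of the file, it is a reasonable candidate to pass to read.csv.ffdf, as long as it contains no "character" entries.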
Jan
Nick McClure <nfmcclure at gmail.com> wrote:
> I've spent some time trying to wrap my head around reading in large csv
> files with the ff-package. I think I know how to do it, but am bumping
> into some problems. I've tried to recreate the issues as best as I can
> with a smaller example and maybe someone can help explain the problems.
>
> The following code just creates a csv file with an integer column,
> character column and logical column.
> -------------------------------------------------
> library(ff)
> #Create data
> size = 2000
> fake.data = data.frame("Integer"=round(100000*runif(size)),
>   "Character"=sample(LETTERS,size,replace=TRUE),
>   "Logical"=sample(c(TRUE,FALSE),size,replace=TRUE))
>
> #Write to csv
> write.csv(fake.data,"data.csv",row.names=F)
> -------------------------------------------------
>
> Now to read it in as a 'ffdf' class, I can do the following:
>
> -------------------------------------------------
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
> next.rows = 1005,sep=",")
> -------------------------------------------------
>
> That works. But with my current large data set, read.csv.ffdf is debating
> with me about the classes it's importing. I was also messing around with
> the first.rows/next.rows, but that's a question for another time. So I'll
> try to load the data in, specifying the column types (same exact command,
> except with specifying colClasses):
>
> -------------------------------------------------
>
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
> next.rows = 1005,sep=",",colClasses = c("integer","integer","logical"))
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   scan() expected 'an integer', got '"J"'
>
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
> next.rows = 1005,sep=",",colClasses = c("integer","character","logical"))
> Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, :
>   vmode 'character' not implemented
>
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
> next.rows = 1005,sep=",",colClasses = rep("character",3))
> Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, :
>   vmode 'character' not implemented
>
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
> next.rows = 1005,sep=",",colClasses = rep("raw",3))
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   scan() expected 'a raw', got '8601'
>
> -------------------------------------------------
> I just can't find a combination of classes that will let this file be
> read in. I really don't understand why the class 'character' won't work
> for all of them. Any thoughts as to why? I appreciate the help and time.