[R] Large Data Set Help

jim holtman jholtman at gmail.com
Mon Aug 25 22:13:26 CEST 2008


Establish a "connection" with the file you want to read, and read in
1,000 rows at a time (or however many you want).  If you are using
read.csv and the file has a header, you might want to skip it on the
first read, since there will be no header when you read the next 1,000
rows; that way every chunk can be handled the same way.  Also set
as.is = TRUE so that character fields are not converted to factors.
You can then write out just the columns that you want, and put the
whole thing in a loop until you reach the end of the file.
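
Something along these lines should work (an untested sketch -- the file
names, the 1,000-row chunk size, and the column selection c(1, 3, 4)
are taken from your example or chosen just for illustration):

con   <- file("data.csv", open = "r")   # open the connection once
chunk <- 1000                           # rows to read per pass
first <- TRUE

repeat {
    ## read.csv signals an error when no lines are left, so catch it
    dat <- tryCatch(
        read.csv(con, header = FALSE, nrows = chunk, as.is = TRUE),
        error = function(e) NULL)
    if (is.null(dat) || nrow(dat) == 0) break
    ## keep only the wanted columns and append them to the output file
    write.table(dat[, c(1, 3, 4)], file = "filter_data.csv", sep = ",",
                row.names = FALSE, col.names = FALSE,
                append = !first)
    first <- FALSE
}
close(con)

Because the connection stays open between calls, each read.csv picks up
where the previous one left off, so only 'chunk' rows are ever held in
memory at a time.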

On Mon, Aug 25, 2008 at 3:34 PM, Jason Thibodeau <jbloudg20 at gmail.com> wrote:
> I am attempting to perform some simple data manipulation on a large data
> set. I have a snippet of the whole data set, and my small snippet is 2GB in
> CSV.
>
> Is there a way I can read my csv, select a few columns, and write it to an
> output file in real time? This is what I do right now to a small test file:
>
> data <- read.csv('data.csv', header = FALSE)
>
> data_filter <- data[c(1,3,4)]
>
> write.table(data_filter, file = "filter_data.csv", sep = ",", row.names =
> FALSE, col.names = FALSE)
>
> This test file writes the three columns to my desired output file. Can I do
> this while bypassing the storage of the entire array in memory?
>
> Thank you very much for the help.
> --
> Jason
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


