[R] Importing Large Dataset into Excel
David Scott
d.scott at auckland.ac.nz
Wed Dec 12 12:00:53 CET 2007
On Wed, 12 Dec 2007, Peter Dalgaard wrote:
> Philippe Grosjean wrote:
> The problem is often a misspecification of the comment.char argument.
> For read.table(), it defaults to '#'. This means that everywhere you
> have a '#' char in your Excel sheet, the rest of the line is ignored.
> This results in a different number of items per line.
>
> You would do better to use read.csv(), which provides better default
> arguments for your particular problem.
> Best,
>
>
Or read.delim/read.delim2, which should be even better for TAB-separated
files.
In general, be very suspicious of read.table() with such files, not only
because of the '#' but also because it expects columns separated by
_arbitrary_ amounts of whitespace. I.e., n TABs count as one, so empty
fields are skipped over.
******* End of other contributions (not sure why my mailer didn't mark
them as quoted)
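To make Peter's point concrete, here is a small sketch (the file name
and contents are invented for illustration):

## Suppose example.txt is TAB-separated, contains a '#' inside a field,
## and has an empty trailing field (columns shown here with spaces):
##
##   id   label      value
##   1    sample #3
##   2    sample 4   7.1
##
## read.table()'s defaults bite twice: comment.char = "#" discards the
## rest of the first data line after '#', and sep = "" treats any run
## of whitespace as a single separator, so the empty 'value' field
## silently disappears instead of becoming NA.
read.table("example.txt", header = TRUE)

## read.delim() uses sep = "\t", comment.char = "" and fill = TRUE, so
## every TAB delimits exactly one (possibly empty) field.
read.delim("example.txt", header = TRUE)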
I would also say be very suspicious of Excel writing .csv files.
I found by looking at the .csv file in an editor that, when there were
empty fields in the original .xls file, Excel sometimes didn't add in
enough commas to make up the correct number of fields. It did for some
records but not for others. Excel truly works in mysterious ways.
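A quick way to check for this from R, rather than eyeballing the file
in an editor (the file name here is made up):

## count.fields() reports the number of fields on each line of the
## exported .csv; records where Excel omitted trailing commas come up
## short.
n <- count.fields("export.csv", sep = ",", comment.char = "")
table(n)            # distribution of field counts per line
which(n != max(n))  # line numbers of the short records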
read.csv has an argument, fill, which should fix this problem. In my
case I was actually reading the .csv file into MySQL, and the solution
was to select the whole of the .xls file and format it as text before
writing the .csv file.
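For completeness, the fill route would look something like this (again
with an invented file name):

## fill = TRUE (the read.csv default) pads short lines with NA instead
## of stopping with 'line X did not have Y elements'.
dat <- read.csv("export.csv", fill = TRUE)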
David Scott
_________________________________________________________________
David Scott Department of Statistics, Tamaki Campus
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 373 7599 ext 86830 Fax: +64 9 373 7000
Email: d.scott at auckland.ac.nz
Graduate Officer, Department of Statistics
Director of Consulting, Department of Statistics