[R] Importing Large Dataset into Excel
David Scott
d.scott at auckland.ac.nz
Wed Dec 12 12:00:53 CET 2007
On Wed, 12 Dec 2007, Peter Dalgaard wrote:
> Philippe Grosjean wrote:
> The problem is often a misspecification of the comment.char argument.
> For read.table(), it defaults to '#'. This means that everywhere you
> have a '#' char in your Excel sheet, the rest of the line is ignored.
> This results in a different number of items per line.
>
> You would do better to use read.csv(), which provides better default
> arguments for your particular problem.
> Best,
>
>
Or read.delim/read.delim2, which should be even better for TAB-separated
files.
In general, be very suspicious of read.table() with such files, not only
because of the '#' but also because it expects columns separated by
_arbitrary_ amounts of whitespace. I.e., n TABs count as one, so empty
fields are skipped over.
******* End of other contributions (not sure why my mailer didn't mark
them as quoted)
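To make Peter's point concrete, here is a small sketch (the file name
and contents are invented for illustration):

## Suppose example.txt is TAB-separated, contains a '#' inside a field,
## and has an empty trailing field (columns shown here with spaces):
##
##   id   label      value
##   1    sample #3
##   2    sample 4   7.1
##
## read.table()'s defaults bite twice: comment.char = "#" discards the
## rest of the first data line after '#', and sep = "" treats any run
## of whitespace as a single separator, so the empty 'value' field
## silently disappears instead of becoming NA.
read.table("example.txt", header = TRUE)

## read.delim() uses sep = "\t", comment.char = "" and fill = TRUE, so
## every TAB delimits exactly one (possibly empty) field.
read.delim("example.txt", header = TRUE)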
I would also say be very suspicious of Excel writing .csv files.
I found by looking at the .csv file in an editor that, when there were
empty fields in the original .xls file, Excel sometimes didn't add in
enough commas to make up the correct number of fields. It did for some
records but not for others. Excel truly works in mysterious ways.
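A quick way to check for this from R, rather than eyeballing the file
in an editor (the file name here is made up):

## count.fields() reports the number of fields on each line of the
## exported .csv; records where Excel omitted trailing commas come up
## short.
n <- count.fields("export.csv", sep = ",", comment.char = "")
table(n)            # distribution of field counts per line
which(n != max(n))  # line numbers of the short records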
read.csv has an argument, fill, which should fix this problem. In my
case I was actually reading the .csv file into MySQL, and the solution
was to select the whole of the .xls file and format it as text before
writing the .csv file.
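For completeness, the fill route would look something like this (again
with an invented file name):

## fill = TRUE (the read.csv default) pads short lines with NA instead
## of stopping with 'line X did not have Y elements'.
dat <- read.csv("export.csv", fill = TRUE)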
David Scott
_________________________________________________________________
David Scott Department of Statistics, Tamaki Campus
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 373 7599 ext 86830 Fax: +64 9 373 7000
Email: d.scott at auckland.ac.nz
Graduate Officer, Department of Statistics
Director of Consulting, Department of Statistics