[R] big data file getting truncated
Philipp Pagel
p.pagel at gsf.de
Wed Aug 13 10:33:10 CEST 2003
Hi!
> I used the following commands
> mydata <- read.table("dataALLAMLtrain.txt", header=TRUE, sep="\t", row.names=NULL)
> It reads the data without any error.
> Now if I use
> edit(mydata)
> It shows only 3916 entries, whereas the actual file contains 7129 entries.
[...]
> So it seems R is truncating the data. How can I load the complete file?
Others have already recommended checking the size of the data.frame with
dim() and of the file with wc. If it turns out that there really is a
difference in size, the next step is to get an idea of which lines are
affected: are "random" lines missing, or is everything fine up to line
3916 and then it stops? In either case, have a close look at the missing
lines, or at the last line that made it in plus the first one that did
not: is there anything special about them?
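
To make that concrete, something along these lines should do (a quick,
untested sketch, reusing the file name from your post):

    dim(mydata)                              # rows/columns that actually arrived
    length(readLines("dataALLAMLtrain.txt")) # physical lines in the file (incl. header)
    # have a look around the point where the data.frame ends
    # (data.frame row 3916 corresponds to file line 3917 because of the header)
    writeLines(readLines("dataALLAMLtrain.txt")[3915:3920])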
But actually I have a feeling that this may be your problem: by default,
read.table uses both '"' and "'" as quote characters, and gene
descriptions love to contain things like "5'" and "3'". An unmatched
single quote makes read.table swallow everything up to the next one, so
several physical lines collapse into one row and the row count drops.
=> Try quote='' in the read.table call.
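
In full, something like this (again untested, same assumptions as above):

    mydata <- read.table("dataALLAMLtrain.txt", header=TRUE,
                         sep="\t", quote="", row.names=NULL)
    dim(mydata)   # should now report the 7129 rows you expect

If you want to confirm that quoting really is the culprit, count.fields()
reports how many fields each line splits into under a given quote setting;
with quote="" every line of a well-formed tab-delimited file should give
the same count:

    table(count.fields("dataALLAMLtrain.txt", sep="\t", quote=""))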
cu
Philipp
--
Dr. Philipp Pagel Tel. +49-89-3187-3675
Institute for Bioinformatics / MIPS Fax. +49-89-3187-3585
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1
85764 Neuherberg, Germany