[R] read.table only reads part of file

Sarah Goslee sarah.goslee at gmail.com
Sat Jul 30 03:00:40 CEST 2011


Hi Peter,

I'm not going to look at your large file on what for me is Friday evening, but
the usual cause of that kind of problem is a single or double quote in the text.
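
As a minimal sketch of what I mean (the toy file and its contents are made
up for illustration, not taken from your annotation file): read.table treats
both ' and " as quote characters by default, so an apostrophe inside a text
field can open a "quoted string" that swallows the following rows. Passing
quote = "" turns that off.

```r
# Hypothetical toy file: tab-separated, with apostrophes inside a text column.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tdesc",
             "1\tnormal",
             "2\t5' end of gene",
             "3\tnormal",
             "4\t3' UTR",
             "5\tnormal"), tmp)

# Default quote = "\"'": the apostrophe in the row with id 2 opens a quoted
# string that runs until the next apostrophe, absorbing the rows in between.
bad <- suppressWarnings(
  read.table(tmp, sep = "\t", header = TRUE, fill = TRUE, comment.char = ""))

# quote = "" disables quote handling, so every data line becomes one row.
good <- read.table(tmp, sep = "\t", header = TRUE, fill = TRUE,
                   comment.char = "", quote = "")

nrow(bad)   # fewer rows than the file has data lines
nrow(good)  # 5
```

If the same thing is happening in your file, adding quote = "" to your
existing read.table call would be the first thing to try.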

One way to diagnose the problem is to look at the rows in the text file itself
right around 25952 - there's always something there causing the problem.
I'd also look in R at the last row that was imported. Often you can see the
problem there as well.
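
One possible way to do that inspection from within R, again on a made-up toy
file rather than your data: readLines shows the raw text, grep finds lines
containing quote characters, and count.fields (with the same default quote
set as read.table) should report NA for lines that sit inside an unterminated
quoted string.

```r
# Hypothetical toy file with apostrophes on file lines 3 and 5.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tdesc",
             "1\tnormal",
             "2\t5' end of gene",
             "3\tnormal",
             "4\t3' UTR",
             "5\tnormal"), tmp)

# Raw lines, untouched by any quote handling.
lines <- readLines(tmp)

# File line numbers that contain a single or double quote.
quote_lines <- grep("['\"]", lines)
quote_lines          # 3 5
lines[quote_lines]   # the offending text itself

# Field counts per line using the default quote characters;
# NA marks lines that fall inside an open quoted string.
n_default <- count.fields(tmp, sep = "\t", comment.char = "")
which(is.na(n_default))
```

On the real file the equivalent would be something like reading the lines
via bzfile() and printing the ones around 25952 (plus one for the header),
and tail(annot, 1) to see the last imported row.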

Sarah

On Fri, Jul 29, 2011 at 8:54 PM, Peter Langfelder
<peter.langfelder at gmail.com> wrote:
> Hi all,
>
> I encountered a problem when trying to read in an Illumina chip
> annotation file. The offending file is large, so I zipped it up and
> posted it at
>
> http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/tmp/ProbeInfo_Expression.txt.bz2
>
> Executing this:
>
> annot = read.table(bzfile("ProbeInfo_Expression.txt.bz2"),
>                comment.char="",  sep = "\t", fill = TRUE, header = TRUE);
>
> leads to
>
>> dim(annot)
> [1] 25952    28
>
> i.e. 25952 rows were read, but the file is some 48000 rows long.
>
> The file contains long text entries (up to several thousand
> characters) which appear to be the problem since stripping out those
> columns (outside of R) and re-reading gives the full 48k+ rows.
>
> My question is: why is read.table stopping the read (without any
> warning or error)? Am I missing something in the documentation (I read
> it but didn't find anything)? Are there any arguments I'm not setting
> right? I tried to google the problem but came up empty-handed.
>
> Session info:
>
>> sessionInfo()
> R version 2.11.1 Patched (2010-06-06 r52218)
> i686-pc-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>  [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
>  [7] LC_PAPER=en_US.utf8       LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
> Thanks,
>
> Peter



-- 
Sarah Goslee
http://www.functionaldiversity.org


