[R] read.table only reads part of file
Sarah Goslee
sarah.goslee at gmail.com
Sat Jul 30 03:00:40 CEST 2011
Hi Peter,
I'm not going to look at your large file on what for me is Friday evening, but
the usual cause of that kind of problem is a single or double quote in the text.
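
A minimal sketch of how that can happen, using a small made-up file rather than your annotation file (the data and variable names below are hypothetical). By default read.table treats both " and ' as quoting characters, so a pair of stray apostrophes can swallow the lines between them without any warning; passing quote = "" switches quote handling off:

```r
# Hypothetical tab-separated file; two fields each contain a lone apostrophe.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tdesc",
             "1\tplain text",
             "2\tbinds the 5' end",
             "3\tmore text",
             "4\tbinds the 3' end",
             "5\tlast row"), tmp)

# Default quoting (quote = "\"'"): the text between the two apostrophes,
# newlines included, is read as part of one quoted field.
bad <- read.table(tmp, sep = "\t", header = TRUE, fill = TRUE,
                  comment.char = "")

# Quote handling disabled: every physical line becomes one row.
good <- read.table(tmp, sep = "\t", header = TRUE, fill = TRUE,
                   comment.char = "", quote = "")

nrow(bad)   # fewer rows than the file has data lines
nrow(good)  # one row per data line
```

If the real file has the same problem, adding quote = "" to the original call should be enough, assuming no field relies on quotes to protect an embedded tab.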
One way to diagnose the problem is to look at the rows in the text file itself
right around 25952 - there's always something there causing the problem.
I'd also look in R at the last row that was imported. Often you can see the
problem there as well.
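
Both checks can be sketched in a few lines. This is only an illustration on a small made-up file (the data and the names path, raw_lines and quote_lines are hypothetical); for the real file the same calls would take the path or connection used in the original read.table call:

```r
# Small stand-in for the real file (hypothetical data).
path <- tempfile(fileext = ".txt")
writeLines(c("id\tdesc",
             "1\tplain text",
             "2\tbinds the 5' end",
             "3\tmore text",
             "4\tbinds the 3' end",
             "5\tlast row"), path)

dat <- read.table(path, sep = "\t", header = TRUE, fill = TRUE,
                  comment.char = "")
raw_lines <- readLines(path)   # physical lines, no quote processing

# Compare rows imported with data lines in the file (header excluded).
n_expected <- length(raw_lines) - 1L
n_read <- nrow(dat)

# Physical line numbers that contain a single or double quote.
quote_lines <- grep("[\"']", raw_lines)

# Raw text around the first suspicious line, then the last imported row.
first_bad <- quote_lines[1]
print(raw_lines[max(1, first_bad - 1):min(length(raw_lines), first_bad + 1)])
print(tail(dat, 1))
```

A mismatch between n_read and n_expected, together with quote characters on the lines listed in quote_lines, points to quoting as the culprit.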
Sarah
On Fri, Jul 29, 2011 at 8:54 PM, Peter Langfelder
<peter.langfelder at gmail.com> wrote:
> Hi all,
>
> I encountered a problem when trying to read in an Illumina chip
> annotation file. The offending file is large, so I zipped it up and
> posted it at
>
> http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/tmp/ProbeInfo_Expression.txt.bz2
>
> Executing this:
>
> annot = read.table(bzfile("ProbeInfo_Expression.txt.bz2"),
> comment.char="", sep = "\t", fill = TRUE, header = TRUE);
>
> leads to
>
>> dim(annot)
> [1] 25952 28
>
> i.e. 25952 rows were read, but the file is some 48000 rows long.
>
> The file contains long text entries (up to several thousand
> characters) which appear to be the problem since stripping out those
> columns (outside of R) and re-reading gives the full 48k+ rows.
>
> My question is: why is read.table stopping the read (without any
> warning or error)? Am I missing something in the documentation (I read
> it but didn't find anything)? Are there any arguments I'm not setting
> right? I tried to google the problem but came up empty-handed.
>
> Session info:
>
>> sessionInfo()
> R version 2.11.1 Patched (2010-06-06 r52218)
> i686-pc-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
> [7] LC_PAPER=en_US.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
>
> Thanks,
>
> Peter
>
--
Sarah Goslee
http://www.functionaldiversity.org