[R] Unexpected behaviour from read.table
Michael
michael77allen at gmail.com
Sun Feb 4 23:45:07 CET 2018
I’ve been struggling with seemingly ‘corrupt’ data.frames for a few days, and believe I’ve narrowed the problem down to some odd behaviour from read.table.
I receive a tab-delimited file from an external provider where strings are encoded as ="content". I'm not sure why; perhaps because most users open it in Excel.
My specific issue is that trailing spaces in any of the strings cause strange results from read.table:
# No trailing spaces
read.table(text="ID\tValue\n=\"Total\"\t1000\n=\"CJ01\"\t550\n=\"CF02\"\t450",header=FALSE,sep='\t')
      V1    V2
1     ID Value
2 =Total  1000
3  =CJ01   550
4  =CF02   450
# Now with trailing spaces in line 3
read.table(text="ID\tValue\n=\"Total\"\t1000\n=\"CJ01 \"\t550\n=\"CF02\"\t450",header=FALSE,sep='\t')
      V1    V2
1  =CF02   450
2     ID Value
3 =Total  1000
4 =CJ01    550
5  =CF02   450
I solved my specific problem by setting quote='' and extracting the string content after calling read.table. Because my original code had header=TRUE, random rows were being used as column names!
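For reference, a minimal sketch of that workaround on the same sample data. The regular expression assumes every string field is wrapped exactly as ="content"; the trimws() call and stringsAsFactors=FALSE are additions for illustration, not part of the original code.

```r
# Sample input from above, including the trailing space in "CJ01 "
txt <- "ID\tValue\n=\"Total\"\t1000\n=\"CJ01 \"\t550\n=\"CF02\"\t450"

# quote = "" turns off quote processing, so the double quotes are
# read as ordinary characters and each line stays a single record
df <- read.table(text = txt, header = TRUE, sep = "\t", quote = "",
                 stringsAsFactors = FALSE)

# Strip the leading =" and the trailing " from the string column
# (assumes the provider always wraps strings this way)
df$ID <- sub('^="(.*)"$', "\\1", df$ID)

# Optionally drop the stray trailing whitespace as well
df$ID <- trimws(df$ID)
```

With quote processing disabled, header=TRUE should pick up the first line as column names, and the extraction step is then an ordinary string operation on the column.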
Flagging a potential issue with read.table, although I can easily accept I'm missing something obvious here.
Best,
Michael
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit) / x86_64-pc-linux-gnu (64-bit)
Running under: macOS High Sierra 10.13.2 / Ubuntu 16.04.3 LTS