[R] Unexpected behaviour from read.table

Michael michael77allen at gmail.com
Sun Feb 4 23:45:07 CET 2018


I’ve been struggling with seemingly ‘corrupt’ data.frames for a few days, and believe I’ve narrowed the problem down to some odd behaviour from read.table.

I receive a tab-delimited file from an external provider in which strings are encoded as ="content". I'm not sure why; perhaps because most users open it in Excel.
My specific issue is that trailing spaces inside any of the strings cause strange results from read.table.

# No trailing spaces
read.table(text="ID\tValue\n=\"Total\"\t1000\n=\"CJ01\"\t550\n=\"CF02\"\t450",header=FALSE,sep='\t')
      V1    V2
1     ID Value
2 =Total  1000
3  =CJ01   550
4  =CF02   450

# Now with trailing spaces in line 3
read.table(text="ID\tValue\n=\"Total\"\t1000\n=\"CJ01   \"\t550\n=\"CF02\"\t450",header=FALSE,sep='\t')
        V1    V2
1    =CF02   450
2       ID Value
3   =Total  1000
4 =CJ01      550
5    =CF02   450

I solved my specific problem by setting quote="" and extracting the string content after calling read.table. Because my original code had header=TRUE, I was finding that random rows were being used as column names!
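
For anyone hitting the same thing, here is a rough sketch of that workaround. It assumes the string column is called ID, as in the toy example above, and that every value is wrapped exactly as ="..."; the regular expression and the final trimws() step are illustrative choices rather than anything the provider's format requires.

```r
# Sample input in the provider's format, with trailing spaces in one string
txt <- "ID\tValue\n=\"Total\"\t1000\n=\"CJ01   \"\t550\n=\"CF02\"\t450"

# quote = "" disables quote processing, so ="..." is read as literal text
df <- read.table(text = txt, header = TRUE, sep = "\t", quote = "",
                 stringsAsFactors = FALSE)

# Remove the leading =" and the trailing " from each value
df$ID <- sub('^="(.*)"$', "\\1", df$ID)

# Optional: drop the trailing spaces that were inside the quotes
df$ID <- trimws(df$ID, which = "right")

print(df)
```

With quote handling switched off, the header row is picked up as expected and the data rows stay in their original order.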

Flagging a potential issue with read.table, although I can easily accept I'm missing something obvious here. 

Best,
 Michael

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)  / x86_64-pc-linux-gnu (64-bit)
Running under: macOS High Sierra 10.13.2 /  Ubuntu 16.04.3 LTS






