[R] expected behavior when parsing lines with special characters
jim holtman
jholtman at gmail.com
Tue Feb 15 18:28:39 CET 2011
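Check the arguments to read.table, especially 'quote'. By default both
" and ' are treated as quote characters, so everything between two
apostrophes is read into a single field. You probably want quote='' to
suppress that special meaning.
You might also need comment.char='' in the future, since '#' starts a
comment by default.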
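
A minimal sketch of that suggestion, in case it helps. The temporary file
and the third row (with a '#' in it) are made up for illustration; the
first two rows are taken from your example. Treat it as one way to set the
arguments, not the only one.

```r
# Write a small tab-delimited file to a temporary location.
# Rows 1-2 come from the posted example; row 3 is hypothetical and only
# there to show the effect of comment.char.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "3499\t9031\t424823\tCOP'B2\t118094989\tXP_422637.2",
  "3499\t4896\t2540050\tsec'27\t19113604\tNP_596811.1",
  "3499\t0\t0\tgene#1\t0\tXX_000000.1"
), tmp)

# quote = ""        : no character is treated as a quote, so ' is literal
# comment.char = "" : '#' no longer starts a comment
good <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

print(dim(good))   # rows and columns actually read
print(good$V4)     # the fields that contained ' and #

unlink(tmp)
```

With quote = "" each input line should come back as its own row, with the
apostrophes kept in the fourth column.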
On Tue, Feb 15, 2011 at 12:21 PM, Robert M. Flight <rflight79 at gmail.com> wrote:
> Say I have a tab-delimited table I want to read into R. What should I
> expect to happen if some of the entries contain the character " ' "? I
> thought it would read the file fine, but that is not what happens.
> Instead, all the values in between two " ' "s get read into one field,
> and things are just seriously messed up. Is this a bug, and besides
> removing the offending characters, is there a fix?
>
> Example Input file:
>
> testFile.txt:
> 3499 9031 424823 COP'B2 118094989 XP_422637.2
> 3499 7955 114454 copb2 50080158 NP_001001940.1
> 3499 7227 45757 betaCop 24584107 NP_524836.2
> 3499 7165 1278426 AgaP_AGAP004798 158297839 XP_318012.4
> 3499 6239 177779 F38E11.5 17540286 NP_501671.1
> 3499 4896 2540050 sec'27 19113604 NP_596811.1
> 3499 4932 852740 SEC27 6321301 NP_011378.1
> 3499 28985 2897447 KLLA0B01958g 50303353 XP_451618.1
> 3499 33169 4621659 AGOS_AFL118W 45198403 NP_985432.1
> 3499 148305 2682116 MGG_10504 145615762 XP_366285.2
> 3499 5141 2709504 NCU07319.1 32414251 XP_327605.1
> 3499 3702 820842 AT3G15980 30683862 NP_850592.1
> 3499 3702 841666 AT1G52360 15218215 NP_175645.1
> 3499 3702 844339 AT1G79990 30699476 NP_178116.2
> 3499 4530 4340097 Os06g0143900 115466360 NP_001056779.1
>
> testDat <- read.table('testFile.txt',sep='\t')
> testDat
>
> V1 V2 V3
> 1 3499 9031 424823
> 2 3499 4932 852740
> 3 3499 28985 2897447
> 4 3499 33169 4621659
> 5 3499 148305 2682116
> 6 3499 5141 2709504
> 7 3499 3702 820842
> 8 3499 3702 841666
> 9 3499 3702 844339
> 10 3499 4530 4340097
>
> V4
> 1 COPB2\t118094989\tXP_422637.2\n3499\t7955\t114454\tcopb2\t50080158\tNP_001001940.1\n3499\t7227\t45757\tbetaCop\t24584107\tNP_524836.2\n3499\t7165\t1278426\tAgaP_AGAP004798\t158297839\tXP_318012.4\n3499\t6239\t177779\tF38E11.5\t17540286\tNP_501671.1\n3499\t4896\t2540050\tsec27
> 2 SEC27
> 3 KLLA0B01958g
> 4 AGOS_AFL118W
> 5 MGG_10504
> 6 NCU07319.1
> 7 AT3G15980
> 8 AT1G52360
> 9 AT1G79990
> 10 Os06g0143900
> V5 V6
> 1 19113604 NP_596811.1
> 2 6321301 NP_011378.1
> 3 50303353 XP_451618.1
> 4 45198403 NP_985432.1
> 5 145615762 XP_366285.2
> 6 32414251 XP_327605.1
> 7 30683862 NP_850592.1
> 8 15218215 NP_175645.1
> 9 30699476 NP_178116.2
> 10 115466360 NP_001056779.1
>
> I would appreciate any feedback.
>
> Thanks,
>
> -Robert
>
>> sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_2.12.1
>
>
> Robert M. Flight, Ph.D.
> University of Louisville Bioinformatics Laboratory
> University of Louisville
> Louisville, KY
>
> PH 502-852-1809 (HSC)
> PH 502-852-0467 (Belknap)
> EM robert.flight at louisville.edu
> EM rflight79 at gmail.com
>
> Williams and Holland's Law:
> If enough data is collected, anything may be proven by
> statistical methods.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?