[R] Parsing question, partly comma separated partly underscore separated string

Don McKenzie dmck at u.washington.edu
Mon Mar 7 05:39:01 CET 2011


On 6-Mar-11, at 7:13 PM, Eric Fail wrote:

> Dear R-list,
>
> I have a string, partly comma-separated and partly underscore-separated,
> that I am trying to parse into R.
>
> Furthermore, I have a bunch of them, and they are quite long. I have
> now spent most of my Sunday trying to figure this out and thought I
> would ask the list to see if someone here could get me
> started.
>
> My data structure looks like this:
>
> (in an example.txt file)
> Subject ID,ExperimentName,2010-04-23,32:34:23,Version 0.4, 640 by  
> 960  pixels, On Device M, M, 3.2.4,ZZ_373_462_488_TRT@9z.svg,
> 592,820,3.35,ZZ_032_288_436_CON@9z.svg,
> 332,878,3.66,ZZ_384_204_433_TRT@9z.svg,
> 334,824,3.28,ZZ_365_575_683_TRT@9z.svg,
> 598,878,3.50,ZZ_005_480_239_CON@9z.svg,
> 630,856,8.03,ZZ_030_423_394_CON@9z.svg,
> 98,846,4.09,ZZ_033_596_398_CON@9z.svg,
> 636,902,3.28,ZZ_263_064_320_TRT@9z.svg,570,894,1.26,BLOCK@9z.svg,
> 322,842,32.96,ZZ_004_088_403_CON@9z.svg,
> 606,908,3.32,ZZ_703_546_434_CON@9z.svg,
> 624,934,2.58,ZZ_712_348_543_CON@9z.svg,
> 20,828,5.36,ZZ_005_48_239_CON@9z.svg,
> 580,830,4.36,ZZ_310_444_623_TRT@9z.svg,
> 586,806,0.08,ZZ_030_423_394_CON@9z.svg,
> 350,854,3.84,ZZ_340_382_539_TRT@9z.svg,570,894,1.26,BLOCK@9z.svg,
> 542,840,4.44,ZZ_345_230_662_TRT@9z.svg,
> 632,844,2.47,ZZ_006_335_309_CON@9z.svg,
> 96,930,3.63,ZZ_782_346_746_TRT@9z.svg,
> 306,850,2.58,ZZ_334_200_333_TRT@9z.svg,
> 304,842,3.34,ZZ_383_506_726_TRT@9z.svg,
> 622,884,3.84,ZZ_294_360_448_TRT@9z.svg,
> 90,858,3.56,ZZ_334_335_473_TRT@9z.svg,570,894,1.26,BLOCK@9z.svg,
> 320,852,4.04,
> (end of example.txt file)
>
> The above is approximately 5% of the length of a full file, and I
> have about 100 of them. Please note that the strings end with a
> comma.
>
> I am trying to parse it into something like this:
>
> ID          ImgNam  BLOCK  RUN  Tx   Ty   Treatment  x    y    Y
> Subject ID  373     1      1    462  488  TRT        592  820  3.35
> Subject ID  32      1      2    288  436  CON        332  878  3.66
> Subject ID  384     1      3    204  433  TRT        334  824  3.28
> Subject ID  365     1      4    575  683  TRT        598  878  3.5
> Subject ID  5       1      5    480  239  CON        630  856  8.03
> Subject ID  30      1      6    423  394  CON        98   846  4.09
> Subject ID  33      1      7    596  398  CON        636  902  3.28
> Subject ID  263     1      8    64   320  TRT        570  894  1.26
> Subject ID  4       2      1    88   403  CON        606  908  3.32
> Subject ID  703     2      2    546  434  CON        624  934  2.58
> Subject ID  712     2      3    348  543  CON        20   828  5.36
> Subject ID  5       2      4    48   239  CON        580  830  4.36
> Subject ID  310     2      5    444  623  TRT        586  806  0.08
> Subject ID  30      2      6    423  394  CON        350  854  3.84
> Subject ID  340     2      7    382  539  TRT        570  894  1.26
> Subject ID  345     3      1    230  662  TRT        632  844  2.47
> Subject ID  6       3      2    335  309  CON        96   930  3.63
> Subject ID  782     3      3    346  746  TRT        306  850  2.58
> Subject ID  334     3      4    200  333  TRT        304  842  3.34
> Subject ID  383     3      5    506  726  TRT        622  884  3.84
> Subject ID  294     3      6    360  448  TRT        90   858  3.56
> Subject ID  334     3      7    335  473  TRT        570  894  1.26
>
> I could do it in Excel, but it would take me a week, and it would be
> stupid. If someone could please help me get started, I would very
> much appreciate it. It would not only benefit me; my colleagues
> would also see the benefit of R, and of the R-list in particular.
>
> Thanks in advance!
>
> Eric
>

In a good text editor it would be one command per file. So if you are
on UNIX or Mac OS X, you could loop through the files with (probably)
an awk command. I don't remember the syntax (it's been too long), but
it should be just a few lines of shell script. On Windows I'm not sure,
but there should be something similar.
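The awk script is not spelled out here. As a stand-in, here is a minimal Python sketch (not R and not awk) of the same per-file transformation. It assumes the layout shown in the example:

* The subject ID is the first header field.
* The header ends at the first `ZZ_...` image token or `BLOCK` token.
* Every image or `BLOCK` token is followed by exactly three numbers (x, y, Y).
* A `BLOCK` token closes the current block, and its three numbers are discarded, as the desired output suggests.

The names `parse_session`, `parse_files`, `IMAGE_RE` and `FIELDS`, and the file names in the usage line, are hypothetical.

```python
import csv
import re
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical pattern for image tokens such as "ZZ_373_462_488_TRT@9z.svg".
# Only the prefix is matched, so whatever follows the four fields is ignored.
IMAGE_RE = re.compile(r"ZZ_(\d+)_(\d+)_(\d+)_(TRT|CON)")

FIELDS = ["ID", "ImgNam", "BLOCK", "RUN", "Tx", "Ty",
          "Treatment", "x", "y", "Y"]


def is_marker(token: str) -> bool:
    """True for an image token or a BLOCK token."""
    return token.startswith("BLOCK") or IMAGE_RE.match(token) is not None


def parse_session(text: str) -> List[Dict[str, object]]:
    """Turn one comma-separated session string into one row per image."""
    # Split on commas and trim spaces/newlines; the trailing comma
    # produces an empty last token, which is dropped.
    tokens = [t.strip() for t in text.split(",")]
    if tokens and tokens[-1] == "":
        tokens.pop()

    # Assumption: the ID is the first header field, and the header ends
    # at the first image or BLOCK token.
    subject_id = tokens[0]
    i = next((k for k, t in enumerate(tokens) if is_marker(t)), len(tokens))

    rows: List[Dict[str, object]] = []
    block, run = 1, 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("BLOCK"):
            # A BLOCK token closes the current block; its numbers are skipped.
            block, run = block + 1, 0
        else:
            m = IMAGE_RE.match(token)
            if m is None:
                raise ValueError(f"unexpected token {token!r} at position {i}")
            if i + 3 >= len(tokens):
                raise ValueError(f"image token {token!r} lacks three values")
            run += 1
            x, y, big_y = tokens[i + 1:i + 4]
            rows.append({
                "ID": subject_id,
                "ImgNam": int(m.group(1)),  # "032" becomes 32
                "BLOCK": block,
                "RUN": run,
                "Tx": int(m.group(2)),
                "Ty": int(m.group(3)),
                "Treatment": m.group(4),
                "x": int(x),
                "y": int(y),
                "Y": float(big_y),
            })
        # Every image or BLOCK token is followed by exactly three numbers.
        i += 4
    return rows


def parse_files(paths: Iterable[Path], out_path: Path) -> None:
    """Parse each session file and write all rows to one CSV file."""
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for path in paths:
            writer.writerows(parse_session(Path(path).read_text()))
```

For example, `parse_files(sorted(Path("sessions").glob("*.txt")), Path("all_sessions.csv"))` would combine every file in a hypothetical `sessions` directory. The result could then be loaded in R with `read.csv("all_sessions.csv")`. Matching only the `ZZ_` prefix keeps the parser indifferent to the file-name suffix that follows the four underscore-separated fields.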

Maybe that "gets you started".  Probably one of the list jocks will  
have it nailed if you wait.
> --
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Why does the universe go to all the bother of existing?
-- Stephen Hawking

#define QUESTION ((bb) || !(bb))
-- William Shakespeare



Don McKenzie, Research Ecologist
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Forest Resources, College of the Environment
CSES Climate Impacts Group
University of Washington

desk: 206-732-7824
cell: 206-321-5966
dmck at uw.edu
donaldmckenzie at fs.fed.us



More information about the R-help mailing list