[R] parsing a complex file

jim holtman jholtman at gmail.com
Sun Aug 28 04:06:48 CEST 2016


It is not clear as to how you want to parse the file.  You need to at least
provide an example of what you expect from the output.  You mention " the
detail which begins with 2 at byte location 1 to another file"; I don't see
the '2' at byte location 1.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Aug 27, 2016 at 4:56 PM, Glenn Schultz <glennmschultz at me.com> wrote:

> All,
>
> I have a complex file I would like to parse in R a sample is described
> below
>
> The header is 1:200 and the detail is 1 to 200.  I have written code to
> parse the file so far.  As follows:
>
> numchar <- nchar(x = data, type = "chars")
> start <- c(seq(1, numchar, 398))
> end <- c(seq(398, numchar, 398))
> quartile <- NULL
> final <- str_sub(data, start[1:length(start)], end[1:length(end)])
> quartile <- append(quartile, final)
> write(quartile, Result)
> data2 <- readLines(Result)
>
> The function gets me to data2.  All is well so far. However, I need to
> send the header which begins with 1 at byte location 1 to a file and the
> detail which begins with 2 at byte location 1 to another file.  When I look
> at data2 in RStudio  I see the following.  The file is 185 meg, I have the
> lines but I am stuck as to the next step.  Any ideas are appreciated.
>
> Glenn
>
>
> dput of the data
>
> "1176552 CL20031031367RBV319920901
>
>
>  217655208875{08875{08875{08875{08875{08875{22D22D22D22D22D2
> 2D13C13C13C13C13C13C0000604000{0000604000{0000604000{
> 0000604000{0000604000{0000604000{36{36{36{36{36{36{
> 08500{08500{08500{08500{08500{08500{1254240 CL20031031371KLV120020201
>
>
>          225424007484{07250{07375{07500{07625{08625{33F06H33H33I34{
> 34A02A01I02{02{02A03B0001121957C0000123500{0000920000{
> 0001280000{0001741000{0003849000{35I30{36{36{36{36{
> 07000{07000{07000{07000{07000{07000{1254253 CL20031031371KMA620020301
>
>
>          225425306715{06250{06500{06750{06875{07000{33C23G33C33I34{
> 34A02{01I02{02{02A02C0000946646A0000350000{0000850000{
> 0001030000{0001205000{0001300000{35H30{36{36{36{36{
> 06000{06000{06000{06000{06000{06000{1259455 CL20031031371RE4420020501
>
>
>          225945507045{06750{06875{07000{07250{07375{34{28B34A34B34B3
> 4C01H01G01H01H01H02C0000934444E0000360000{0000765000{
> 0000995000{0001384000{0002184000{35I30{36{36{36{36{
> 06500{06500{06500{06500{06500{06500{1261060 CI20031031371S5V219940101
>
>
>          226106006637{06500{06500{06625{06750{06875{05B00C04H05I06B0
> 6B11H11G11G11H11H11I0001169090I0000650000{0000950000{
> 0001250000{0001328000{0001900000{18{18{18{18{18{18{
> 06000{06000{06000{06000{06000{06000{1335271 CI20031031375HMU519960101
>
>
>          233527107500{07500{07500{07500{07500{07500{08B06B08E08F08F0
> 8F09D09D09D09D09E09E0000717375{0000464000{0000550000{
> 0000770000{0001085500{0001085500{18{18{18{18{18{18{
> 07000{07000{07000{07000{07000{07000{1440840 CL20031031380HV9519981101
>
>
>          244084006707{06500{06625{06750{06875{06875{27D03C28C29H30{
> 30A06{05I06{06{06{06A0000615172I0000250000{00006
> 21000{0000673000{0000750000{0000791000{36{36{36{36{36{36{
> 06000{06000{06000{06000{06000{06000{1521993 CI20031031384E3A620000101
>
>
>          252199306937{06875{06875{06875{07000{07000{12H02H12H13{13D1
> 3E04E04E04E04E04F04F0001129428F0000700000{0000955000{0001000
> 000{0002087000{0002087000{18{18{18{18{18{18{06500{06500{
> 06500{06500{06500{06500{1538080 CL20031031384YXH420000501
>
>
>  253808008875{08875{08875{08875{08875{08875{31I31I31I31I31I3
> 1I04A04A04A04A04A04A0001419300{0001419300{0001419300{
> 0001419300{0001419300{0001419300{36{36{36{36{36{36{
> 07000{07000{07000{07000{07000{07000{1659123 CI20031031390XG8720020801
>
>
>          265912306909{06750{06750{06875{07000{07125{16E15I16C16E16F1
> 6F01E01D01D01E01E01G0000998541G0000162000{0000792000{
> 0001156500{0001600000{0001990000{18{18{18{18{18{18{
> 06000{06000{06000{06000{06000{06000{"
>
>
> dput data2
> c("1176552 CL20031031367RBV319920901 217655208875{08875{08875{08875
> {08875{08875{22D22D22D22D22D22D13C13C13C13C13C13C0000604000{
> 0000604000{0000604000{0000604000{0000604000{0000604000{36{36{36{36{36{36{
> 08500{08500{08500{08500{08500{08500{", "1254240 CL20031031371KLV120020201
> 225424007484{07250{07375{07500{07625{08625{33F06H33H33I34{
> 34A02A01I02{02{02A03B0001121957C0000123500{0000920000{
> 0001280000{0001741000{0003849000{35I30{36{36{36{36{
> 07000{07000{07000{07000{07000{07000{", "1254253 CL20031031371KMA620020301
> 225425306715{06250{06500{06750{06875{07000{33C23G33C33I34{
> 34A02{01I02{02{02A02C0000946646A0000350000{0000850000{
> 0001030000{0001205000{0001300000{35H30{36{36{36{36{
> 06000{06000{06000{06000{06000{06000{", "1259455 CL20031031371RE4420020501
> 225945507045{06750{06875{07000{07250{07375{34{28B34A34B34B34
> C01H01G01H01H01H02C0000934444E0000360000{0000765000{0000995000{0001384000{
> 0002184000{35I30{36{36{36{36{06500{06500{06500{06500{06500{06500{",
> "1261060 CI20031031371S5V219940101 226106006637{06500{06500{06625
> {06750{06875{05B00C04H05I06B06B11H11G11G11H11H11I0001169090I
> 0000650000{0000950000{0001250000{0001328000{0001900000{18{18{18{18{18{18{
> 06000{06000{06000{06000{06000{06000{", "1335271 CI20031031375HMU519960101
> 233527107500{07500{07500{07500{07500{07500{08B06B08E08F08F08
> F09D09D09D09D09E09E0000717375{0000464000{0000550000{0000770000{0001085500{
> 0001085500{18{18{18{18{18{18{07000{07000{07000{07000{07000{07000{",
> "1440840 CL20031031380HV9519981101 244084006707{06500{06625{06750
> {06875{06875{27D03C28C29H30{30A06{05I06{06{06{
> 06A0000615172I0000250000{0000621000{0000673000{0000750000{
> 0000791000{36{36{36{36{36{36{06000{06000{06000{06000{06000{06000{",
> "1521993 CI20031031384E3A620000101 252199306937{06875{06875{06875
> {07000{07000{12H02H12H13{13D13E04E04E04E04E04F04F0001129428F
> 0000700000{0000955000{0001000000{0002087000{0002087000{18{
> 18{18{18{18{18{06500{06500{06500{06500{06500{06500{", "1538080
> CL20031031384YXH420000501 253808008875{08875{08875{08875
> {08875{08875{31I31I31I31I31I31I04A04A04A04A04A04A0001419300{
> 0001419300{0001419300{0001419300{0001419300{0001419300{36{36{36{36{36{36{
> 07000{07000{07000{07000{07000{07000{", "1659123 CI20031031390XG8720020801
> 265912306909{06750{06750{06875{07000{07125{16E15I16C16E16F16
> F01E01D01D01E01E01G0000998541G0000162000{0000792000{0001156500{0001600000{
> 0001990000{18{18{18{18{18{18{06000{06000{06000{06000{06000{06000{"
> )
>
> Data 2
>
>  [1] "1176552 CL20031031367RBV319920901 217655208875{08875{08875{08875
> {08875{08875{22D22D22D22D22D22D13C13C13C13C13C13C0000604000{
> 0000604000{0000604000{0000604000{0000604000{0000604000{36{36{36{36{36{36{
> 08500{08500{08500{08500{08500{08500{"
> [2] "1254240 CL20031031371KLV120020201 225424007484{07250{07375{07500
> {07625{08625{33F06H33H33I34{34A02A01I02{02{02A03B000112195
> 7C0000123500{0000920000{0001280000{0001741000{
> 0003849000{35I30{36{36{36{36{07000{07000{07000{07000{07000{07000{"
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posti
> ng-guide.html
> and provide commented, minimal, self-contained, reproducible code.

	[[alternative HTML version deleted]]



More information about the R-help mailing list