[R] Extract from a text file

Wed Jun 1 03:26:31 CEST 2016

Thank you so much Jeff. It worked for this example.

When I read it from a file (c:\data\test.txt), it did not work:

KLEM="c:\data"
KR=paste(KLEM,"\test.txt",sep="")
indta <- readLines(KR, skip=46)  # not interested in the first 46 lines

pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
firstlines <- grep( pattern, indta )
# Replace the matched portion (entire string) with the first capture string
v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
# Replace the matched portion (entire string) with the second capture string
v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
# Convert the lines just after the first lines to numeric
v3 <- as.numeric( indta[ firstlines + 1 ] )
# put it all into a data frame
result <- data.frame( Group = v1, Mean = v2, SE = v3 )

result
[1] Group Mean  SE
<0 rows> (or 0-length row.names)

Thank you in advance
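As posted, two details in the file-reading lines would stop the code before the regex is even applied. Inside an R string literal a backslash starts an escape sequence, so "c:\data" is rejected as an unrecognized escape (and "\test.txt" would begin with a tab character); forward slashes or doubled backslashes avoid this. Also, readLines() has no skip argument, so the first 46 lines have to be dropped after reading. A minimal sketch follows, assuming the real file has 46 header lines followed by records laid out like the sample. Since the file itself is not available, a temporary file created with tempfile() stands in for c:\data\test.txt.

```r
# Stand-in for c:\data\test.txt: a temporary file with 46 made-up header
# lines followed by two records copied from the sample in this thread
path <- tempfile(fileext = ".txt")
writeLines(c(
  paste("header line", 1:46),
  "Mean of weight  group 1, SE of mean  :  72.289037489555276",
  " 11.512956539215610",
  "Average weight of group 2, SE of Mean :  83.940053900595013",
  "  10.198495690144522"
), path)

# For the real file, forward slashes work in R paths on Windows:
# KR <- file.path("c:/data", "test.txt")   # or "c:\\data\\test.txt"
KR <- path

# readLines() has no skip argument: read everything, then drop the first 46 lines
indta <- readLines(KR)
indta <- indta[-seq_len(46)]

pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
firstlines <- grep(pattern, indta)
result <- data.frame(
  Group = as.numeric(sub(pattern, "\\1", indta[firstlines])),
  Mean  = as.numeric(sub(pattern, "\\2", indta[firstlines])),
  SE    = as.numeric(indta[firstlines + 1])
)
result
```

If the real file still yields zero rows, printing a few raw lines with dput(head(indta)) shows tabs and other characters that may keep the pattern from matching.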

On Tue, May 31, 2016 at 1:12 AM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> Please learn to post in plain text (the setting is in your email client...
> somewhere), as HTML is "What We See Is Not What You Saw" on this mailing
> list.  In conjunction with that, try reading some of the fine material
> mentioned in the Posting Guide about making reproducible examples like this
> one:
>
> # You could read in a file
> # indta <- readLines( "out.txt" )
> # but there is no "current directory" in an email
> # so here I have used the dput() function to make source code
> # that creates a self-contained R object
>
> indta <- c(
> "Mean of weight  group 1, SE of mean  :  72.289037489555276",
> " 11.512956539215610",
> "Average weight of group 2, SE of Mean :  83.940053900595013",
> "  10.198495690144522",
> "group 3 mean , SE of Mean     :                78.310441258245469",
> " 13.015876679555",
> "Mean of weight of group 4, SE of Mean               : 76.967516495101669",
> " 12.1254882985", "")
>
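The same dput() trick can be applied to the real file: read a handful of lines and let dput() print R source that recreates them exactly, ready to paste into a plain-text post. It also shows otherwise invisible characters, such as tabs, as escape sequences. A minimal sketch; the two-element vector below is a stand-in for lines read from the file (for example head(readLines(KR), 10), with KR the path used earlier in the thread).

```r
# Stand-in for a few lines read from the real file
sample_lines <- c(
  "Mean of weight  group 1, SE of mean  :  72.289037489555276",
  " 11.512956539215610"
)

# dput() prints R source code that recreates the object
dput(sample_lines)

# That printed source can be captured and evaluated to rebuild an identical object
src <- paste(capture.output(dput(sample_lines)), collapse = "\n")
rebuilt <- eval(parse(text = src))
identical(rebuilt, sample_lines)
```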
> # Regular expression patterns are discussed all over the internet
> # in many places OTHER than R
> # You can start with ?regex, but there are many fine tutorials also
>
> pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
> # For this task the regex has to match the whole "first line" of each set
> #  ^ =match starting at the beginning of the string
> #  .* =any character, zero or more times
> #  "group " =match these characters
> #  ( =first capture string starts here
> #  \\d = any digit (first backslash for R, second backslash for regex)
> #  + =one or more of the preceding (any digit)
> #  ) =end of first capture string
> #  [^:] =any non-colon character
> #  * =zero or more of the preceding (non-colon character)
> #  : =match a colon exactly
> #  " *" =match zero or more spaces
> #  ( =second capture string starts here
> #  [ =start of a set of equally acceptable characters
> #  -+ =either of these characters is acceptable
> #  0-9 =any digit would be acceptable
> #  . =a period is acceptable (this is inside the [])
> #  eE =in case you get exponential notation input
> #  ] =end of the set of acceptable characters (number)
> #  * =number of acceptable characters can be zero or more
> #  ) =second capture string stops here
> #  .* =zero or more of any character (just in case)
> #  $ =at end of pattern, requires that the match reach the end
> #     of the string
>
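A quick way to check what the two capture strings pick up on a single line is regmatches() combined with regexec(): for a matching line the result holds the whole match followed by each capture string, and for a non-matching line it is empty. A small sketch using the first sample line and the SE line below it:

```r
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
line <- "Mean of weight  group 1, SE of mean  :  72.289037489555276"

# regexec() records where the whole match and each capture group lie;
# regmatches() extracts them: element 1 is the whole match, then the captures
parts <- regmatches(line, regexec(pattern, line))[[1]]
parts
# parts[2] is the group number, parts[3] is the mean

# A line without "group " (such as an SE line) gives an empty result
nomatch <- regmatches(" 11.512956539215610",
                      regexec(pattern, " 11.512956539215610"))[[1]]
nomatch
```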
> # identify indexes of strings that match the pattern
> firstlines <- grep( pattern, indta )
> # Replace the matched portion (entire string) with the first capture string
> v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
> # Replace the matched portion (entire string) with the second capture string
> v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
> # Convert the lines just after the first lines to numeric
> v3 <- as.numeric( indta[ firstlines + 1 ] )
> # put it all into a data frame
> result <- data.frame( Group = v1, Mean = v2, SE = v3 )
>
> Figuring out how to deliver your result (output) is a separate question that
> depends where you want it to go.
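One way to get the layout Val asked for (Gr1, Gr2, ... followed by the mean and SE) is to keep the captured numbers as text, so they are printed exactly as they appear in the file instead of being rounded to R's default 7 significant digits, and then write the table with write.table(). A minimal sketch using the first two sample records; the output file name in the last comment is hypothetical.

```r
indta <- c(
  "Mean of weight  group 1, SE of mean  :  72.289037489555276",
  " 11.512956539215610",
  "Average weight of group 2, SE of Mean :  83.940053900595013",
  "  10.198495690144522"
)
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
firstlines <- grep(pattern, indta)

# Keep the numbers as character strings (trimws() removes the leading blanks)
# so every digit from the file is reproduced unchanged
out <- data.frame(
  Group = paste0("Gr", sub(pattern, "\\1", indta[firstlines])),
  Mean  = sub(pattern, "\\2", indta[firstlines]),
  SE    = trimws(indta[firstlines + 1]),
  stringsAsFactors = FALSE
)

# Print without quotes, row names or header, columns separated by a space
write.table(out, quote = FALSE, row.names = FALSE, col.names = FALSE)

# To save it instead, give a file name (hypothetical here):
# write.table(out, "result.txt", quote = FALSE, row.names = FALSE, col.names = FALSE)
```

If the values are needed for calculations as well, as.numeric() can still be applied to the Mean and SE columns; keeping them as text here only affects how they are printed.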
>
>
> On Mon, 30 May 2016, Val wrote:
>
>> Hi all,
>>
>> I have a messy text file from which I want to extract some information.
>> Here is the text file (out.txt). One record has two lines: the mean comes
>> on the first line and the SE of the mean is on the second line. Here is
>> a sample of the data.
>>
>> Mean of weight  group 1, SE of mean  :  72.289037489555276
>> 11.512956539215610
>> Average weight of group 2, SE of Mean :  83.940053900595013
>>  10.198495690144522
>> group 3 mean , SE of Mean     :                78.310441258245469
>> 13.015876679555
>> Mean of weight of group 4, SE of Mean               : 76.967516495101669
>> 12.1254882985
>>
>> I want to produce the following table. How do I read the file first and
>> then produce the table?
>>
>>
>> Gr1  72.289037489555276   11.512956539215610
>> Gr2  83.940053900595013   10.198495690144522
>> Gr3  78.310441258245469   13.015876679555
>> Gr4  76.967516495101669   12.1254882985
>>
>>
>> Thank you in advance
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------