[R] Extract from a text file
Val
valkremk at gmail.com
Wed Jun 1 03:26:31 CEST 2016
Thank you so much, Jeff. It worked for this example.
However, when I read the data from a file (c:\data\test.txt), it did not work:
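KLEM <- "c:/data"                # forward slashes: in an R string "\d" is an invalid escape and "\t" is a tab
KR <- file.path(KLEM, "test.txt")
indta <- readLines(KR)[-(1:46)]  # readLines() has no skip argument; drop the first 46 lines instead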
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
firstlines <- grep( pattern, indta )
# Replace the matched portion (entire string) with the first capture string
v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
# Replace the matched portion (entire string) with the second capture string
v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
# Convert the lines just after the first lines to numeric
v3 <- as.numeric( indta[ firstlines + 1 ] )
# put it all into a data frame
result <- data.frame( Group = v1, Mean = v2, SE = v3 )
result
[1] Group Mean SE
<0 rows> (or 0-length row.names)
Thank you in advance
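If the corrected read still gives 0 rows, the lines in test.txt presumably differ from the emailed sample in a way the pattern does not allow for. The sketch below is one way to check. It assumes, purely hypothetically, that the real file spells "Group" with a capital G; the header lines are made up and stand in for the 46 unwanted lines. It writes the sample to a temporary file, reads it back, reports how many lines were kept and matched, and passes `ignore.case = TRUE` to `grep()` and `sub()`.

```r
# Hypothetical header lines standing in for the 46 unwanted lines of test.txt
header <- c("header line 1", "header line 2", "header line 3")
# Sample records; "Group" is capitalised on purpose (an assumed difference)
body <- c(
  "Mean of weight Group 1, SE of mean : 72.289037489555276",
  " 11.512956539215610",
  "Average weight of group 2, SE of Mean : 83.940053900595013",
  " 10.198495690144522")

path <- tempfile(fileext = ".txt")
writeLines(c(header, body), path)

indta <- readLines(path)
indta <- indta[-seq_along(header)]  # drop the header (-(1:46) for the real file)
cat("lines kept:", length(indta), "\n")

pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
firstlines <- grep(pattern, indta, ignore.case = TRUE)
cat("matching lines:", length(firstlines), "\n")  # 0 means the pattern never matched

result <- data.frame(
  Group = as.numeric(sub(pattern, "\\1", indta[firstlines], ignore.case = TRUE)),
  Mean  = as.numeric(sub(pattern, "\\2", indta[firstlines], ignore.case = TRUE)),
  SE    = as.numeric(indta[firstlines + 1]))
print(result)

unlink(path)  # remove the temporary file
```

Without `ignore.case = TRUE`, the "Group 1" record in this sample would be silently skipped, which is the kind of mismatch the two `cat()` counts help to reveal.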
On Tue, May 31, 2016 at 1:12 AM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> Please learn to post in plain text (the setting is in your email client...
> somewhere), as HTML is "What We See Is Not What You Saw" on this mailing
> list. In conjunction with that, try reading some of the fine material
> mentioned in the Posting Guide about making reproducible examples like this
> one:
>
> # You could read in a file
> # indta <- readLines( "out.txt" )
> # but there is no "current directory" in an email
> # so here I have used the dput() function to make source code
> # that creates a self-contained R object
>
> indta <- c(
> "Mean of weight group 1, SE of mean : 72.289037489555276",
> " 11.512956539215610",
> "Average weight of group 2, SE of Mean : 83.940053900595013",
> " 10.198495690144522",
> "group 3 mean , SE of Mean : 78.310441258245469",
> " 13.015876679555",
> "Mean of weight of group 4, SE of Mean : 76.967516495101669",
> " 12.1254882985", "")
>
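A sketch of the dput() step described above: for a real file, something like `dput(head(indta, 10))` would print pasteable code for the first ten lines. Here a two-element vector stands in for those lines, and the last lines confirm that the printed code recreates an identical object.

```r
# Two lines standing in for the output of readLines()
lines <- c("Mean of weight group 1, SE of mean : 72.289037489555276",
           " 11.512956539215610")

# dput() prints R source code that recreates the object
dput(lines)

# Parse that printed code back to show it rebuilds an identical vector
code <- paste(capture.output(dput(lines)), collapse = "\n")
copy <- eval(parse(text = code))
stopifnot(identical(copy, lines))
```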
> # Regular expression patterns are discussed all over the internet
> # in many places OTHER than R
> # You can start with ?regex, but there are many fine tutorials also
>
> pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
> # For this task the regex has to match the whole "first line" of each set
> # ^ =match starting at the beginning of the string
> # .* =any character, zero or more times
> # "group " =match these characters
> # ( =first capture string starts here
> # \\d = any digit (the backslash is doubled in the R string; the regex itself receives \d)
> # + =one or more of the preceding (any digit)
> # ) =end of first capture string
> # [^:] =any non-colon character
> # * =zero or more of the preceding (non-colon character)
> # : =match a colon exactly
> # " *" =match zero or more spaces
> # ( =second capture string starts here
> # [ =start of a set of equally acceptable characters
> # -+ =either of these characters is acceptable
> # 0-9 =any digit would be acceptable
> # . =a period is acceptable (this is inside the [])
> # eE =in case you get exponential notation input
> # ] =end of the set of acceptable characters (number)
> # * =number of acceptable characters can be zero or more
> # ) =second capture string stops here
> # .* =zero or more of any character (just in case)
> # $ =at end of pattern, requires that the match reach the end
> # of the string
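To see what the two capture strings pick up from a single line, `regexec()` together with `regmatches()` returns the whole match followed by each capture string. A minimal sketch using one of the sample lines:

```r
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
x <- "Average weight of group 2, SE of Mean : 83.940053900595013"

# Element 1 is the whole match; elements 2 and 3 are the capture strings
m <- regmatches(x, regexec(pattern, x))[[1]]
m[2]  # group number
m[3]  # mean

# A line without "group " does not match, so sub() returns it unchanged;
# this is why grep() is used first to select only the matching lines
unchanged <- sub(pattern, "\\2", " 11.512956539215610")
unchanged
```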
>
> # identify indexes of strings that match the pattern
> firstlines <- grep( pattern, indta )
> # Replace the matched portion (entire string) with the first capture string
> v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
> # Replace the matched portion (entire string) with the second capture string
> v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
> # Convert the lines just after the first lines to numeric
> v3 <- as.numeric( indta[ firstlines + 1 ] )
> # put it all into a data frame
> result <- data.frame( Group = v1, Mean = v2, SE = v3 )
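The pairing relies on each SE line directly following its "group" line. A small sketch on the sample vector shows the positions involved and adds a guard (an addition, not part of the code above) that stops if any following line is not a plain number, for example if a blank line ever separated the two lines of a record.

```r
indta <- c(
  "Mean of weight group 1, SE of mean : 72.289037489555276",
  " 11.512956539215610",
  "Average weight of group 2, SE of Mean : 83.940053900595013",
  " 10.198495690144522",
  "group 3 mean , SE of Mean : 78.310441258245469",
  " 13.015876679555",
  "Mean of weight of group 4, SE of Mean : 76.967516495101669",
  " 12.1254882985", "")
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"

firstlines <- grep(pattern, indta)
print(firstlines)      # positions of the lines holding the group and mean
print(firstlines + 1)  # positions of the SE lines that follow them

# Guard: a blank or non-numeric "next line" becomes NA after conversion
se <- suppressWarnings(as.numeric(indta[firstlines + 1]))
if (anyNA(se)) stop("a line after a 'group' line is not a number")
```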
>
> Figuring out how to deliver your result (output) is a separate question that
> depends on where you want it to go.
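One possible destination is a CSV file. A sketch, assuming a temporary file stands in for whatever path you actually want to write to; reading the file back shows what was written.

```r
# A small result table like the one built above
result <- data.frame(Group = c(1, 2),
                     Mean  = c(72.289037489555276, 83.940053900595013),
                     SE    = c(11.512956539215610, 10.198495690144522))

# Hypothetical destination: a temporary file stands in for your own path
out <- tempfile(fileext = ".csv")
write.csv(result, out, row.names = FALSE)

# Read the file back to check its contents
back <- read.csv(out)
print(back)
```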
>
>
> On Mon, 30 May 2016, Val wrote:
>
>> Hi all,
>>
>> I have a messy text file from which I want to extract some information.
>> The text file is out.txt. Each record has two lines: the mean is on the
>> first line and the SE of the mean is on the second line. Here is a
>> sample of the data.
>>
>> Mean of weight group 1, SE of mean : 72.289037489555276
>> 11.512956539215610
>> Average weight of group 2, SE of Mean : 83.940053900595013
>> 10.198495690144522
>> group 3 mean , SE of Mean : 78.310441258245469
>> 13.015876679555
>> Mean of weight of group 4, SE of Mean : 76.967516495101669
>> 12.1254882985
>>
>> I want to produce the following table. How do I first read the file and
>> then produce the table?
>>
>>
>> Gr1 72.289037489555276 11.512956539215610
>> Gr2 83.940053900595013 10.198495690144522
>> Gr3 78.310441258245469 13.015876679555
>> Gr4 76.967516495101669 12.1254882985
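One way to get exactly this layout (a sketch, not the only approach) is to keep the captured numbers as character strings instead of converting them with `as.numeric()`. That keeps every digit as written in the file, whereas numeric values print with 7 significant digits by default. The "Gr" prefix is added with `paste0()`.

```r
indta <- c(
  "Mean of weight group 1, SE of mean : 72.289037489555276",
  " 11.512956539215610",
  "Average weight of group 2, SE of Mean : 83.940053900595013",
  " 10.198495690144522",
  "group 3 mean , SE of Mean : 78.310441258245469",
  " 13.015876679555",
  "Mean of weight of group 4, SE of Mean : 76.967516495101669",
  " 12.1254882985", "")
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
firstlines <- grep(pattern, indta)

# Keep the numbers as character strings so every written digit survives
tab <- data.frame(
  Group = paste0("Gr", sub(pattern, "\\1", indta[firstlines])),
  Mean  = sub(pattern, "\\2", indta[firstlines]),
  SE    = trimws(indta[firstlines + 1]),
  stringsAsFactors = FALSE)

# Print to the console without quotes, row names or a header line
write.table(tab, quote = FALSE, row.names = FALSE, col.names = FALSE)
```

If the values are needed for arithmetic later, convert the Mean and SE columns with `as.numeric()` at that point.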
>>
>>
>> Thank you in advance
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller
> DCN: <jdnewmil at dcn.davis.ca.us>
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---------------------------------------------------------------------------