[R] numbers as part of long character
Hua Li
hualihua at yahoo.com
Fri Jun 13 00:34:25 CEST 2008
Thanks, Marc and Haris!
I didn't know the values of the numbers beforehand, so the scan method won't work, but "[^+-\\d.]+" will do!
And Haris, I didn't intend to keep the information about which number belongs to B, which to C, etc. when asking the question, as I had a tedious way to do it (using strsplit and unlist over and over again after I get the numbers). But if you have an easier way to do it, I'd like to know!
Hua
--- On Thu, 6/12/08, Charilaos Skiadas <cskiadas at gmail.com> wrote:
> From: Charilaos Skiadas <cskiadas at gmail.com>
> Subject: Re: [R] numbers as part of long character
> To: marc_schwartz at comcast.net
> Cc: hualihua at yahoo.com, r-help at r-project.org
> Date: Thursday, June 12, 2008, 6:03 PM
> On Jun 12, 2008, at 5:06 PM, Marc Schwartz wrote:
>
> > on 06/12/2008 03:46 PM Hua Li wrote:
> >> Hi,
> >> I'm looking for some way to pick up the numbers which are
> >> contained and buried in a long character. For example,
> >>
> >> outtree.new="(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"
> >> num.char = unlist(strsplit(unlist(strsplit(unlist(strsplit(unlist(
> >>   strsplit(unlist(strsplit(outtree.new,")",fixed=TRUE)),
> >>   "(",fixed=TRUE)),":",fixed=TRUE)),",",fixed=TRUE)),";",fixed=TRUE))
> >> num.vec=as.numeric(num.char[1:(length(num.char)-1)])
> >> num.char
> >> # "B" "1204.25" "E" "1204.25" "7581.11" "F" "8785.36" "8353.85" "C" "17139.21" ""
> >> num.vec
> >> # NA 1204.25 NA 1204.25 7581.11 NA 8785.36 8353.85 NA 17139.21
> >>
> >> would help me get the numbers such as 1204.25, 7581.11, etc, but
> >> with a warning message which reads:
> >> "Warning message:
> >> NAs introduced by coercion "
> >> Is there a way to get around this? Thanks!
> >> Hua
> >
> > Your code above is overly and needlessly complicated, which makes
> > it difficult to debug.
> >
> > I would take an approach whereby you use gsub() to strip
> > non-numeric characters from the input character vector and then
> > use scan() to read the remaining numbers:
> >
> > > Vec <- scan(textConnection(gsub("[^0-9\\.]+", " ", outtree.new)))
> > Read 6 items
> >
> > > Vec
> > [1] 1204.25 1204.25 7581.11 8785.36 8353.85 17139.21
> >
> > > str(Vec)
> > num [1:6] 1204 1204 7581 8785 8354 ...
> >
> >
> > The result of using gsub() above is:
> >
> > > gsub("[^0-9\\.]+", " ", outtree.new)
> > [1] " 1204.25 1204.25 7581.11 8785.36 8353.85 17139.21 "
> >
> >
> > That gives you a character vector which can then be passed to
> > scan() as a textConnection().
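
The gsub() and scan() steps described above can be sketched as a self-contained script. This is only an illustration: it assumes a reasonably recent R in which scan() accepts a text= argument (used here in place of textConnection() so no connection is left open), and the intermediate name `cleaned` is made up for the example.

```r
# Input string from the original question.
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Replace every run of characters that are not digits or dots with one space.
# `cleaned` is a hypothetical helper name, not part of the original thread.
cleaned <- gsub("[^0-9.]+", " ", outtree.new)

# scan() reads whitespace-separated numbers (its default type is double).
# quiet = TRUE suppresses the "Read 6 items" message.
Vec <- scan(text = cleaned, quiet = TRUE)
Vec
```

Because the non-numeric characters are removed before conversion, nothing non-numeric ever reaches the numeric parser, so no "NAs introduced by coercion" warning should arise for input of this shape.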
>
> Another approach would be to split on runs of characters that are
> neither digits nor dots:
>
> as.numeric( strsplit(outtree.new, "[^\\d.]+", perl=TRUE)[[1]] )
>
>
> Use "[^+-\\d.]+" if your numbers might be signed. This does assume
> that dots occur only as decimal points and +/- only as signs.
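
A minimal sketch of the strsplit() approach follows, with two adjustments that are assumptions of this example rather than part of the original suggestion. First, the hyphen is written first in the character class so that it is always read as a literal "-"; some regex engines reject a hyphen placed directly before `\\d` as an invalid range. Second, because the input begins with separator characters, strsplit() returns an empty first piece, which as.numeric() would turn into NA, so empty pieces are dropped before conversion.

```r
# Input string from the original question.
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Split on runs of characters that are not a sign, a digit, or a dot.
# The hyphen comes first in the class so it is a literal "-", not a range.
pieces <- strsplit(outtree.new, "[^-+\\d.]+", perl = TRUE)[[1]]

# The string starts with "(((B:", so the first piece is the empty string.
# Keep only non-empty pieces, then convert.
nums <- as.numeric(pieces[nzchar(pieces)])
nums
```

The same caveat as above applies: this only works if dots and signs never appear outside of numbers in the input.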
>
> Hua, did you want to keep the information of which number is B,
> which is C etc?
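
One possible way to keep track of which number belongs to which label is sketched below. It is not taken from the thread: it assumes labels consist of letters only and are immediately followed by a colon and an unsigned number, and the names `pairs`, `parts`, and `tip.lengths` are invented for the example. Numbers that follow a closing parenthesis (such as 7581.11) have no label and are deliberately skipped.

```r
# Input string from the original question.
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Extract every "label:number" pair, e.g. "B:1204.25".
# Assumption: labels are made of letters only.
pairs <- regmatches(outtree.new,
                    gregexpr("[A-Za-z]+:[0-9.]+", outtree.new))[[1]]

# Split each pair at the colon and build a numeric vector named by label.
parts <- strsplit(pairs, ":", fixed = TRUE)
tip.lengths <- setNames(as.numeric(sapply(parts, "[", 2)),
                        sapply(parts, "[", 1))
tip.lengths
```

Individual values can then be looked up by name, for example tip.lengths["C"].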
>
> > See ?gsub, ?regex, ?textConnection and ?scan for more information.
> >
> > HTH,
> >
> > Marc Schwartz
> >
>
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College