[R] strings

Roger Bivand rsb at reclus.nhh.no
Wed Oct 4 17:48:19 CEST 2000


On Wed, 4 Oct 2000, Martin Maechler wrote:

> >>>>> "KH" == Kurt Hornik <Kurt.Hornik at ci.tuwien.ac.at> writes:
> 
> >>>>> Richard Rowe writes:
>     >> I am attempting to analyse some behaviour sequence data.  The input is
>     >> an alphabetic string "ASDFGH ... ".  I wish to start at one end of the
>     >> string, peel off each character, and convert to an integer to develop
>     >> transition matrices etc. My blundering through the ref manual hasn't
>     >> produced any light.
>     KH> Not sure what precisely you need ... if it is about converting a string
>     KH> to a vector of characters, you could use
> 
>     R> x <- "ASDFGH"
>     R> x
>     KH> [1] "ASDFGH"
>     R> unlist(strsplit(x, NULL))
>     KH> [1] "A" "S" "D" "F" "G" "H"
> 
>     KH> and proceed from there.
> 

Maybe:

> x <- "ASDFGH"
> cx <- unlist(strsplit(x, NULL))
> match(cx, LETTERS)
[1]  1 19  4  6  7  8

or stepping along cx with which(LETTERS == cx[i]) to get them one by one?
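
A minimal sketch of the one-by-one variant, assuming the input holds only upper-case letters A-Z (for any other character which() returns a zero-length result and the assignment fails). The name `codes` is only illustrative.

```r
x <- "ASDFGH"
cx <- unlist(strsplit(x, NULL))

# Step along cx, looking up each character's position in LETTERS.
codes <- integer(length(cx))
for (i in seq_along(cx)) {
  codes[i] <- which(LETTERS == cx[i])
}
codes

# The loop should agree with the vectorised match() call.
stopifnot(identical(codes, match(cx, LETTERS)))
```
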
However, the real question seems to me to be how to generate the
transition matrices, which may mean something more like creating a data
frame with a source and one or more destinations:

> cx1 <- substring(x, 1:nchar(x), 2:nchar(x))
> cx1
[1] "AS" "SD" "DF" "FG" "GH" ""  
> tdf <- matrix(NA, nrow=length(cx1)-1, ncol=2)
> for (i in 1:(length(cx1)-1)) tdf[i,] <- match(unlist(strsplit(cx1[i],
+ NULL)), LETTERS)
> tdf
     [,1] [,2]
[1,]    1   19
[2,]   19    4
[3,]    4    6
[4,]    6    7
[5,]    7    8
> table(tdf[,1], tdf[,2])
    
     4 6 7 8 19
  1  0 0 0 0  1
  4  0 1 0 0  0
  6  0 0 1 0  0
  7  0 0 0 1  0
  19 1 0 0 0  0

which, with a bit more work, might even avoid going to integers altogether,
using the characters (or factors) for source and destination.

> tdf <- matrix(unlist(strsplit(cx1, NULL)), nrow=length(cx1)-1, ncol=2,
+ byrow=TRUE)
> tdf
     [,1] [,2]
[1,] "A"  "S" 
[2,] "S"  "D" 
[3,] "D"  "F" 
[4,] "F"  "G" 
[5,] "G"  "H" 
> table(tdf[,1], tdf[,2])
   
    D F G H S
  A 0 0 0 0 1
  D 0 1 0 0 0
  F 0 0 1 0 0
  G 0 0 0 1 0
  S 1 0 0 0 0
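
The table above is not square, because a state only gets a row if it occurs as a source and a column if it occurs as a destination. A possible sketch of the factor idea, assuming that every state should appear in both margins: give both columns the same set of levels. The names `pairs`, `states` and `tm` are only illustrative.

```r
x <- "ASDFGH"
n <- nchar(x)

# Overlapping two-character pieces: "AS" "SD" "DF" "FG" "GH".
pairs <- substring(x, 1:(n - 1), 2:n)
tdf <- matrix(unlist(strsplit(pairs, NULL)), nrow = length(pairs),
              ncol = 2, byrow = TRUE)

# Shared levels give one row and one column per observed state.
states <- sort(unique(as.vector(tdf)))
tm <- table(factor(tdf[, 1], levels = states),
            factor(tdf[, 2], levels = states))
tm

# Row proportions estimate transition probabilities; a state that never
# occurs as a source (here "H") gives a row of NaN.
prop.table(tm, 1)
```
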

There must be a way of using rbind() to add to the transition record too,
removing the need to know the length of the data vector in advance.
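
One way this might look, as a sketch only: start from an empty two-column matrix and append a row per transition. The loop bound here is used only to drive the example; the names `rec` and `rec2` are illustrative.

```r
x <- "ASDFGH"
cx <- unlist(strsplit(x, NULL))

# Start with an empty record and append one (source, destination) row
# at a time, so the matrix never has to be sized up front.
rec <- matrix(character(0), nrow = 0, ncol = 2)
for (i in seq_len(length(cx) - 1)) {
  rec <- rbind(rec, c(cx[i], cx[i + 1]))
}
rec

# Vectorised alternative when the whole sequence is already in hand:
# drop the last element for sources and the first for destinations.
rec2 <- cbind(cx[-length(cx)], cx[-1])
rec2
```

Growing a matrix with rbind() copies it on every step, so for long sequences the vectorised form is likely to be the better choice.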

Best wishes,

Roger

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.
