[R] manipulating strings

Marc Schwartz MSchwartz at MedAnalytics.com
Sun Aug 8 21:19:40 CEST 2004


On Sun, 2004-08-08 at 13:58, Stephen Nyangoma wrote:
> Hi
> I have a vector called fil consisting of the following strings.
> 
> 
> > fil
>   [1] " 102.2 639"   " 104.2 224"   " 105.1 1159"  " 107.1 1148"  
>       " 108.1 1376"
>   [6] " 109.2 1092"  " 111.2 1238"  " 112.2 349"   " 113.1 1204"  
>       " 114.1 537"
>  [11] " 115.0 303"   " 116.1 490"   " 117.2 202"   " 118.1 1864"  
>       " 119.0 357"
> 
> 
> I want to get a data frame like
> 
> Time    Obs
> 102.2   639
> 104.2   224
> 105.1  1159
> 107.1  1148
> 108.1  1376
> 109.2  1092
> 111.2  1238
> 112.2   349
> 113.1  1204  
> 114.1   537
> etc
> 
> Can anyone see an efficient way of doing this?
> 
> Thanks. Stephen

Try this:

# Create strings
MyStrings <- c(" 102.2 639",  " 104.2 224", " 105.1 1159",
               " 107.1 1148", " 108.1 1376", " 109.2 1092",
               " 111.2 1238", " 112.2 349",  " 113.1 1204",
               " 114.1 537",  " 115.0 303",  " 116.1 490",
               " 117.2 202",  " 118.1 1864", " 119.0 357")

> MyStrings
 [1] " 102.2 639"  " 104.2 224"  " 105.1 1159" " 107.1 1148"
 [5] " 108.1 1376" " 109.2 1092" " 111.2 1238" " 112.2 349" 
 [9] " 113.1 1204" " 114.1 537"  " 115.0 303"  " 116.1 490" 
[13] " 117.2 202"  " 118.1 1864" " 119.0 357" 


# Now convert to a data frame. First use strsplit() to break each
# vector element into three components, using " " as the split
# character (the leading " " yields an empty first component). This
# returns a list, which we convert to a vector using unlist(). Then use
# matrix() to reshape the vector into a two-dimensional object with
# 3 columns, with 'byrow = TRUE' so that the matrix is filled row by
# row. Then take only the second and third columns of the matrix and
# convert them into a data frame.
df <- as.data.frame(matrix(unlist(strsplit(MyStrings, split = " ")),
                    ncol = 3, byrow = TRUE)[, 2:3])

# Finally, set the colnames
colnames(df) <- c("Time", "Obs")

> df
    Time  Obs
1  102.2  639
2  104.2  224
3  105.1 1159
4  107.1 1148
5  108.1 1376
6  109.2 1092
7  111.2 1238
8  112.2  349
9  113.1 1204
10 114.1  537
11 115.0  303
12 116.1  490
13 117.2  202
14 118.1 1864
15 119.0  357
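
One caveat: because the matrix is built from character data, both
columns of df are character (or factor, in versions of R before 4.0.0)
rather than numeric. A minimal sketch of the conversion, assuming a df
built as above (only three of the strings are repeated here so that the
snippet runs on its own):

```r
# Rebuild a small version of df so this example is self-contained
MyStrings <- c(" 102.2 639", " 104.2 224", " 105.1 1159")
df <- as.data.frame(matrix(unlist(strsplit(MyStrings, split = " ")),
                           ncol = 3, byrow = TRUE)[, 2:3])
colnames(df) <- c("Time", "Obs")

# The columns are character (or factor). Going through as.character()
# first ensures factor level codes are never mistaken for the values.
df$Time <- as.numeric(as.character(df$Time))
df$Obs  <- as.integer(as.character(df$Obs))

str(df)
```

After this, arithmetic and plotting on Time and Obs behave as expected.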


Note that the above presumes that each string (each element of the
character vector) has exactly one leading " " and that the Time and Obs
values are separated by exactly one " ".
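
If the spacing is less regular (several spaces, tabs, or no leading
space), splitting on a single " " no longer yields exactly three pieces
per string. As a hedged alternative, not the approach shown above,
read.table() can parse the strings directly, since its default
separator is any run of whitespace. The 'text' argument assumes a
reasonably recent version of R, and the irregular input below is made
up for illustration:

```r
# Hypothetical input with irregular spacing (not the original data)
MyStrings <- c(" 102.2 639", "104.2   224", "  105.1\t1159")

# sep = "" (the default) splits on any run of whitespace, and the
# columns are converted to numeric types automatically
df <- read.table(text = MyStrings, col.names = c("Time", "Obs"))

df
```

This also avoids a separate conversion step, since read.table()
determines the column types itself.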

See ?strsplit for more information.
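
As a quick illustration of why the matrix needs three columns, here is
what strsplit() returns for a single element. The leading space
produces an empty first piece:

```r
parts <- strsplit(" 102.2 639", split = " ")

# strsplit() returns a list with one character vector per input string
parts[[1]]
# [1] ""      "102.2" "639"
```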

HTH,

Marc Schwartz



