[R] splitting first 10 words in a string

Matevž Pavlič matevz.pavlic at gi-zrmk.si
Mon Nov 1 22:43:57 CET 2010


...For example, I would like to split this sentence from the field Opis in a data.frame:

Opis : "I have a sentence with ten words", so that it would convert to something like this:

Opis : "I have a sentence with ten words"; Column1 : "I"; Column2 : "have"; Column3 : "a"; Column4 : "sentence"; Column5 : "with"; Column6 : "ten"; Column7 : "words"

...or in a data.frame something like this (as I understand it):

'data.frame':   xx obs. of  12 variables:
 $ Opis    : Factor "I have a sentence with ten words"
 $ Column1 : Factor "I"
 $ Column2 : Factor "have"
 $ Column3 : Factor "a"
 $ Column4 : Factor "sentence"
 $ Column5 : Factor "with"
 $ Column6 : Factor "ten"
 $ Column7 : Factor "words"

Hope that explains it better. I am still having some trouble understanding R and all...
m


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Matevž Pavlič
Sent: Monday, November 01, 2010 10:34 PM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] spliting first 10 words in a string

Hi, 

I am sorry, will try to be more exact from now on...

I have a data.frame with a field called Opis. It contains sentences that I would like to split into words, each in its own field of the data.frame. When I say columns, I mean columns as in an Excel table. I would like to split "Opis" into ten fields holding the first ten words of the Opis field.
Here is an example of my data.frame:

'data.frame':   22928 obs. of  12 variables:
 $ VrtinaID        : int  1 1 1 1 2 2 2 2 2 2 ...
 $ ZapStev         : int  1 2 3 4 1 2 3 4 5 6 ...
 $ GlobinaOd       : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
 $ GlobinaDo       : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
 $ Opis            : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 4756 ...
 $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 101 101 NA 106 125 80 106 101 ...
 $ GeolNastOd      : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
 $ GeolNastDo      : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
 $ GeolNastOpis    : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 53 53 53 53 53 ...
 $ NacinVrtanjaOd  : num  0e+00 1e+09 1e+09 1e+09 0e+00 ...
 $ NacinVrtanjaDo  : num  1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
 $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 1 1 1 1 1 ...

Hope that explains better...
Thank you, m

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Monday, November 01, 2010 10:13 PM
To: Matevž Pavlič
Cc: r-help at r-project.org
Subject: Re: [R] spliting first 10 words in a string


On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:

> Hi all,
>
>
>
> I have a columnn with text that has quite a few words in it. I would 
> like to split these words in separate columns, but just first ten 
> words in the string. Is that possible in R?
>
>

Not sure what a column means to you. It's not a precisely defined R
type or class. (And you are requested to offer a concrete example
rather than making us guess.)

 > words <- paste("I have a columnn with text that has quite a few words in it.",
 +                "I would like to split these words in separate columns, but just",
 +                "first ten words in the string. Is that possible in R?")

 > strsplit(words, " ")[[1]][1:10]
  [1] "I"       "have"    "a"       "columnn" "with"    "text"    "that"    "has"     "quite"   "a"
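
If the text may contain repeated spaces or tabs, splitting on a single " " produces empty strings among the "words". A minimal sketch of one way around that is to split on runs of whitespace instead. The helper name first_words is made up for illustration and is not part of base R; trimws() assumes R 3.2.0 or later.

```r
# Hypothetical helper (not part of base R): first n words of a single string.
first_words <- function(x, n = 10) {
  # drop leading/trailing whitespace, then split on runs of whitespace
  parts <- strsplit(trimws(as.character(x)), "[[:space:]]+")[[1]]
  # indexing past the end gives NA, so short strings are padded with NA
  parts[seq_len(n)]
}

first_words("I  have a   sentence with ten words", n = 7)
first_words("only three words", n = 5)
```

The second call returns three words followed by two NA values, which is usually what you want when every row must end up with the same number of fields.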


Or if in a dataframe:

 > words <- c("I have a columnn with text that has quite a few words in it.",
 +            "I would like to split these words in separate columns",
 +            "but just first ten words in the string. Is that possible in R?")
 > worddf <- data.frame(words=words)

 > # as.character() because strsplit() needs a character vector and the column may be a factor
 > t(sapply(strsplit(as.character(worddf$words), " "), "[", 1:10) )
      [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
 [1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
 [2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
 [3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"
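
To get the layout asked for in the follow-up (the original Opis column plus Column1 to Column10), the matrix can be bound back onto the data frame. This is only a sketch under assumptions: the toy data frame df and the names Column1..Column10 are made up for illustration, and Opis is stored as a factor here to mimic the str() output in the follow-up.

```r
# Toy data standing in for the real data frame (assumed for illustration)
df <- data.frame(Opis = c("I have a sentence with ten words",
                          "DROBEN MELJAST PESEK, GOST, SIVORJAV"),
                 stringsAsFactors = TRUE)

n <- 10
# Opis is a factor, so convert to character before splitting on single spaces;
# "[" picks elements 1..n of each split, giving NA where a row has fewer words
w <- t(sapply(strsplit(as.character(df$Opis), " "), "[", seq_len(n)))
colnames(w) <- paste0("Column", seq_len(n))

# bind the word columns next to the original columns
df2 <- cbind(df, as.data.frame(w, stringsAsFactors = FALSE))
str(df2)
```

Rows with fewer than ten words get NA in the remaining columns. Punctuation stays attached to the words (for example "PESEK,"), since the split is on spaces only.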


-- 
David Winsemius, MD
West Hartford, CT
