[R] splitting first 10 words in a string
Phil Spector
spector at stat.berkeley.edu
Mon Nov 1 22:52:32 CET 2010
Matevž -
Does this example do what you want?
> mysentences = c('Here is a sentence that has a bunch of words in it','Here is another sentence that also has a bunch of words','I have yet another sentence and it also has a whole bunch of words')
> data.frame(mysentences,do.call(rbind,lapply(strsplit(mysentences,' +'),'[',1:10)))
                                                         mysentences   X1   X2
1                 Here is a sentence that has a bunch of words in it Here   is
2            Here is another sentence that also has a bunch of words Here   is
3 I have yet another sentence and it also has a whole bunch of words    I have
       X3       X4       X5   X6  X7    X8    X9   X10
1       a sentence     that  has   a bunch    of words
2 another sentence     that also has     a bunch    of
3     yet  another sentence  and  it  also   has     a
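For a data frame like the one Matevž describes, where Opis is stored as a factor, the same idea can be wrapped in a small function. This is only a sketch under stated assumptions: the name split_first_words, its arguments and the Column1, Column2, ... naming (borrowed from Matevž's example) are hypothetical and not part of base R. It converts the column to character first, because strsplit() does not accept factors, and it splits on runs of whitespace much as the ' +' pattern above does. Sentences with fewer than ten words are padded with NA.

```r
# Hypothetical helper, not part of base R or of the original posts:
# add the first n words of a text column as Column1 ... Column<n>.
split_first_words <- function(df, col, n = 10, prefix = "Column") {
  # strsplit() rejects factors, so convert the column to character first
  # (factors were the data.frame default for text before R 4.0.0).
  # trimws() drops leading/trailing blanks so the first word is not "".
  text <- trimws(as.character(df[[col]]))
  # Split on runs of whitespace, similar to the ' +' pattern above,
  # so repeated spaces do not produce empty "words".
  words <- strsplit(text, "[[:space:]]+")
  # Taking elements 1:n pads sentences shorter than n words with NA.
  # vapply() returns one column per sentence; refilling by row gives
  # one row per sentence (this also works when n = 1).
  first <- vapply(words, `[`, character(n), seq_len(n))
  first <- matrix(first, ncol = n, byrow = TRUE)
  colnames(first) <- paste0(prefix, seq_len(n))
  cbind(df, as.data.frame(first, stringsAsFactors = FALSE))
}

# Usage with a small made-up data frame (Opis stored as a factor)
opis_df <- data.frame(
  Opis = c("I have a sentence with ten words", "  two   spaced  words "),
  stringsAsFactors = TRUE
)
out <- split_first_words(opis_df, "Opis")
str(out)
```

With the data frame from Matevž's second message, a call such as split_first_words(mydata, "Opis") would append Column1 to Column10; mydata is a placeholder, since the thread never names that data frame.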
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Mon, 1 Nov 2010, Matevž Pavlič wrote:
> ...I would like to split, for example, this sentence from the field Opis in a data.frame:
>
> Opis: "I have a sentence with ten words", so that it would convert to something like this:
>
> Opis: "I have a sentence with ten words"; Column1: "I"; Column2: "have"; Column3: "a"; Column4: "sentence"; Column5: "with"; Column6: "ten"; Column7: "words"
>
> ...or, in a data.frame, something like this (as I understand it):
>
> 'data.frame': xx obs. of 12 variables:
> $ Opis   : Factor "I have a sentence with ten words"
> $ Column1: Factor "I"
> $ Column2: Factor "have"
> $ Column3: Factor "a"
> $ Column4: Factor "sentence"
> $ Column5: Factor "with"
> $ Column6: Factor "ten"
> $ Column7: Factor "words"
>
> I hope that explains it better. I am still having some trouble understanding R.
> m
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Matevž Pavlič
> Sent: Monday, November 01, 2010 10:34 PM
> To: David Winsemius
> Cc: r-help at r-project.org
> Subject: Re: [R] splitting first 10 words in a string
>
> Hi,
>
> I am sorry; I will try to be more exact from now on...
>
> I have a data.frame with a field called Opis. It contains sentences that I would like to split into words, i.e. into separate fields of the data.frame (when I say columns, I mean columns as in an Excel table). I would like to split "Opis" into ten fields holding the first ten words of the Opis field.
> Here is an example of my data.frame.
>
> 'data.frame': 22928 obs. of 12 variables:
> $ VrtinaID : int 1 1 1 1 2 2 2 2 2 2 ...
> $ ZapStev : int 1 2 3 4 1 2 3 4 5 6 ...
> $ GlobinaOd : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
> $ GlobinaDo : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
> $ Opis : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 4756 ...
> $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 101 101 NA 106 125 80 106 101 ...
> $ GeolNastOd : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
> $ GeolNastDo : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
> $ GeolNastOpis : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 53 53 53 53 53 ...
> $ NacinVrtanjaOd : num 0e+00 1e+09 1e+09 1e+09 0e+00 ...
> $ NacinVrtanjaDo : num 1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
> $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 1 1 1 1 1 ...
>
> I hope that explains it better...
> Thank you, m
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Monday, November 01, 2010 10:13 PM
> To: Matevž Pavlič
> Cc: r-help at r-project.org
> Subject: Re: [R] splitting first 10 words in a string
>
>
> On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
>
>> Hi all,
>>
>>
>>
>> I have a column with text that has quite a few words in it. I would
>> like to split these words into separate columns, but just the first ten
>> words in the string. Is that possible in R?
>>
>>
>
> Not sure what a column means to you. It's not a precisely defined R
> type or class. (And you are requested to offer a concrete example
> rather than making us guess.)
>
> > words <- "I have a columnn with text that has quite a few words in it. I would like to split these words in separate columns, but just first ten words in the string. Is that possible in R?"
>
> > strsplit(words, " ")[[1]][1:10]
> [1] "I" "have" "a" "columnn" "with" "text"
> "that" "has" "quite" "a"
>
>
> Or if in a dataframe:
>
> > words <- c("I have a columnn with text that has quite a few words in it.", "I would like to split these words in separate columns", "but just first ten words in the string. Is that possible in R?")
> > worddf <- data.frame(words=words)
>
> > t(sapply(strsplit(as.character(worddf$words), " "), "[", 1:10))  # as.character() in case words is a factor
>      [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
> [1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
> [2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
> [3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"
>
>
> --
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.