[R] spliting first 10 words in a string
Matevž Pavlič
matevz.pavlic at gi-zrmk.si
Mon Nov 1 22:43:57 CET 2010
...I would like, for example, to split this sentence from the field Opis in the data.frame:
Opis : "I have a sentence with ten words", so that it would convert to something like this:
Opis : "I have a sentence with ten words"; Column1 : "I"; Column2 : "have"; Column3 : "a"; Column4 : "sentence"; Column5 : "with"; Column6 : "ten"; Column7 : "words"
...or in a data.frame something like this (as I understand it):
'data.frame': xx obs. of 12 variables:
$ Opis : factor "I have a sentence with ten words";
$ Column1 : factor "I";
$ Column2 : factor "have";
$ Column3 : factor "a";
$ Column4 : factor "sentence";
$ Column5 : factor "with";
$ Column6 : factor "ten";
$ Column7 : factor "words"
Hope that explains it better. I am still having some trouble understanding R and all...
m
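The layout described above could be produced along the following lines. This is only a sketch under stated assumptions: the data frame `df`, the names `n_words`, `parts`, `first_words` and `result`, and the two sample sentences are hypothetical stand-ins, and the columns are called Column1 to Column10 simply because that is how they are named in the message. Sentences with fewer than ten words get NA in the remaining columns.

```r
# Hypothetical stand-in for the real data: Opis is stored as a factor,
# as it is in the str() output quoted in this thread.
df <- data.frame(
  Opis = factor(c("I have a sentence with ten words",
                  "(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV"))
)

n_words <- 10

# A factor must be converted to character before strsplit() will accept it.
# Splitting on runs of whitespace avoids empty "words" from double spaces.
parts <- strsplit(as.character(df$Opis), "[[:space:]]+")

# Keep the first n_words of every row; indexing past the end yields NA,
# so each row contributes exactly n_words values.
first_words <- t(vapply(parts,
                        function(p) p[seq_len(n_words)],
                        character(n_words)))
colnames(first_words) <- paste0("Column", seq_len(n_words))

# Append the new columns to the original data frame.
result <- cbind(df, as.data.frame(first_words, stringsAsFactors = FALSE))
str(result)
```

Punctuation stays attached to the words here (for example "PESEK,"); stripping it would be a separate step.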
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Matevž Pavlič
Sent: Monday, November 01, 2010 10:34 PM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] spliting first 10 words in a string
Hi,
I am sorry, will try to be more exact from now on...
I have a data.frame with a field called Opis. It contains sentences that I would like to split into words, each in its own field of the data.frame. When I say columns, I mean columns as in an Excel table. I would like to split "Opis" into ten fields holding the first ten words of the Opis field.
Here is an example of my data.frame.
'data.frame': 22928 obs. of 12 variables:
$ VrtinaID : int 1 1 1 1 2 2 2 2 2 2 ...
$ ZapStev : int 1 2 3 4 1 2 3 4 5 6 ...
$ GlobinaOd : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
$ GlobinaDo : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
$ Opis : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 4756 ...
$ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 101 101 NA 106 125 80 106 101 ...
$ GeolNastOd : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
$ GeolNastDo : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
$ GeolNastOpis : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 53 53 53 53 53 ...
$ NacinVrtanjaOd : num 0e+00 1e+09 1e+09 1e+09 0e+00 ...
$ NacinVrtanjaDo : num 1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
$ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 1 1 1 1 1 ...
Hope that explains better...
Thank you, m
-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Monday, November 01, 2010 10:13 PM
To: Matevž Pavlič
Cc: r-help at r-project.org
Subject: Re: [R] spliting first 10 words in a string
On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
> Hi all,
>
>
>
> I have a columnn with text that has quite a few words in it. I would
> like to split these words in separate columns, but just first ten
> words in the string. Is that possible in R?
>
>
Not sure what a column means to you. It's not a precisely defined R
type or class. (And you are requested to offer a concrete example
rather than making us guess.)
> words <- "I have a columnn with text that has quite a few words in it. I would like to split these words in separate columns, but just first ten words in the string. Is that possible in R?"
> strsplit(words, " ")[[1]][1:10]
 [1] "I"       "have"    "a"       "columnn" "with"    "text"    "that"    "has"     "quite"   "a"
Or if in a dataframe:
> words <- c("I have a columnn with text that has quite a few words in it.", "I would like to split these words in separate columns", "but just first ten words in the string. Is that possible in R?")
> worddf <- data.frame(words=words)
> # as.character() is needed when the column is stored as a factor
> t(sapply(strsplit(as.character(worddf$words), " "), "[", 1:10) )
     [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
[1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
[2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
[3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"
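The reason the sapply() call above needs the t() may not be obvious, so here is a small sketch of the mechanics. The two example strings and the names `parts`, `short` and `m` are made up for illustration. Indexing a vector past its end returns NA instead of failing, so every string yields exactly ten values, sapply() can simplify the result to a matrix with one column per string, and t() turns that into one row per string.

```r
words <- c("I have a columnn with text",
           "but just first ten words in the string. Is that possible in R?")
parts <- strsplit(words, " ")

# The first string has only six words; positions 7 to 10 come back as NA.
short <- parts[[1]][1:10]
print(short)

# "[" is applied to each element with 1:10 as the index. All results have
# length 10, so sapply() returns a 10 x 2 matrix (one column per string).
m <- sapply(parts, "[", 1:10)
print(dim(m))

# Transposing gives one row per string and one column per word position.
print(dim(t(m)))
```

Strings with fewer than ten words therefore do not cause an error; they just leave NA in the trailing columns.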
--
David Winsemius, MD
West Hartford, CT