[R] count number of stop words in R

Bert Gunter bgunter.4567 at gmail.com
Mon Jun 12 18:16:08 CEST 2017


I am unfamiliar with the tm package, but using basic regex tools, is
this what you want:

test <- "Mhm . Alright . There's um a young boy that's getting a
cookie jar . And it he's uh in bad shape because uh the thing is
falling over . And in the picture the mother is washing dishes and
doesn't see it . And so is the the water is overflowing in the sink .
And the dishes might get falled over if you don't fell fall over there
there if you don't get it . And it there it's a picture of a kitchen
window . And the curtains are very uh distinct . But the water is
still flowing ."

## creates a list whose only component is a vector of the words;
## splitting on whitespace also handles the line breaks inside the string
out <- strsplit(test, "[[:space:]]+")

stopw <- c("a","the") ## or whatever they are

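## anchor the pattern so that only whole words match (an unanchored "a"
## would also match "that's", "bad", etc.)
sum(grepl(paste0("^(", paste(stopw,collapse="|"), ")$"), out[[1]]))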
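
A caution on stop-word alternations with grepl: without anchors, a pattern such as "a|the" also matches inside longer words. The sketch below is only an illustration on a made-up word vector (not the poster's data); it compares an unanchored pattern, an anchored one, and exact matching with %in%.

```r
# made-up example words, chosen to show the difference
words <- c("a", "that's", "the", "there", "jar", "The")
stopw <- c("a", "the")

# Unanchored: TRUE for any word that merely contains "a" or "the"
loose <- sum(grepl(paste(stopw, collapse = "|"), words))

# Anchored: TRUE only when the whole word is one of the stop words
pat <- paste0("^(", paste(stopw, collapse = "|"), ")$")
strict <- sum(grepl(pat, words))

# Exact matching without regex, ignoring case
exact <- sum(tolower(words) %in% stopw)

c(loose = loose, strict = strict, exact = exact)
```

The anchored and %in% versions differ here only because %in% is applied after tolower(), so "The" is counted as well.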

## If you want to include ".", a regex special character, add:
sum(grepl(".",out[[1]],fixed=TRUE))
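
Since the question was about English stop words in general, here is a rough sketch that wraps the counting in a small function. The name count_stopwords and the short hand-picked word list are my own inventions for illustration; if the tm package is installed, its stopwords("english") vector can be passed in instead.

```r
# hypothetical helper: count the tokens of x that appear in stopw
count_stopwords <- function(x, stopw) {
  # lower-case, then split on runs of whitespace (including line breaks)
  words <- strsplit(tolower(x), "[[:space:]]+")[[1]]
  sum(words %in% stopw)
}

txt <- "And the dishes might get falled over if you don't get it ."

# small hand-picked list, just for illustration
n_small <- count_stopwords(txt, c("a", "the", "and", "if", "you", "it"))
n_small

# with tm installed, its English stop-word list can be used instead
if (requireNamespace("tm", quietly = TRUE)) {
  print(count_stopwords(txt, tm::stopwords("english")))
}
```

Using %in% rather than a regex avoids having to escape special characters such as the apostrophe-containing entries in longer stop-word lists.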


If this is all nonsense, just ignore -- and sorry I couldn't help.

-- Bert




Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Jun 12, 2017 at 8:23 AM, Elahe chalabi <chalabi.elahe at yahoo.de> wrote:
> Thanks for your reply. I know the command
> data <- tm_map(data, removeWords, stopwords("english"))
> removes English stop words, but I don't know how to count the stop words in my string:
>
>
> str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing ."
>
>
>
>
>
> On Monday, June 12, 2017 7:24 AM, Patrick Casimir <patrcasi at nova.edu> wrote:
>
>
>
> You can define stop words as below.
> data <- tm_map(data, removeWords, stopwords("english"))
>
>
> Patrick Casimir, PhD
> Health Analytics, Data Science, Big Data Expert & Independent Consultant
> C: 954.614.1178
>
> ________________________________
>
> From: R-help <r-help-bounces at r-project.org> on behalf of Bert Gunter <bgunter.4567 at gmail.com>
> Sent: Monday, June 12, 2017 10:12:33 AM
> To: Elahe chalabi
> Cc: R-help Mailing List
> Subject: Re: [R] count number of stop words in R
>
> You can use regular expressions.
>
> ?regex and/or the stringr package are good places to start.  Of
> course, you have to define "stop words."
>
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
>
> On Mon, Jun 12, 2017 at 5:40 AM, Elahe chalabi via R-help
> <r-help at r-project.org> wrote:
>> Hi all,
>>
>> Is there a way in R to count the number of (English) stop words in a string using the tm package?
>>
>> str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing .
>>
>> 255 Levels: A boy's on the uh falling off the stool picking up cookies . The girl's reaching up for it . The girl the lady is is drying dishes . The water is uh running over uh from the sink into the floor . The window's opened . Dishes on the on the counter . She's outside ."
>>
>> Thanks for any help!
>> Elahe
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>


