[R] R help

Jim Lemon drjimlemon at gmail.com
Mon Aug 8 01:22:21 CEST 2016


Hi Vladimir,
This may fix the NA problem:

vdat<-read.table(text="numberoftweet,tweet,locations,badwords
1,My cat is asleep,London,glum
2,My cat is flying,Paris,dashed
3,My cat is dancing,Berlin,mopey
4,My cat is singing,Rome,ill
5,My cat is reading,Budapest,sad
6,My cat is eating,Amsterdam,annoyed
7,My cat is hiding,Copenhagen,crazy
8,My cat is fluffy,Vilnius,terrified
9,My cat is annoyed,Athens,sick
10,My cat is exercising,Ankara,mortified
11,My cat is dreaming,Kracow,irked
12,My cat is mopey,Vienna,uneasy
13,My cat is glum,Brussels,upset
14,My cat is swinging,Madrid,
15,My cat is crazy,Ljubljana,",
sep=",",header=TRUE,stringsAsFactors=FALSE)

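# blank entries in a character column are read in as empty strings; mark them as NA
vdat$badwords[!nchar(vdat$badwords)]<-NA

# build the pattern from the non-missing words only
badwords<-paste(vdat$badwords[!is.na(vdat$badwords)],collapse="|")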
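
To see why the blank and NA entries have to go before collapsing, here is a small sketch. The vector "words" and the other names are made up for illustration and are not part of your data. As far as I know, an empty alternative in the pattern matches every string, and paste() turns NA into the literal text "NA".

```r
# hypothetical bad-word column with one blank and one missing entry
words<-c("glum","mopey","",NA)

# collapsing everything leaves an empty alternative and a literal "NA"
all_collapsed<-paste(words,collapse="|")
all_collapsed

# the empty alternative between the two bars matches any tweet at all
grepl("glum||mopey","My cat is asleep")

# keep only the entries that are neither missing nor empty
keep<-!is.na(words)&nzchar(words)
clean_pattern<-paste(words[keep],collapse="|")
clean_pattern
```

With the cleaned pattern only tweets that really contain one of the words are picked up.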

# the tweets that contain at least one of the bad words
names(unlist(sapply(vdat$tweet,grep,pattern=badwords)))
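
If you want the matching rows of the data frame rather than just the tweet text, grepl() may be more convenient, since it returns one TRUE/FALSE per tweet. This is only a sketch on a small made-up data frame. The names tw, bad, hits and hits_whole are illustrative. The \\b word boundaries are an optional extra, assuming you want whole-word matches so that "ill" does not hit "still".

```r
# small made-up example; tw, bad, hits and hits_whole are illustrative names
tw<-data.frame(tweet=c("My cat is asleep","My cat is glum",
 "My cat is still here","My cat is ill"),
 badwords=c("glum","ill","",NA),stringsAsFactors=FALSE)

# drop missing and empty entries before building the pattern
bad<-tw$badwords[!is.na(tw$badwords)&nzchar(tw$badwords)]

# plain alternation also matches inside words, so "still" is caught by "ill"
hits<-grepl(paste(bad,collapse="|"),tw$tweet)
tw[hits,]

# wrapping the alternatives in word boundaries keeps whole-word matches only
hits_whole<-grepl(paste0("\\b(",paste(bad,collapse="|"),")\\b"),tw$tweet)
tw[hits_whole,]
```

Note that if any of the bad words contain regular expression metacharacters (a dot or a bracket, say), they would need escaping first.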

Jim


On Sun, Aug 7, 2016 at 6:43 PM, Вова Грабарник <v.grabarnik at gmail.com> wrote:
> Hi Jim!
>
> That is exactly what I mean. Your example does the job I was looking for.
> Unlike in your example, though, my badwords column is not filled in for
> every row: it has only 10 values, but the data frame has many more rows.
> When I introduce NA for the blanks and write
> badwords<-paste(vdat$badwords,collapse="|")
> it collapses all the values into something like: word|word|NA|NA
> and if I don't introduce NAs when reading the data, the outcome is still like:
> word|word|word|word||||||||||||||||
> and when I then try
> names(unlist(sapply(vdat$tweet,grep,pattern=badwords))) the result is not right.
> I asked about this before, but do you know by any chance how to use just
> the words in the badwords column and leave out the NAs and blanks?
>
> Thank you,
> Vladimir
>
> 2016-08-07 0:19 GMT+01:00 Jim Lemon <drjimlemon at gmail.com>:
>>
>> Hi Vladimir,
>> Do you want something like this?
>>
>> vdat<-read.table(text="numberoftweet,tweet,locations,badwords
>> 1,My cat is asleep,London,glum
>> 2,My cat is flying,Paris,dashed
>> 3,My cat is dancing,Berlin,mopey
>> 4,My cat is singing,Rome,ill
>> 5,My cat is reading,Budapest,sad
>> 6,My cat is eating,Amsterdam,annoyed
>> 7,My cat is hiding,Copenhagen,crazy
>> 8,My cat is fluffy,Vilnius,terrified
>> 9,My cat is annoyed,Athens,sick
>> 10,My cat is exercising,Ankara,mortified
>> 11,My cat is dreaming,Kracow,irked
>> 12,My cat is mopey,Vienna,uneasy
>> 13,My cat is glum,Brussels,upset",
>> sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>> badwords<-paste(vdat$badwords,collapse="|")
>>
>> names(unlist(sapply(vdat$tweet,grep,pattern=badwords)))
>>
>> Jim
>>
>>
>> On Sat, Aug 6, 2016 at 12:07 AM, Вова Грабарник <v.grabarnik at gmail.com> wrote:
>> > Dear R command,
>> >
>> > I was wondering if I could ask for your recommendations on my problem,
>> > if that is fine with you.
>> > Basically, I have a data frame with 5 columns and 10 000 tweets
>> > recorded (rows). The columns are: numberofatweet (number), tweet (the
>> > actual text of the tweet), locations (where the tweet was sent from),
>> > and badwords (words that should not be used on Twitter). The badwords
>> > column is unrelated to the tweet number; it contains only 80 rows,
>> > with one word per cell.
>> > My question is whether it is possible to select only the rows whose
>> > "tweet" column (the actual text) contains one of the words from the
>> > badwords column. I tried to use grep and grepl, but nothing seems to
>> > work.
>> >
>> > Thank you in advance,
>> > Vladimir
>> >
>
>
>
>
> --
> Best regards,
> Володя Грабарник


