[R] Problem with Extracting Hash Tagged Words from Tweets

Sarah Goslee sarah.goslee at gmail.com
Tue May 22 17:02:40 CEST 2012


Hi,

On Tue, May 22, 2012 at 10:55 AM, Adedoyin-Olowe Mariam
<mariamolowe2008 at yahoo.com> wrote:
> Hi Sarah,
>
> Thanks for your help. I'm sorry my question was not clear enough.
> Maybe what I should ask is how to remove the downloaded
> tweet numbers (i.e. [[1]], [1], [[2]], [2], ...) in
> x <- list
> before running sapply(x, str_extract_all, "#\\<.*?\\>").

Those aren't part of the tweets. They are the index labels R prints when
it displays the elements of a list; they are not stored in the data.
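
To illustrate, here is a minimal sketch with a made-up two-element list
(the strings are hypothetical stand-ins, not your data). It checks that
the bracketed labels appear only in the printed output and not in the
strings themselves:

```r
# Hypothetical two-element list standing in for the downloaded tweets
x <- list("first tweet #one", "second tweet #two")

# print() shows [[1]], [1], [[2]], [1] as positional labels only
print(x)

# Flatten the list into a plain character vector
tweets <- unlist(x)

# Look for the literal text "[[1]]" inside the strings; fixed = TRUE
# turns off regex interpretation of the brackets
has_index_label <- grepl("[[1]]", tweets, fixed = TRUE)
print(has_index_label)
```

If the labels were really part of the data, grepl() would find them.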

> The presence of these numbers in square brackets is causing an error.

What error? You'll need to give us an actual reproducible example,
since what you are describing is unclear.

Although I suppose it's possible that you simply want:
> unlist(sapply(x, str_extract_all, "#\\<.*?\\>"))
[1] "#dayatthenews" "#pompeyhacks"  "#portsmouth"   "#southsea"
[5] "#Portsmouth"   "#portsmouth"
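
For comparison, here is a base R sketch of the same idea that does not
need stringr. It swaps str_extract_all() for regmatches() with
gregexpr(), and uses shortened copies of the three sample tweets. The
pattern "#[A-Za-z0-9_]+" is my assumption about what should count as a
hashtag. Note that in the class [a-z//A-Z//0-9] from the original post
the slashes are not separators; they only add a literal "/" to the set
of allowed characters.

```r
# Shortened copies of the three sample tweets from this thread
x <- list(
  "Get an insight at The News by following #dayatthenews today #pompeyhacks #portsmouth #southsea",
  "get bikini... http://t.co/BUrkjtCh #Portsmouth",
  "Looks like @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0"
)

# Work on a plain character vector, one element per tweet
tweets <- unlist(x)

# gregexpr() locates every match in each tweet; regmatches() pulls the
# matched text out, giving one character vector of hashtags per tweet
tags_per_tweet <- regmatches(tweets, gregexpr("#[A-Za-z0-9_]+", tweets))

# Flatten to a single vector if the per-tweet grouping is not needed
all_tags <- unlist(tags_per_tweet)
print(all_tags)
```

Keep tags_per_tweet if you need to know which tweet each hashtag came
from; use all_tags if you only want the full set.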

It's impossible for me to tell precisely what the problem is.

Sarah

>
> Thanks.
> Mariam
>
>
> ________________________________
> From: Sarah Goslee <sarah.goslee at gmail.com>
> To: Adedoyin-Olowe Mariam <mariamolowe2008 at yahoo.com>
> Cc: "r-help at r-project.org" <r-help at r-project.org>
> Sent: Tuesday, 22 May 2012, 13:53
> Subject: Re: [R] Problem with Extracting Hash Tagged Words from Tweets
>
> Hi,
>
> A small reproducible bit of your data would have been nice, and I have
> no idea what "manually remove all regular expressions" might mean, but
> take a look at this:
>
> x <- list("marymaryw: Get an insight into how journalists operate at
> The News by following #dayatthenews today #pompeyhacks #portsmouth
> #southsea", "VouchAR_Ports: £5 instead of £60 for 1 month of unlimited
> fitness classes at Outdoor Fitness Leeds - get bikini...
> http://t.co/BUrkjtCh #Portsmouth", "BillieRaePhoto: RT @vintagesecret:
> My dad has just sent me this picture. Looks like @GunwharfQuays is on
> fire?! #portsmouth http://t.co/HbAV7Hw0")
>
>> sapply(x, str_extract_all, "#\\<.*?\\>")
> [[1]]
> [1] "#dayatthenews" "#pompeyhacks"  "#portsmouth"  "#southsea"
>
> [[2]]
> [1] "#Portsmouth"
>
> [[3]]
> [1] "#portsmouth"
>
> Sarah
>
> On Tue, May 22, 2012 at 7:00 AM, Adedoyin-Olowe Mariam
> <mariamolowe2008 at yahoo.com> wrote:
>> Hello All,
>> Can anyone help me solve this problem?
>> I am trying to extract hash-tagged words from tweets downloaded with
>> twitteR.
>>
>> I can extract hash-tagged words from a single tweet using
>> (stringr) str_extract_all(tweets, "#[a-z//A-Z//0-9]+")
>> but cannot with more than one tweet at a time unless I manually remove all
>> regular expressions and tweet numbers such as [[1]] and [1].
>>
>> I want to automatically extract all #words from a large number of tweets in
>> one go.
>> This is what I have done so far by removing all regular expressions
>> manually:
>>
>>> searchTwitter("#Portsmouth", n=20)
>> [[1]]
>> [1] "marymaryw: Get an insight into how journalists operate at The News by
>> following #dayatthenews today #pompeyhacks #portsmouth #southsea"
>> [[2]]
>> [1] "VouchAR_Ports: £5 instead of £60 for 1 month of unlimited fitness
>> classes at Outdoor Fitness Leeds - get bikini... http://t.co/BUrkjtCh
>> #Portsmouth"
>> [[3]]
>> [1] "BillieRaePhoto: RT @vintagesecret: My dad has just sent me this
>> picture. Looks like @GunwharfQuays is on fire?! #portsmouth
>> http://t.co/HbAV7Hw0"
>> [[4]]
>> [1] "xangma: RT @vintagesecret: My dad has just sent me this picture.
>> Looks like @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0"
>> [[5]]
>> [1] "vintagesecret: My dad has just sent me this picture. Looks like
>> @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0"
>> [[6]]
>> [1] "i_amnik: RT @BBCRadioSolent: Can you see the #GunwharfQuays fire?
>> Eye-witnesses please call - 0845 30 30 961. #Portsmouth."
>> [[7]]
>> [1] "vickiredmond: RT @dan_germain: RT @MatMacAulay: Best pic of #Gunwharf
>> on fire I have seen http://t.co/8LNAiqiD #portsmouth"
>> [[8]]
>> [1] "EmilieRosa: Highs of 25 degrees on the island this week!! Beach time
>> after exams I think! ;) #Portsmouth"
>> [[9]]
>> [1] "MrYiff: RT @dan_germain: RT @MatMacAulay: Best pic of #Gunwharf on
>> fire I have seen http://t.co/8LNAiqiD #portsmouth"
>> [[10]]
>> [1] "otbsaad: RT @BBCRadioSolent: BREAKING NEWS - Reports of a large fire
>> at #GunwharfQuays in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM"
>> [[11]]
>> [1] "PN_Newsdesk: #Portsmouth: Ferryspeed looks to build on its past
>> successes http://t.co/CmDglDkg"
>> [[12]]
>> [1] "PN_Newsdesk: #Portsmouth: More room for stalls at top Southsea school
>> - A SOUTHSEA primary school still has room for people to se...
>> http://t.co/ucbYWjPR"
>> [[13]]
>> [1] "VouchAR_Ports: £14 instead of £30 for a pedicure with foiled transfer
>> at Forever Young, Stoke-on-Trent - get... http://t.co/P7gJBcl8 #Portsmouth"
>> [[14]]
>> [1] "TelArnott: Looking forward to #K1 today! #gym01 #portsmouth"
>> [[15]]
>> [1] "dan_germain: RT @MatMacAulay: Best pic of #Gunwharf on fire I have
>> seen http://t.co/8LNAiqiD #portsmouth"
>> [[16]]
>> [1] "dan_germain: RT @portsmouthnews: News: Large fire at Gunwharf Quays -
>> http://t.co/s9RWpY0i #portsmouth #southsea"
>> [[17]]
>> [1] "i_amnik: RT @BBCRadioSolent: BREAKING NEWS - Reports of a large fire
>> at #GunwharfQuays in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM"
>> [[18]]
>> [1] "solentmotorcars: RT @BBCRadioSolent: BREAKING NEWS - Reports of a
>> large fire at #GunwharfQuays in #Portsmouth. Latest updates on
>> @BBCRadioSolent 96.1FM"
>> [[19]]
>> [1] "HantsChiefAlex: RT @BBCRadioSolent: BREAKING NEWS - Reports of a
>> large fire at #GunwharfQuays in #Portsmouth. Latest updates on
>> @BBCRadioSolent 96.1FM"
>> [[20]]
>> [1] "BBCRadioSolent: Can you see the #GunwharfQuays fire? Eye-witnesses
>> please call - 0845 30 30 961. #Portsmouth."
>>> tweets <-c("marymaryw: Get an insight into how journalists operate at The
>>> News by following #dayatthenews today #pompeyhacks #portsmouth #southsea
>>> VouchAR_Ports £5 instead of £60 for 1 month of unlimited fitness classes at
>>> Outdoor Fitness Leeds - get bikini... http://t.co/BUrkjtCh #Portsmouth
>>> BillieRaePhoto RT @vintagesecret My dad has just sent me this picture. Looks
>>> like @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0 xangma: RT
>>> @vintagesecret My dad has just sent me this picture. Looks like
>>> @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0 vintagesecret
>>> My dad has just sent me this picture. Looks like @GunwharfQuays is on fire?!
>>> #portsmouth http://t.co/HbAV7Hw0iamnik: RT @BBCRadioSolent Can you see the
>>> #GunwharfQuays fire? Eye-witnesses please call - 0845 30 30 961.
>>> #Portsmouth. vickiredmond @MatMacAulay Best pic of#Gunwharf on fire I have
>>> seen http://t.co/8LNAiqiD #portsmouth EmilieRosa: Highs of 25 degrees on the
>>> island this week!! Beach time after exams I think!) #Portsmouth mYiff RT
>> @dan_germain: RT @MatMacAulay Best pic of #Gunwharf on fire I have seen
>> http://t.co/8LNAiqiD #portsmouth otbsaad RT @BBCRadioSolent: BREAKING NEWS -
>> Reports of a large fire at #GunwharfQuays in #Portsmouth. Latest updates on
>> @BBCRadioSolent 96.1FM PN_Newsdesk #Portsmouth: Ferryspeed looks to build on
>> its past successes http://t.co/CmDglDkg PN_Newsdesk #Portsmouth More room
>> for stalls at top Southsea school - A SOUTHSEA primary school still has room
>> for people to se... http://t.co/ucbYWjPR VouchAR_Ports £14 instead of £30
>> for a pedicure with foiled transfer at Forever Young, Stoke-on-Trent -
>> get... http://t.co/P7gJBcl8 #Portsmouth TelArnott Looking forward to #K1
>> today! #gym01 #portsmouth Best pic of #Gunwharf on fire I have seen
>> http://t.co/8LNAiqiD #portsmouth dangermain RT @portsmouthnews News Large
>> fire at Gunwharf Quays - http://t.co/s9RWpY0i #portsmouth #southsea iamnik
>> RT @BBCRadioSolent BREAKING NEWS - Reports of a large fire at #GunwharfQuays
>> in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM solentmotorcars RT
>> @BBCRadioSolent: BREAKING NEWS - Reports of a large fire at #GunwharfQuays
>> in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM HantsChiefAlex RT
>> @BBCRadioSolent BREAKING NEWS - Reports of a large fire at #GunwharfQuays in
>> #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM BBCRadioSolent Can you
>> see the #GunwharfQuays fire? Eye-witnesses please call - 0845 30 30 961.
>> #Portsmouth")
>>> str_extract_all(tweets, "#[a-z//A-Z//0-9]+")
>> [[1]]
>>  [1] "#dayatthenews"  "#pompeyhacks"   "#portsmouth"    "#southsea"
>>  "#Portsmouth"    "#portsmouth"    "#portsmouth"
>>  [8] "#portsmouth"    "#GunwharfQuays" "#Portsmouth"    "#Gunwharf"
>>  "#portsmouth"    "#Portsmouth"    "#Gunwharf"
>> [15] "#portsmouth"    "#GunwharfQuays" "#Portsmouth"    "#Portsmouth"
>>  "#Portsmouth"    "#Portsmouth"    "#K1"
>> [22] "#gym01"         "#portsmouth"    "#Gunwharf"      "#portsmouth"
>>  "#portsmouth"    "#southsea"      "#GunwharfQuays"
>> [29] "#Portsmouth"    "#GunwharfQuays" "#Portsmouth"    "#GunwharfQuays"
>> "#Portsmouth"    "#GunwharfQuays" "#Portsmouth"
>>
>> Please I need help.
>>
>> Mariam
>



-- 
Sarah Goslee
http://www.functionaldiversity.org


