The googlenlp package provides an R interface to Google’s Cloud Natural Language API.
“Google Cloud Natural Language API reveals the structure and meaning of text by offering powerful machine learning models in an easy to use REST API. You can use it to extract information about people, places, events and much more, mentioned in text documents, news articles or blog posts. You can use it to understand sentiment about your product on social media or parse intent from customer conversations happening in a call center or a messaging app.” [source]
There are four main features of the API, all of which are available through this R package [source]:
You can install the development version from GitHub:
devtools::install_github("BrianWeinstein/googlenlp")
To use the API, you’ll first need to create a Google Cloud project and enable billing, and get an API key.
Load the package and set your API key. There are two ways to do this.
Method A (preferred method) adds your API key as a variable to your .Renviron file. Under this method, you only need to complete the setup process once.
library(googlenlp)
configure_googlenlp() # follow the instructions printed to the console
googlenlp setup instructions:
1. Your ~/.Renviron file will now open in a new window/tab.
*** If it doesn't open, run: file.edit("~/.Renviron") ***
2. To use the API, you'll first need to create a Google Cloud project and enable billing (https://cloud.google.com/natural-language/docs/getting-started).
3. Next you'll need to get an API key (https://cloud.google.com/natural-language/docs/common/auth).
4. In your ~/.Renviron file, replace the ENTER_YOUR_API_KEY_HERE with your Google Cloud API key.
5. Save your ~/.Renviron file.
6. *** Restart your R session for changes to take effect. ***
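After restarting, one quick way to confirm that the key was picked up is to read the variable back from the environment. This is a minimal sketch: the variable name GOOGLE_NLP_API_KEY is an assumption, so substitute whatever name appears in your ~/.Renviron file, and has_api_key is a hypothetical helper, not part of the package.

```r
# Assumed variable name; replace with the name that appears in your ~/.Renviron.
key_name <- "GOOGLE_NLP_API_KEY"

# Hypothetical helper: TRUE when the environment variable holds a non-empty value.
has_api_key <- function(name) {
  nzchar(Sys.getenv(name, unset = ""))
}

if (has_api_key(key_name)) {
  message("API key found in environment variable ", key_name)
} else {
  message("No API key found; check ~/.Renviron and restart R")
}
```

The check only tests that a value is present; it does not validate the key against the API.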
Method B defines your API key as a session-level variable. Under this method, you’ll need to set your API key at the beginning of each R session.
library(googlenlp)
set_api_key("MY_API_KEY") # replace this with your API key
Define the text you’d like to analyze.
text <- "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.
         Sundar Pichai said in his keynote that users love their new Android phones."
The annotate_text function analyzes the text’s syntax (sentences and tokens), entities, sentiment, and language, and returns the result as a five-element list.
analyzed <- annotate_text(text_body = text)
#> Warning: package 'bindrcpp' was built under R version 3.4.4
str(analyzed, max.level = 1)
#> List of 5
#> $ sentences :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 4 variables:
#> $ tokens :Classes 'tbl_df', 'tbl' and 'data.frame': 32 obs. of 17 variables:
#> $ entities :Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of 8 variables:
#> $ documentSentiment:'data.frame': 1 obs. of 2 variables:
#> $ language : chr "en"
“Sentence extraction breaks up the stream of text into a series of sentences.” [API Documentation]
beginOffset indicates the (zero-based) character index of where the sentence begins (with UTF-8 encoding).
The magnitude and score fields quantify each sentence’s sentiment; see the Document Sentiment section for more details.
analyzed$sentences
content | beginOffset | magnitude | score |
---|---|---|---|
Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show. | 0 | 0.0 | 0.0 |
Sundar Pichai said in his keynote that users love their new Android phones. | 113 | 0.6 | 0.6 |
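Because beginOffset is zero-based and R strings are indexed from one, the offset needs a +1 shift before it can be used with substring. The sketch below rebuilds the two-row table by hand rather than calling the API, and assumes the two sentences are separated by a newline plus nine spaces of indentation, which is consistent with the offset of 113 shown above. For this ASCII-only text, characters and bytes coincide; for other text the mapping may differ.

```r
sentence_1 <- "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show."
sentence_2 <- "Sundar Pichai said in his keynote that users love their new Android phones."

# Assumed separator: one newline and nine spaces (10 characters in total).
text <- paste0(sentence_1, "\n", strrep(" ", 9), sentence_2)

# Hand-built stand-in for the content and beginOffset columns of analyzed$sentences.
sentences <- data.frame(
  content = c(sentence_1, sentence_2),
  beginOffset = c(0L, 113L),
  stringsAsFactors = FALSE
)

# Convert zero-based offsets to one-based start and end positions.
first <- sentences$beginOffset + 1L
last <- sentences$beginOffset + nchar(sentences$content)

extracted <- substring(text, first, last)
identical(extracted, sentences$content)
```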
“Tokenization breaks the stream of text up into a series of tokens, with each token usually corresponding to a single word. The Natural Language API then processes the tokens and, using their locations within sentences, adds syntactic information to the tokens.” [API Documentation]
lemma indicates the token’s “root” word, and can be useful in standardizing the word within the text.
tag indicates the token’s part of speech.
analyzed$tokens
content | beginOffset | lemma | tag | aspect | case | form | gender | mood | number | person | proper | reciprocity | tense | voice | dependencyEdge_headTokenIndex | dependencyEdge_label |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Google | 0 | Google | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | NSUBJ |
, | 6 | , | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 0 | P |
headquartered | 8 | headquarter | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PAST | VOICE_UNKNOWN | 0 | VMOD |
in | 22 | in | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 2 | PREP |
Mountain | 25 | Mountain | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 5 | NN |
View | 34 | View | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 3 | POBJ |
, | 38 | , | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 0 | P |
unveiled | 40 | unveil | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | INDICATIVE | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PAST | VOICE_UNKNOWN | 7 | ROOT |
the | 49 | the | DET | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 11 | DET |
new | 53 | new | ADJ | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 11 | AMOD |
Android | 57 | Android | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 11 | NN |
phone | 65 | phone | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | DOBJ |
at | 71 | at | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | PREP |
the | 74 | the | DET | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 16 | DET |
Consumer | 78 | Consumer | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 16 | NN |
Electronic | 87 | Electronic | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 16 | NN |
Show | 98 | Show | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 12 | POBJ |
. | 102 | . | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 7 | P |
Sundar | 113 | Sundar | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 19 | NN |
Pichai | 120 | Pichai | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 20 | NSUBJ |
said | 127 | say | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | INDICATIVE | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PAST | VOICE_UNKNOWN | 20 | ROOT |
in | 132 | in | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 20 | PREP |
his | 135 | his | PRON | ASPECT_UNKNOWN | GENITIVE | FORM_UNKNOWN | MASCULINE | MOOD_UNKNOWN | SINGULAR | THIRD | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 23 | POSS |
keynote | 139 | keynote | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 21 | POBJ |
that | 147 | that | ADP | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 26 | MARK |
users | 152 | user | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | PLURAL | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 26 | NSUBJ |
love | 158 | love | VERB | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | INDICATIVE | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | PRESENT | VOICE_UNKNOWN | 20 | CCOMP |
their | 163 | their | PRON | ASPECT_UNKNOWN | GENITIVE | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | PLURAL | THIRD | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 30 | POSS |
new | 169 | new | ADJ | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 30 | AMOD |
Android | 173 | Android | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | SINGULAR | PERSON_UNKNOWN | PROPER | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 30 | NN |
phones | 181 | phone | NOUN | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | PLURAL | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 26 | DOBJ |
. | 187 | . | PUNCT | ASPECT_UNKNOWN | CASE_UNKNOWN | FORM_UNKNOWN | GENDER_UNKNOWN | MOOD_UNKNOWN | NUMBER_UNKNOWN | PERSON_UNKNOWN | PROPER_UNKNOWN | RECIPROCITY_UNKNOWN | TENSE_UNKNOWN | VOICE_UNKNOWN | 20 | P |
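As a small illustration of why lemma and tag are useful together, the sketch below counts nouns by lemma so that “phone” and “phones” are tallied as the same word. It uses a hand-copied subset of the rows above rather than a live API response.

```r
# Hand-copied subset of the content, lemma, and tag columns of analyzed$tokens.
tokens <- data.frame(
  content = c("phone", "users", "said", "love", "phones"),
  lemma   = c("phone", "user", "say", "love", "phone"),
  tag     = c("NOUN", "NOUN", "VERB", "VERB", "NOUN"),
  stringsAsFactors = FALSE
)

# Keep only the nouns, then count occurrences of each lemma.
nouns <- tokens[tokens$tag == "NOUN", ]
noun_counts <- sort(table(nouns$lemma), decreasing = TRUE)

noun_counts
```

Counting on content instead of lemma would treat the singular and plural forms as two separate words.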
“Entity Analysis provides information about entities in the text, which generally refer to named ‘things’ such as famous individuals, landmarks, common objects, etc… A good general practice to follow is that if something is a noun, it qualifies as an ‘entity.’” [API Documentation]
entity_type indicates the type of entity (i.e., it classifies the entity as a person, location, consumer good, etc.).
mid provides a “machine-generated identifier” corresponding to the entity’s Google Knowledge Graph entry.
wikipedia_url provides the entity’s Wikipedia URL.
salience indicates the entity’s importance to the entire text. Scores range from 0.0 (less important) to 1.0 (highly important).
analyzed$entities
name | entity_type | mid | wikipedia_url | salience | content | beginOffset | mentions_type |
---|---|---|---|---|---|---|---|
Google | ORGANIZATION | /m/045c7b | https://en.wikipedia.org/wiki/Google | 0.2557206 | Google | 0 | PROPER |
users | PERSON | NA | NA | 0.1527633 | users | 152 | COMMON |
phone | CONSUMER_GOOD | NA | NA | 0.1311989 | phone | 65 | COMMON |
Android | CONSUMER_GOOD | /m/02wxtgw | https://en.wikipedia.org/wiki/Android_(operating_system) | 0.1224526 | Android | 57 | PROPER |
Android | CONSUMER_GOOD | /m/02wxtgw | https://en.wikipedia.org/wiki/Android_(operating_system) | 0.1224526 | Android | 173 | PROPER |
Sundar Pichai | PERSON | /m/09gds74 | https://en.wikipedia.org/wiki/Sundar_Pichai | 0.1141411 | Sundar Pichai | 113 | PROPER |
Mountain View | LOCATION | /m/0r6c4 | https://en.wikipedia.org/wiki/Mountain_View,_California | 0.1019596 | Mountain View | 25 | PROPER |
Consumer Electronic Show | EVENT | /m/01p15w | https://en.wikipedia.org/wiki/Consumer_Electronics_Show | 0.0703438 | Consumer Electronic Show | 78 | PROPER |
phones | CONSUMER_GOOD | NA | NA | 0.0338317 | phones | 181 | COMMON |
keynote | OTHER | NA | NA | 0.0175884 | keynote | 139 | COMMON |
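One common use of this table is to keep only the entities that resolve to a Wikipedia entry and rank them by salience. The sketch below works on a hand-copied subset of the rows above, not a live API response; the name of the first row is taken to be “Google”, based on its Wikipedia URL.

```r
# Hand-copied subset of analyzed$entities.
entities <- data.frame(
  name = c("Google", "users", "phone", "Sundar Pichai", "Mountain View", "keynote"),
  entity_type = c("ORGANIZATION", "PERSON", "CONSUMER_GOOD", "PERSON", "LOCATION", "OTHER"),
  wikipedia_url = c(
    "https://en.wikipedia.org/wiki/Google", NA, NA,
    "https://en.wikipedia.org/wiki/Sundar_Pichai",
    "https://en.wikipedia.org/wiki/Mountain_View,_California", NA
  ),
  salience = c(0.2557206, 0.1527633, 0.1311989, 0.1141411, 0.1019596, 0.0175884),
  stringsAsFactors = FALSE
)

# Keep entities with a Wikipedia entry, most salient first.
known <- entities[!is.na(entities$wikipedia_url), ]
known <- known[order(known$salience, decreasing = TRUE), ]

known[, c("name", "entity_type", "salience")]
```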
“Sentiment analysis attempts to determine the overall attitude (positive or negative) expressed within the text. Sentiment is represented by numerical score and magnitude values.” [API Documentation]
score ranges from -1.0 (negative) to 1.0 (positive), and indicates the “overall emotional leaning of the text”.
magnitude “indicates the overall strength of emotion (both positive and negative) within the given text, between 0.0 and +inf. Unlike score, magnitude is not normalized; each expression of emotion within the text (both positive and negative) contributes to the text’s magnitude (so longer text blocks may have greater magnitudes).”
A note on how to interpret these sentiment values is posted here.
analyzed$documentSentiment
magnitude | score |
---|---|
0.6 | 0.3 |
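To show how score and magnitude can be read together, the sketch below turns the pair into a coarse label. The function label_sentiment and its cutoffs are hypothetical, chosen only for illustration; they are not values from the API documentation and would need tuning for real data.

```r
# Hypothetical helper with illustrative cutoffs (assumptions, not API-defined values).
label_sentiment <- function(score, magnitude,
                            score_cutoff = 0.25, magnitude_cutoff = 1.0) {
  if (score >= score_cutoff) {
    "positive"
  } else if (score <= -score_cutoff) {
    "negative"
  } else if (magnitude >= magnitude_cutoff) {
    # Near-zero score with high magnitude: positive and negative emotion cancel out.
    "mixed"
  } else {
    "neutral"
  }
}

# Hand-copied values from the documentSentiment table.
document_sentiment <- data.frame(magnitude = 0.6, score = 0.3)

label_sentiment(document_sentiment$score, document_sentiment$magnitude)
```

The “mixed” branch reflects the point made above that magnitude accumulates both positive and negative emotion, so a score near zero does not by itself mean the text is unemotional.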
language indicates the detected language of the document. Only English (“en”), Spanish (“es”) and Japanese (“ja”) are currently supported by the API.
analyzed$language
#> [1] "en"