Introducing the ‘polmineR’ package

Andreas Blaette (andreas.blaette@uni-due.de)

2023-10-29

Purpose

The purpose of the package polmineR is to offer a toolset for the interactive analysis of corpora using R. Apart from performance and usability, key considerations for developing the package are:

The polmineR package supplements R packages that are already widely used for text mining. The CRAN NLP task view is a good place to learn about relevant packages. The polmineR package is intended to be an interface between the Corpus Workbench (CWB), an efficient system for storing and querying large corpora, and existing packages for text mining and text statistics.

Apart from the speed of text processing, the Corpus Query Processor (CQP) offers a powerful and widely used syntax for querying corpora. This is not a unique idea: combining R and the CWB implies a software architecture you will also find in the TXM project or in CQPweb. The polmineR package offers a library implementing the grammar of corpus analysis at a level below a graphical user interface (GUI). It is a toolset to perform simple tasks efficiently as well as to implement complex workflows.

Advanced users will benefit from acquiring a good understanding of the Corpus Workbench. The Corpus Encoding Tutorial is an authoritative text for that. The vignette of the rcqp package, albeit archived, includes an excellent explanation of the CWB data model. The interface now used to access CWB/CQP functionality is the RcppCWB package.

A basic issue to understand is the difference between s-attributes and p-attributes. The CWB distinguishes structural attributes (s-attributes), which contain the metadata that can be used to generate subcorpora/partitions, from positional attributes (p-attributes). Typically, the p-attributes will be ‘word’, ‘pos’ (for part-of-speech) and ‘lemma’ (for the lemmatized word form).
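To make the distinction concrete, the following base-R sketch models a tiny, made-up corpus: p-attributes as parallel vectors aligned by corpus position (cpos), and one s-attribute as a set of regions with start/end positions and a value. The data and the helper struc_value() are hypothetical illustrations of the data model, not part of polmineR or RcppCWB.

```r
# Hypothetical toy corpus: p-attributes are parallel vectors indexed by corpus position.
p_attr <- data.frame(
  cpos  = 0:7,
  word  = c("Oil", "prices", "rose", ".", "Kuwait", "cut", "output", "."),
  pos   = c("NN", "NNS", "VBD", ".", "NNP", "VBD", "NN", "."),
  lemma = c("oil", "price", "rise", ".", "Kuwait", "cut", "output", "."),
  stringsAsFactors = FALSE
)

# Hypothetical s-attribute 'places': regions (start/end cpos) with one value each.
places <- data.frame(
  start = c(0L, 4L),
  end   = c(3L, 7L),
  value = c("usa", "kuwait"),
  stringsAsFactors = FALSE
)

# Look up the s-attribute value of the region that contains a corpus position.
struc_value <- function(cp, regions) {
  hit <- which(regions$start <= cp & cp <= regions$end)
  if (length(hit) == 0L) NA_character_ else regions$value[hit]
}

p_attr$lemma[p_attr$cpos == 1]  # "price"
struc_value(5, places)          # "kuwait"
```

Every token carries a value for every p-attribute, whereas an s-attribute value applies to a whole span of tokens; this is why s-attributes are the natural basis for subcorpora.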

Getting started

Loading polmineR

The polmineR package is loaded just like any other package.

library(polmineR)

Upon loading the package, the package version is reported. As polmineR is under active development, please check whether a more recent version is available on CRAN. Development versions are available on GitHub.

In addition, you will see a message about the session registry, which needs some further explanation.

The session registry directory

Indexed corpus data may be stored at different locations on your machine. CWB users will usually have a data directory with subdirectories for every single corpus. But corpus data may also reside within R packages, or anywhere else.

It is not necessary to move indexed corpora to one single location. The only recommendation is to keep them on a device that can be accessed sufficiently fast. Corpora are not fully loaded into memory; information is retrieved from disk on an ‘on demand’ basis. Thus, storing corpus data on an SSD may be faster than storing it on a conventional hard drive.

The CWB looks up information on the corpora in a directory called the registry, which is defined by the environment variable CORPUS_REGISTRY. Starting with version 0.7.9, the polmineR package creates a temporary registry directory in the temporary session directory. To get the path of the session registry directory, call registry(). The output is the session registry you saw reported when loading polmineR.

registry()
## /var/folders/fw/qwt11pjx1qs83dl2jwltcvmr0000gn/T/RtmpXUkqK1/polmineR_registry

The session registry directory combines the registry files describing the corpora polmineR knows about. Upon loading polmineR, the files in the registry directory defined by the environment variable CORPUS_REGISTRY are copied to the session registry directory. To see whether the environment variable CORPUS_REGISTRY is set, use the Sys.getenv()-function.

Sys.getenv("CORPUS_REGISTRY")
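A slightly more defensive check, sketched in base R, only lists the registry files when the variable is set and points to an existing directory. That each file in the registry describes one corpus follows the CWB convention; the file names on your machine will differ.

```r
registry_dir <- Sys.getenv("CORPUS_REGISTRY")  # "" if the variable is not set

if (nzchar(registry_dir) && dir.exists(registry_dir)) {
  # Each file describes one corpus (by CWB convention, named after the lowercase corpus ID).
  print(list.files(registry_dir))
} else {
  message("CORPUS_REGISTRY is not set or does not point to an existing directory.")
}
```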

See the annex for an explanation of how to set the CORPUS_REGISTRY environment variable for the current R session, or permanently.

Using and installing packaged corpora

If you want to use a corpus wrapped into an R data package, call use() with the name of the R package. The function will add the registry files describing the corpora in the package to the session registry directory introduced before.

In the following examples, the REUTERS corpus, which is made available by the use() call below, will be used for demonstration purposes. It is a sample of Reuters articles that is included in the tm package (cf. http://www.daviddlewis.com/resources/testcollections/reuters21578/), and may already be known to some R users.

use("polmineR")
use("RcppCWB", corpus = "REUTERS")

Checking which corpora are available

The corpus()-method can be used to check which corpora are described in the registry and accessible. In our case, this includes the REUTERS corpus (note that the names of CWB corpora are always written in upper case). In addition to the English REUTERS corpus, a small subset of the GermaParl corpus (“GERMAPARLMINI”) is included in the polmineR package.

corpus()
##           corpus
## 1            AAZ
## 2           ALBB
## 3           ALLZ
## 4           BADZ
## 5           BEZE
## 6           BEZG
## 7            BGM
## 8            BKR
## 9            BKU
## 10           BMP
## 11           BNA
## 12            BR
## 13          BRZE
## 14          BUER
## 15           BVH
## 16            BZ
## 17          COBU
## 18           DAW
## 19           DAZ
## 20          DECH
## 21          DIKI
## 22           DNN
## 23          EXPR
## 24           EZE
## 25          FEPR
## 26           FNP
## 27           FRT
## 28           GAZ
## 29    GERMAPARL2
## 30 GERMAPARLMINI
## 31          GEZE
## 32          GIAN
## 33           GTB
## 34            HA
## 35          HAKU
## 36          HATA
## 37           HAZ
## 38          HIAZ
## 39            HK
## 40           HNA
## 41          HOFZ
## 42          HOZG
## 43           HST
## 44          IDZL
## 45          KIRZ
## 46            KN
## 47            KR
## 48          KRAN
## 49          KSTA
## 50          LAAN
## 51           LAZ
## 52          LAZE
## 53           LBN
## 54           LIN
## 55            LR
## 56           LVZ
## 57          MAER
## 58          MASP
## 59          MATK
## 60           MAZ
## 61          MBGA
## 62          MBVS
## 63           MIB
## 64          MOPO
## 65           MPW
## 66          MUAZ
## 67          MUME
## 68          MUVB
## 69          MUZE
## 70            MZ
## 71     NADIRAFAZ
## 72      NADIRASZ
## 73     NADIRATAZ
## 74          NAZT
## 75          NPHA
## 76    PARTYPAGES
## 77   PARTYPAGES2
## 78          PRIG
## 79       REUTERS
##                                                                         registry
## 1                                         /Users/andreasblatte/Data/cwb/registry
## 2                                         /Users/andreasblatte/Data/cwb/registry
## 3                                         /Users/andreasblatte/Data/cwb/registry
## 4                                         /Users/andreasblatte/Data/cwb/registry
## 5                                         /Users/andreasblatte/Data/cwb/registry
## 6                                         /Users/andreasblatte/Data/cwb/registry
## 7                                         /Users/andreasblatte/Data/cwb/registry
## 8                                         /Users/andreasblatte/Data/cwb/registry
## 9                                         /Users/andreasblatte/Data/cwb/registry
## 10                                        /Users/andreasblatte/Data/cwb/registry
## 11                                        /Users/andreasblatte/Data/cwb/registry
## 12                                        /Users/andreasblatte/Data/cwb/registry
## 13                                        /Users/andreasblatte/Data/cwb/registry
## 14                                        /Users/andreasblatte/Data/cwb/registry
## 15                                        /Users/andreasblatte/Data/cwb/registry
## 16                                        /Users/andreasblatte/Data/cwb/registry
## 17                                        /Users/andreasblatte/Data/cwb/registry
## 18                                        /Users/andreasblatte/Data/cwb/registry
## 19                                        /Users/andreasblatte/Data/cwb/registry
## 20                                        /Users/andreasblatte/Data/cwb/registry
## 21                                        /Users/andreasblatte/Data/cwb/registry
## 22                                        /Users/andreasblatte/Data/cwb/registry
## 23                                        /Users/andreasblatte/Data/cwb/registry
## 24                                        /Users/andreasblatte/Data/cwb/registry
## 25                                        /Users/andreasblatte/Data/cwb/registry
## 26                                        /Users/andreasblatte/Data/cwb/registry
## 27                                        /Users/andreasblatte/Data/cwb/registry
## 28                                        /Users/andreasblatte/Data/cwb/registry
## 29                                        /Users/andreasblatte/Data/cwb/registry
## 30 /var/folders/fw/qwt11pjx1qs83dl2jwltcvmr0000gn/T/RtmpXUkqK1/polmineR_registry
## 31                                        /Users/andreasblatte/Data/cwb/registry
## 32                                        /Users/andreasblatte/Data/cwb/registry
## 33                                        /Users/andreasblatte/Data/cwb/registry
## 34                                        /Users/andreasblatte/Data/cwb/registry
## 35                                        /Users/andreasblatte/Data/cwb/registry
## 36                                        /Users/andreasblatte/Data/cwb/registry
## 37                                        /Users/andreasblatte/Data/cwb/registry
## 38                                        /Users/andreasblatte/Data/cwb/registry
## 39                                        /Users/andreasblatte/Data/cwb/registry
## 40                                        /Users/andreasblatte/Data/cwb/registry
## 41                                        /Users/andreasblatte/Data/cwb/registry
## 42                                        /Users/andreasblatte/Data/cwb/registry
## 43                                        /Users/andreasblatte/Data/cwb/registry
## 44                                        /Users/andreasblatte/Data/cwb/registry
## 45                                        /Users/andreasblatte/Data/cwb/registry
## 46                                        /Users/andreasblatte/Data/cwb/registry
## 47                                        /Users/andreasblatte/Data/cwb/registry
## 48                                        /Users/andreasblatte/Data/cwb/registry
## 49                                        /Users/andreasblatte/Data/cwb/registry
## 50                                        /Users/andreasblatte/Data/cwb/registry
## 51                                        /Users/andreasblatte/Data/cwb/registry
## 52                                        /Users/andreasblatte/Data/cwb/registry
## 53                                        /Users/andreasblatte/Data/cwb/registry
## 54                                        /Users/andreasblatte/Data/cwb/registry
## 55                                        /Users/andreasblatte/Data/cwb/registry
## 56                                        /Users/andreasblatte/Data/cwb/registry
## 57                                        /Users/andreasblatte/Data/cwb/registry
## 58                                        /Users/andreasblatte/Data/cwb/registry
## 59                                        /Users/andreasblatte/Data/cwb/registry
## 60                                        /Users/andreasblatte/Data/cwb/registry
## 61                                        /Users/andreasblatte/Data/cwb/registry
## 62                                        /Users/andreasblatte/Data/cwb/registry
## 63                                        /Users/andreasblatte/Data/cwb/registry
## 64                                        /Users/andreasblatte/Data/cwb/registry
## 65                                        /Users/andreasblatte/Data/cwb/registry
## 66                                        /Users/andreasblatte/Data/cwb/registry
## 67                                        /Users/andreasblatte/Data/cwb/registry
## 68                                        /Users/andreasblatte/Data/cwb/registry
## 69                                        /Users/andreasblatte/Data/cwb/registry
## 70                                        /Users/andreasblatte/Data/cwb/registry
## 71                                        /Users/andreasblatte/Data/cwb/registry
## 72                                        /Users/andreasblatte/Data/cwb/registry
## 73                                        /Users/andreasblatte/Data/cwb/registry
## 74                                        /Users/andreasblatte/Data/cwb/registry
## 75                                        /Users/andreasblatte/Data/cwb/registry
## 76                                        /Users/andreasblatte/Data/cwb/registry
## 77                                        /Users/andreasblatte/Data/cwb/registry
## 78                                        /Users/andreasblatte/Data/cwb/registry
## 79 /var/folders/fw/qwt11pjx1qs83dl2jwltcvmr0000gn/T/RtmpXUkqK1/polmineR_registry
##    encoding  type template       size
## 1      utf8 press    FALSE  276354661
## 2      utf8 press    FALSE  104087703
## 3      utf8 press    FALSE   98060280
## 4      utf8 press    FALSE  124757918
## 5      utf8 press    FALSE  156141374
## 6      utf8 press    FALSE   77354392
## 7      utf8 press    FALSE  158021332
## 8      utf8 press    FALSE   71435971
## 9      utf8 press    FALSE   62953816
## 10     utf8 press    FALSE  158035211
## 11     utf8 press    FALSE   65460159
## 12     utf8 press    FALSE  106345949
## 13     utf8 press    FALSE   44281497
## 14     utf8 press    FALSE  122063568
## 15     utf8 press    FALSE   27523558
## 16     utf8 press    FALSE   69364571
## 17     utf8 press    FALSE  110671900
## 18     utf8 press    FALSE    2183303
## 19     utf8 press    FALSE  118165806
## 20     utf8 press    FALSE  271587363
## 21     utf8 press    FALSE  106118733
## 22     utf8 press    FALSE  134217434
## 23     utf8 press    FALSE   99514625
## 24     utf8 press    FALSE   76454964
## 25     utf8 press    FALSE  445999427
## 26     utf8 press    FALSE  271515022
## 27     utf8 press    FALSE  324831526
## 28     utf8 press    FALSE  279723308
## 29     utf8  plpr     TRUE  280994755
## 30   latin1  plpr     TRUE     222201
## 31     utf8 press    FALSE   24878428
## 32     utf8 press    FALSE  128532860
## 33     utf8 press    FALSE   93611740
## 34     utf8 press    FALSE  309316506
## 35     utf8 press    FALSE   22098597
## 36     utf8 press    FALSE  112650129
## 37     utf8 press    FALSE  115501402
## 38     utf8 press    FALSE   63733820
## 39     utf8 press    FALSE  217064551
## 40     utf8 press    FALSE   81742835
## 41     utf8 press    FALSE   13709041
## 42     utf8 press    FALSE  100494155
## 43     utf8 press    FALSE  237772204
## 44     utf8 press    FALSE  111726534
## 45     utf8 press    FALSE  146179422
## 46     utf8 press    FALSE   88096315
## 47     utf8 press    FALSE  354255652
## 48     utf8 press    FALSE   91371448
## 49     utf8 press    FALSE  353024966
## 50     utf8 press    FALSE   79286380
## 51     utf8 press    FALSE  281089133
## 52     utf8 press    FALSE  121771964
## 53     utf8 press    FALSE  203932334
## 54     utf8 press    FALSE   62789343
## 55     utf8 press    FALSE  244971515
## 56     utf8 press    FALSE  284856401
## 57     utf8 press    FALSE  469236411
## 58     utf8 press    FALSE  118983151
## 59     utf8 press    FALSE   63077563
## 60     utf8 press    FALSE  235257503
## 61     utf8 press    FALSE   35568855
## 62     utf8 press    FALSE  231721744
## 63     utf8 press    FALSE  518456726
## 64     utf8 press    FALSE   71294105
## 65     utf8 press    FALSE  389002134
## 66     utf8 press    FALSE  103931018
## 67     utf8 press    FALSE  598574538
## 68     utf8 press    FALSE  114471715
## 69     utf8 press    FALSE   78695355
## 70     utf8 press    FALSE  518942364
## 71     utf8 press    FALSE  174552590
## 72     utf8 press    FALSE 1195146149
## 73     utf8 press    FALSE  627227412
## 74     utf8 press    FALSE  140884541
## 75     utf8 press    FALSE   91686603
## 76     utf8  <NA>    FALSE    2931870
## 77     utf8  <NA>    FALSE    2743636
## 78     utf8 press    FALSE   97812464
## 79   latin1  <NA>     TRUE       4050

Session settings

Many methods in the polmineR package use default settings that are defined in the general options. By convention, the names of settings relevant for the polmineR package start with ‘polmineR.’. Inspect the settings as follows:

options()[grep("polmineR", names(options()))]

Several methods (such as kwic or cooccurrences) use these settings if no other value is provided explicitly. You can see this in the usage section of the help pages (?kwic, for instance). Settings can be changed as follows:

options("polmineR.left" = 5)
options("polmineR.right" = 5)
options("polmineR.mc" = FALSE)
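Because options() returns the previous values when it changes a setting, defaults can also be changed temporarily and restored afterwards. This is a base-R idiom; the window size of 10 is an arbitrary example value.

```r
# options() invisibly returns the old values of the options it changes.
old <- options(polmineR.left = 10, polmineR.right = 10)

getOption("polmineR.left")   # 10

# ... run kwic() or cooccurrences() with the wider window here ...

options(old)                 # restore the previous settings
```

Inside a function, calling on.exit(options(old), add = TRUE) right after changing the options ensures that the old values are restored even if an error occurs.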

Working with corpora: Core methods

Core analytical tasks are implemented as methods (S4 class system), i.e. the behaviour of the methods changes depending on the object that is supplied. Almost all methods can be applied to corpora (indicated by a length-one character vector) as well as to partitions (subcorpora). As a quick start, methods applied to corpora are explained first.

Keyword-in-context (kwic)

The kwic method applied to the name of a corpus will return a KWIC object. Output will be shown in the viewer pane of RStudio. Technically, an htmlwidget is prepared, which offers some convenient functionality.

k <- kwic("REUTERS", "oil")

You can include metadata from the corpus into the kwic display using the ‘s_attributes’ argument. Let us start with one s-attribute.

k <- kwic("REUTERS", "oil", s_attributes = "places")

But you can display any number of s-attributes.

k <- kwic("REUTERS", "oil", s_attributes = c("id", "places"))
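Conceptually, a concordance lists every match of the query together with a window of tokens to its left and right. The toy function toy_kwic() below (a hypothetical helper working on a plain character vector) sketches only this output shape; polmineR itself evaluates queries with CQP and adds s-attribute metadata.

```r
# Hypothetical helper: minimal keyword-in-context on a plain token vector.
toy_kwic <- function(tokens, node, left = 3L, right = 3L) {
  n <- length(tokens)
  hits <- which(tokens == node)
  # Paste tokens from position 'from' to 'to', or return "" for an empty window.
  ctx <- function(from, to) {
    if (from > to) "" else paste(tokens[from:to], collapse = " ")
  }
  data.frame(
    left  = vapply(hits, function(i) ctx(max(1L, i - left), i - 1L), character(1)),
    node  = tokens[hits],
    right = vapply(hits, function(i) ctx(i + 1L, min(n, i + right)), character(1)),
    stringsAsFactors = FALSE
  )
}

tokens <- c("Saudi", "Arabia", "said", "oil", "prices", "will", "rise",
            "as", "oil", "output", "falls")
res <- toy_kwic(tokens, "oil", left = 2, right = 2)
res
```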

You can also use the CQP query syntax for formulating queries. That way, you can find multi-word expressions, or match in a manner you may know from using regular expressions.

k <- kwic("REUTERS", '"oil" "price.*"')

Explaining the CQP syntax goes beyond this vignette. Consult the CQP tutorial to learn more about the CQP syntax.
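As a rough intuition for what the query '"oil" "price.*"' matches, the base-R sketch below looks for the token oil immediately followed by a token matching price.*. It assumes, as CQP does, that each regular expression must match the whole token, hence the ^ and $ anchors; match_two_tokens() is a hypothetical helper, not a polmineR function.

```r
# Hypothetical helper: emulate a two-token CQP query on a plain token vector.
match_two_tokens <- function(tokens, first, second) {
  i <- seq_len(max(0L, length(tokens) - 1L))
  # Anchor both patterns so that they must match the whole token.
  hits <- i[grepl(paste0("^", first, "$"), tokens[i]) &
              grepl(paste0("^", second, "$"), tokens[i + 1L])]
  paste(tokens[hits], tokens[hits + 1L])
}

tokens <- c("oil", "prices", "fell", "while", "oil", "output",
            "and", "oil", "price", "rose")
match_two_tokens(tokens, "oil", "price.*")  # "oil prices" "oil price"
```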

Getting counts and frequencies

You can count one or several hits in a corpus.

cnt <- count("REUTERS", "Kuwait")
cnt <- count("REUTERS", c("Kuwait", "USA", "Bahrain"))
cnt <- count("REUTERS", c('"United" "States"', '"Saudi" "Arabia.*"'), cqp = TRUE)
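Counts are usually reported together with a relative frequency. The sketch below computes both for a made-up token vector, assuming the relative frequency is the count divided by the number of tokens in the corpus; it is an illustration, not the count() implementation.

```r
tokens  <- c("Kuwait", "and", "Bahrain", "said", "Kuwait", "will", "cut", "output")
queries <- c("Kuwait", "USA", "Bahrain")

# Absolute counts: number of tokens identical to each query.
counts <- vapply(queries, function(q) sum(tokens == q), integer(1), USE.NAMES = FALSE)

cnt_toy <- data.frame(
  query = queries,
  count = counts,
  freq  = counts / length(tokens),  # assumed definition: count / corpus size
  stringsAsFactors = FALSE
)
cnt_toy
```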

Dispersions

Use the dispersion()-method to get dispersions of counts across one (or two) dimensions.

oil <- dispersion("REUTERS", query = "oil", s_attribute = "id", progress = FALSE)
saudi_arabia <- dispersion(
  "REUTERS", query = '"Saudi" "Arabia.*"',
  s_attribute = "id", cqp = TRUE, progress = FALSE
  )
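A dispersion is, at its core, a tabulation of query matches by the values of an s-attribute. The base-R sketch below uses hypothetical tokens and document ids; encoding the ids as a factor keeps documents without any match in the table as explicit zeros.

```r
tokens <- data.frame(
  word = c("oil", "prices", "oil", "oil", "barrel", "Kuwait"),
  id   = c("242", "242", "248", "248", "248", "273"),
  stringsAsFactors = FALSE
)

is_hit <- tokens$word == "oil"

# Tabulate matches per document id; the factor keeps ids without matches as zeros.
ids  <- factor(tokens$id[is_hit], levels = unique(tokens$id))
disp <- as.data.frame(table(id = ids), responseName = "count")
disp
```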

Note that dispersion() returns a data.table, which can easily be passed on to a visualisation.

barplot(height = saudi_arabia[["count"]], names.arg = saudi_arabia[["id"]], las = 2)

Cooccurrences

To analyse the neighborhood of a token, or the match for a CQP query, use cooccurrences().

oil <- cooccurrences("REUTERS", query = "oil")
sa <- cooccurrences("REUTERS", query = '"Saudi" "Arabia.*"', left = 10, right = 10)
top5 <- subset(oil, rank_ll <= 5)
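The raw material for a cooccurrence analysis is the set of tokens found within a window around each match of the query. The hypothetical helper toy_cooccurrence_counts() below counts them for a made-up token vector. As a simplification, a token that falls into two overlapping windows is counted twice, and no statistical test is applied.

```r
# Hypothetical helper: count tokens within a symmetric window around each match.
toy_cooccurrence_counts <- function(tokens, node, left = 2L, right = 2L) {
  n <- length(tokens)
  hits <- which(tokens == node)
  # Window positions around every match, excluding the match itself.
  positions <- unlist(lapply(hits, function(i) {
    setdiff(max(1L, i - left):min(n, i + right), i)
  }))
  tab <- table(tokens[positions])
  sort(setNames(as.integer(tab), names(tab)), decreasing = TRUE)
}

tokens <- c("crude", "oil", "prices", "rose", "as", "oil", "industry", "output", "fell")
counts <- toy_cooccurrence_counts(tokens, "oil")
counts  # "rose" appears in both windows, so it is counted twice
```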

In an interactive session, simply type top5 in the terminal, and the output will be shown in the data viewer. To inspect the output in the viewer pane, you can coerce the object to an htmlwidget. This is also a good way to include the table in an R Markdown document.

top5

For further operations, get the table with the statistical results by applying the as.data.frame()-method.

as.data.frame(top5)
##       word word_id count_partition count_coi count_ref  exp_coi   exp_ref
## 1   prices      12              47        27        20 9.229607 37.770393
## 2    crude      14              20        13         7 3.927492 16.072508
## 3 industry      78              10         8         2 1.963746  8.036254
## 4   recent     356               6         5         1 1.178248  4.821752
## 5       on     169              17         8         9 3.338369 13.661631
##          ll rank_ll
## 1 33.045786       1
## 2 19.616220       2
## 3 16.968527       3
## 4 11.331186       4
## 5  6.505618       5
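The columns of this table can be cross-checked with a few lines of base R, using the values printed above. Observed counts inside (count_coi) and outside (count_ref) the window add up to count_partition, and so do the expected counts. The ratio exp_coi / count_partition is the same for every term, which is consistent with expected counts being obtained by scaling each term's total count by the share of the window in the corpus; this interpretation is our reading of the output, not a statement from the documentation.

```r
# Values copied from the as.data.frame(top5) output above.
top5_values <- data.frame(
  word            = c("prices", "crude", "industry", "recent", "on"),
  count_partition = c(47, 20, 10, 6, 17),
  count_coi       = c(27, 13, 8, 5, 8),
  count_ref       = c(20, 7, 2, 1, 9),
  exp_coi         = c(9.229607, 3.927492, 1.963746, 1.178248, 3.338369),
  exp_ref         = c(37.770393, 16.072508, 8.036254, 4.821752, 13.661631)
)

# Observed counts inside and outside the window add up to the total ...
with(top5_values, all(count_coi + count_ref == count_partition))   # TRUE
# ... and so do the expected counts.
with(top5_values, all.equal(exp_coi + exp_ref, count_partition))   # TRUE
# The expected share inside the window is (nearly) identical for every term.
with(top5_values, exp_coi / count_partition)
```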

Working with subcorpora - partitions

Working with partitions (i.e. subcorpora) based on s-attributes is an important feature of the ‘polmineR’ package. Suppose we want to work with the articles in the REUTERS corpus related to Kuwait:

kuwait <- partition("REUTERS", places = "kuwait", regex = TRUE)
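Behind the scenes, a partition is a set of regions (pairs of corpus positions) whose s-attribute values match the condition. The base-R sketch below uses made-up regions and values for a places attribute, and assumes unanchored regular-expression matching as grepl() performs it by default.

```r
# Hypothetical regions of an s-attribute 'places' (start/end corpus positions).
regions <- data.frame(
  start  = c(0L, 120L, 300L),
  end    = c(119L, 299L, 459L),
  places = c("usa", "kuwait saudi-arabia", "kuwait"),
  stringsAsFactors = FALSE
)

# Select the regions whose value matches the regular expression.
selected <- regions[grepl("kuwait", regions$places), ]

# The subcorpus consists of all corpus positions covered by the selected regions.
cpos <- unlist(Map(seq.int, selected$start, selected$end))
nrow(selected)  # 2 pairs of corpus positions
length(cpos)    # 340 tokens
```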

To get some basic information about the partition that has been set up, the ‘show’-method can be used. It is also called when you simply type the name of the partition object.

kuwait
## ** partition object **
## corpus:             REUTERS
## name:
## s-attributes:       places = kuwait
## cpos:               3 pairs of corpus positions
## size:               660 tokens
## count:              not available

To evaluate s-attributes, regular expressions can be used.

saudi_arabia <- partition("REUTERS", places = "saudi-arabia", regex = TRUE)
s_attributes(saudi_arabia, "id")
## [1] "242" "248" "273" "349" "352"

If you work with a flat XML structure, the order in which the s-attributes are provided may affect how quickly the partition is set up. For nested XML, the order matters: move from ancestors to children. For further information, see the documentation of the partition-function.

Cooccurrences

The cooccurrences-method can be applied to partition-objects.

saudi_arabia <- partition("REUTERS", places = "saudi-arabia", regex = TRUE)
oil <- cooccurrences(saudi_arabia, "oil", p_attribute = "word", left = 10, right = 10)

Note that it is possible to provide a query that uses the full CQP syntax. The statistical analysis of collocations to the query can be accessed as the slot “stat” of the context object. Alternatively, you can get the table with the statistics using as.data.frame.

df <- as.data.frame(oil)
df[1:5, c("word", "ll", "rank_ll")]
##     word       ll rank_ll
## 1 prices 5.383314       1
## 2     by 5.081723       2
## 3   Gulf 4.944806       3
## 4  crude 2.968707       4
## 5   last 2.756452       5
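Both cooccurrence tables rank terms by the log-likelihood statistic (column ll). A minimal base-R version of the textbook G^2 statistic for a 2x2 table is sketched below. log_likelihood() is a hypothetical helper, and polmineR's implementation may differ in details, so its values are not expected to reproduce the ll column exactly.

```r
# Hypothetical helper: G^2 log-likelihood statistic for a 2x2 contingency table.
# Rows: the term vs. all other tokens; columns: corpus of interest vs. reference.
log_likelihood <- function(a, b, size_coi, size_ref) {
  observed <- matrix(c(a, b, size_coi - a, size_ref - b), nrow = 2, byrow = TRUE)
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  nonzero  <- observed > 0  # cells with zero counts contribute 0 (0 * log(0) := 0)
  2 * sum(observed[nonzero] * log(observed[nonzero] / expected[nonzero]))
}

log_likelihood(a = 10, b = 10, size_coi = 100, size_ref = 100)   # 0: same proportions
log_likelihood(a = 20, b = 5, size_coi = 1000, size_ref = 1000)  # > 0: more frequent in coi
```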

Distribution of queries

To understand the occurrence of a phenomenon, the distribution of query results across one or two dimensions will often be interesting. This is done via the dispersion()-method. The query may use the CQP syntax.

q1 <- dispersion(saudi_arabia, query = 'oil', s_attribute = "id", progress = FALSE)
q2 <- dispersion(saudi_arabia, query = c("oil", "barrel"), s_attribute = "id", progress = FALSE)

Getting features

To identify the specific vocabulary of a corpus of interest, a statistical test (chi-square or log-likelihood) can be performed.

saudi_arabia <- partition("REUTERS", places = "saudi-arabia", regex = TRUE)
saudi_arabia <- enrich(saudi_arabia, p_attribute = "word")

saudi_arabia_features <- features(saudi_arabia, "REUTERS", included = TRUE)
# keep terms above the chi-square critical value for p < 0.001 (10.83) occurring at least 5 times
saudi_arabia_features_min <- subset(saudi_arabia_features, chisquare >= 10.83 & count_coi >= 5)
saudi_arabia_features_min
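The threshold 10.83 is the (rounded) critical value of the chi-square distribution with one degree of freedom at p = 0.001, which base R confirms with qchisq(). The sketch below also computes the textbook Pearson chi-square statistic for a 2x2 table (term vs. other tokens, corpus of interest vs. reference corpus). chisquare_2x2() is a hypothetical helper, and whether features() uses exactly this formula is an assumption.

```r
# Hypothetical helper: Pearson chi-square (no continuity correction) for a 2x2 table.
chisquare_2x2 <- function(count_coi, count_ref, size_coi, size_ref) {
  n11 <- count_coi; n12 <- count_ref                         # the term
  n21 <- size_coi - count_coi; n22 <- size_ref - count_ref   # all other tokens
  n <- n11 + n12 + n21 + n22
  n * (n11 * n22 - n12 * n21)^2 /
    ((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
}

qchisq(0.999, df = 1)  # 10.82757: critical value for p = 0.001, df = 1

x2 <- chisquare_2x2(count_coi = 30, count_ref = 40, size_coi = 1000, size_ref = 10000)
x2 >= 10.83            # TRUE for these made-up counts
```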

To extract the statistical information, you can also use the as.data.frame-method.

df <- as.data.frame(saudi_arabia_features_min)
df_min <- df[,c("word", "count_coi", "count_ref", "chisquare")]

Getting a tm TermDocumentMatrix

For many applications, term-document matrices are the point of departure. The tm class TermDocumentMatrix serves as an input to several R packages implementing advanced text mining techniques. Obtaining this input from a corpus imported into the CWB will usually involve setting up a partition_bundle and then applying a method to get the matrix.

articles <- corpus("REUTERS") %>% partition_bundle(s_attribute = "id", progress = FALSE)
## ℹ s-attribute "id" has values: yes
## ℹ get regions and get values
## ✔ get regions and get values [5ms]
## 
## ℹ instantiate objects (n = 20)
## ✔ instantiate objects (n = 20) [16ms]
## 
## ℹ assign names
## ✔ assign names [3ms]
## 
articles_count <- count(articles, p_attribute = "word")
tdm <- as.TermDocumentMatrix(articles_count, col = "count", verbose = FALSE)

class(tdm) # to see what it is
## [1] "TermDocumentMatrix"    "simple_triplet_matrix"
show(tdm)
## <<TermDocumentMatrix (terms: 1192, documents: 20)>>
## Non-/sparse entries: 2409/21431
## Sparsity           : 90%
## Maximal term length: 15
## Weighting          : term frequency (tf)
m <- as.matrix(tdm) # turn it into an ordinary matrix
m[c("oil", "barrel"),]
##         Docs
## Terms    127 144 191 194 211 236 237 242 246 248 273 349 352 353 368 489 502
##   oil      5  12   2   1   0   6   4   2   5   6   5   4   4   4   3   4   5
##   barrel   2   0   1   1   0   3   0   0   0   2   2   0   1   0   0   1   1
##         Docs
## Terms    543 704 708
##   oil      2   3   1
##   barrel   1   0   0

Reading

A key consideration of the polmineR package is to offer tools for combining quantitative and qualitative approaches to text analysis. Use the html()-method or the read()-method to return to the full text. In this example, we define a maximum height for the output, which is useful when including full text output in an R Markdown document.

P <- partition("REUTERS", id = "248")
H <- html(P, height = "250px")
H
Corpus: REUTERS

Saudi Arabian Oil Minister Hisham Nazer reiterated the kingdom’s commitment to last December’s OPEC accord to boost world oil prices and stabilise the market the official Saudi Press Agency SPA said Asked by the agency about the recent fall in free market oil prices Nazer said Saudi Arabia is fully adhering by the Accord and it will never sell its oil at prices below the pronounced prices under any circumstance Nazer quoted by SPA said recent pressure on free market prices may be because of the end of the northern hemisphere winter season and the glut in the market Saudi Arabia was a main architect of the December accord under which OPEC agreed to lower its total output ceiling by 7.25 pct to 15.8 mln barrels per day bpd and return to fixed prices of around 18 dlrs a barrel The agreement followed a year of turmoil on oil markets which saw prices slump briefly to under 10 dlrs a barrel in mid 1986 from about 30 dlrs in late 1985 Free market prices are currently just over 16 dlrs Nazer was quoted by the SPA as saying Saudi Arabia’s adherence to the accord was shown clearly in the oil market He said contacts among members of OPEC showed they all wanted to stick to the accord In Jamaica OPEC President Rilwanu Lukman who is also Nigerian Oil Minister said the group planned to stick with the pricing agreement We are aware of the negative forces trying to manipulate the operations of the market but we are satisfied that the fundamentals exist for stable market conditions he said Kuwait’s Oil Minister Sheikh Ali al Khalifa al Sabah said in remarks published in the emirate’s daily Al Qabas there were no plans for an emergency OPEC meeting to review prices Traders and analysts in international oil markets estimate OPEC is producing up to one mln bpd above the 15.8 mln ceiling They named Kuwait and the United Arab Emirates along with the much smaller producer Ecuador among those producing above quota Sheikh Ali denied that Kuwait was over producing REUTER

Moving on

The package includes many features that go beyond this vignette. A key aim of the project is to further develop the documentation, both in this vignette and in the man pages of the individual functions. Feedback is very welcome!

Annex: Setting the CORPUS_REGISTRY environment variable

The environment variable “CORPUS_REGISTRY” can be set as follows in R:

Sys.setenv(CORPUS_REGISTRY = "C:/PATH/TO/YOUR/REGISTRY")

To set the environment variable CORPUS_REGISTRY permanently, see the instructions R offers on finding the file ‘.Renviron’ or ‘.Renviron.site’ in the help page on the startup process (?Startup).
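One way to persist the setting is to add a line to the user-level ‘.Renviron’ file. The helper add_corpus_registry() below is hypothetical. The demonstration writes to a temporary file and loads it with readRenviron(), so your real ‘.Renviron’ is left untouched.

```r
# Hypothetical helper: append a CORPUS_REGISTRY entry to an .Renviron-style file.
# By default it targets ~/.Renviron, which R reads at startup (see ?Startup).
add_corpus_registry <- function(registry, renviron = file.path("~", ".Renviron")) {
  line <- sprintf("CORPUS_REGISTRY=%s", registry)
  cat(line, "\n", file = path.expand(renviron), append = TRUE, sep = "")
  invisible(line)
}

# Safe demonstration with a temporary file instead of the real ~/.Renviron:
tmp <- tempfile(fileext = ".Renviron")
add_corpus_registry("C:/PATH/TO/YOUR/REGISTRY", renviron = tmp)
readRenviron(tmp)             # load the file into the current session
Sys.getenv("CORPUS_REGISTRY")
```

To make the change permanent, call the helper without the renviron argument and restart R.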