[R] (no subject)

Stefan Th. Gries stgries at gmail.com
Thu Sep 20 16:57:34 CEST 2012


From my book on corpus linguistics with R:

# (10)   Imagine you have two vectors a and b such that
a<-c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b<-c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")

# Of these vectors, you can create frequency lists by writing
freq.list.a<-table(a); freq.list.b<-table(b)

# How do you merge these two frequency lists without merging the two
# vectors first? More specifically, if I delete a and b from your
# memory,
rm(a); rm(b)
# how do you generate the following table only from freq.list.a and
# freq.list.b, i.e., without any reference to a and b themselves? Before
# you complain about this question as being unrealistic, consider the
# possibility that you generated the frequency lists of two corpora
# (here, a and b) that are so large that you cannot combine them into
# one (a.and.b<-c(a, b)) and generate a frequency list of that combined
# vector (table(a.and.b)) ...
# joint.freqs
# a b d e f g i j
# 3 1 3 1 5 5 1 1

joint.freqs<-vector(length=length(sort(unique(c(names(freq.list.a),
   names(freq.list.b))))))
# You generate a placeholder vector joint.freqs (i) that has one element
# for each different type in a and b (but note that, as requested, this
# information is not taken from a or b, but from their frequency lists).
# Its elements start out as FALSE, which R treats as 0 in the addition
# below ...
names(joint.freqs)<-sort(unique(c(names(freq.list.a),
   names(freq.list.b))))
# ... and (ii) whose elements have these different types as names.
joint.freqs[names(freq.list.a)]<-freq.list.a
# The elements of the new vector joint.freqs that have the same names as
# the frequencies in the first frequency list are assigned the
# respective frequencies.
joint.freqs[names(freq.list.b)]<-joint.freqs[names(freq.list.b)]+freq.list.b
# The elements of the new vector joint.freqs that have the same names
# as the frequencies in the second frequency list are assigned the sum
# of the values they already have (either the ones from the first
# frequency list or just zeroes) and the respective frequencies.
joint.freqs # look at the result

# Another, shorter and more elegant solution was proposed by Claire
# Crawford (but it uses a function which will only be introduced later
# in the book)
freq.list.a.b<-c(freq.list.a, freq.list.b)
# first the two frequency lists are merged into a single vector ...
joint.freqs<-as.table(tapply(freq.list.a.b, names(freq.list.a.b),
   sum))
# ... and then the sums of all numbers that share the same names are
# computed
joint.freqs # look at the result
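
The same tapply idea extends to more than two frequency lists. What follows is only a minimal sketch, not something from the book: merge.freq.lists is a hypothetical helper name, and it assumes that every element of its argument is a one-dimensional table as produced by table().

```r
# Hypothetical helper (not from the book): sums any number of frequency lists.
# freq.lists is assumed to be a list of one-dimensional tables.
merge.freq.lists <- function(freq.lists) {
   # unname() keeps c() from prefixing the type names with list element names
   all.freqs <- do.call(c, unname(freq.lists))
   # add up all frequencies that share the same type name
   as.table(tapply(all.freqs, names(all.freqs), sum))
}

# the two frequency lists from the example above
freq.list.a <- table(c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g"))
freq.list.b <- table(c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g"))

joint.freqs <- merge.freq.lists(list(freq.list.a, freq.list.b))
joint.freqs # look at the result
```

Further frequency lists can simply be added to the list that is passed to the function.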

# The shortest, but certainly not memory-efficient way to do this
# involves just using the frequency lists to create one big vector with
# all elements and tabulate that.
table(c(rep(names(freq.list.a), freq.list.a), rep(names(freq.list.b),
   freq.list.b))) # kind of cheating but possible with short vectors ...
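
With vectors as short as these, one can check that all three approaches give the same counts as tabulating the combined vector directly. The following is a sketch for illustration only; the names reference, sol1, sol2, sol3 and same.counts are made up here, and sol1 uses numeric() as the placeholder instead of vector().

```r
a <- c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b <- c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")
freq.list.a <- table(a); freq.list.b <- table(b)

# only possible here because a and b are small enough to combine
reference <- table(c(a, b))

# approach 1: fill a named vector of zeroes
types <- sort(unique(c(names(freq.list.a), names(freq.list.b))))
sol1 <- numeric(length(types)); names(sol1) <- types
sol1[names(freq.list.a)] <- freq.list.a
sol1[names(freq.list.b)] <- sol1[names(freq.list.b)] + freq.list.b

# approach 2: concatenate the frequency lists and sum by name
both <- c(freq.list.a, freq.list.b)
sol2 <- tapply(both, names(both), sum)

# approach 3: rebuild the tokens from the frequency lists and tabulate
sol3 <- table(c(rep(names(freq.list.a), freq.list.a),
   rep(names(freq.list.b), freq.list.b)))

# TRUE if a result has the same type names and counts as the reference
same.counts <- function(x) {
   identical(names(x), names(reference)) &&
      all(as.vector(x) == as.vector(reference))
}
c(same.counts(sol1), same.counts(sol2), same.counts(sol3))
```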

HTH,
STG
--
Stefan Th. Gries
-----------------------------------------------
University of California, Santa Barbara
http://www.linguistics.ucsb.edu/faculty/stgries


