[R] (no subject)
Gabor Grothendieck
ggrothendieck at gmail.com
Thu Sep 20 18:13:49 CEST 2012
On Thu, Sep 20, 2012 at 10:57 AM, Stefan Th. Gries <stgries at gmail.com> wrote:
> From my book on corpus linguistics with R:
>
> # (10) Imagine you have two vectors a and b such that
> a<-c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
> b<-c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")
>
> # Of these vectors, you can create frequency lists by writing
> freq.list.a<-table(a); freq.list.b<-table(b)
>
> # How do you merge these two frequency lists without merging the two
> # vectors first? More specifically, if I delete a and b from your
> # memory,
> rm(a); rm(b)
> # how do you generate the following table only from freq.list.a and
> # freq.list.b, i.e., without any reference to a and b themselves? Before
> # you complain about this question as being unrealistic, consider the
> # possibility that you generated the frequency lists of two corpora
> # (here, a and b) that are so large that you cannot combine them into
> # one (a.and.b<-c(a, b)) and generate a frequency list of that combined
> # vector (table(a.and.b)) ...
> joint.freqs
> a b d e f g i j
> 3 1 3 1 5 5 1 1
>
> # You generate an empty vector joint.freqs (i) that is as long as there
> # are different types in both a and b (but note that, as requested, this
> # information is not taken from a or b, but from their frequency lists) ...
> joint.freqs<-vector(length=length(sort(unique(c(names(freq.list.a),
>    names(freq.list.b))))))
> # ... and (ii) whose elements have these different types as names.
> names(joint.freqs)<-sort(unique(c(names(freq.list.a),
>    names(freq.list.b))))
> # The elements of the new vector joint.freqs that have the same names
> # as the frequencies in the first frequency list are assigned the
> # respective frequencies.
> joint.freqs[names(freq.list.a)]<-freq.list.a
> # The elements of the new vector joint.freqs that have the same names
> # as the frequencies in the second frequency list are assigned the sum
> # of the values they already have (either the ones from the first
> # frequency list or just zeroes) and the respective frequencies.
> joint.freqs[names(freq.list.b)]<-joint.freqs[names(freq.list.b)]+freq.list.b
> joint.freqs # look at the result
>
> # Another shorter and more elegant solution was proposed by Claire
> # Crawford (but uses a function which will only be introduced later in
> # the book)
> # first the two frequency lists are merged into a single vector ...
> freq.list.a.b<-c(freq.list.a, freq.list.b)
> # ... and then the sums of all numbers that share the same names
> # are computed
> joint.freqs<-as.table(tapply(freq.list.a.b, names(freq.list.a.b), sum))
> joint.freqs # look at the result
>
> # The shortest, but certainly not memory-efficient way to do this
> # involves just using the frequency lists to create one big vector with
> # all elements and tabulate that.
> # kind of cheating but possible with short vectors ...
> table(c(rep(names(freq.list.a), freq.list.a), rep(names(freq.list.b),
>    freq.list.b)))
>
Try:
rowsum(freq.list.a.b, names(freq.list.a.b))
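
For anyone who wants to run this on its own, here is a minimal sketch, assuming the two frequency lists are rebuilt from the quoted example and merged with c() as in the second quoted solution. The name sums is only illustrative and is not in the original post. rowsum() returns a one-column matrix rather than a table, so the last step takes its first column to get a named vector.

```r
# Rebuild the two frequency lists from the quoted example
a <- c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b <- c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")
freq.list.a <- table(a); freq.list.b <- table(b)
rm(a, b)  # from here on only the frequency lists are used

# Merge the two frequency lists into one named vector (c() keeps the names)
freq.list.a.b <- c(freq.list.a, freq.list.b)

# rowsum() adds up all values that share the same group label;
# for a vector it returns a one-column matrix with the labels as row names
# ("sums" is an illustrative name, not from the original post)
sums <- rowsum(freq.list.a.b, names(freq.list.a.b))

# Take the single column to obtain a named vector of joint frequencies
joint.freqs <- sums[, 1]
joint.freqs # look at the result
```

If this behaves as expected, the counts should agree with the joint.freqs table quoted above.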
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com