[R] Histograms with strings, grouped by repeat count (w/ data)

Deepayan Sarkar deepayan.sarkar at gmail.com
Tue Jun 19 20:34:03 CEST 2007


On 6/18/07, Matthew Trunnell <trunnell at cognix.net> wrote:
> Aha!  So to expand that from the original expression,
>
> > table(table(d$filename, d$email_addr))
>
>   0   1   2   3
> 253  20   8   9
>
> I think that is exactly what I'm looking for.  I knew it must be
> simple!!!  What does the 0 column represent?

That is the number of unique filename:email_addr combinations that
don't occur in the data.
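
To see where the 0 column comes from, here is a minimal sketch on a made-up data frame. The name `toy` and its values are hypothetical and are not the data from this thread. Every filename/email_addr cell of the cross-tabulation is counted, including cells for combinations that never occur:

```r
# Hypothetical toy data; NOT the data discussed in this thread.
toy <- data.frame(
  filename   = c("file1", "file1", "file2", "file3"),
  email_addr = c("a@x",   "a@x",   "b@x",   "a@x"),
  stringsAsFactors = TRUE
)

# 3 filenames x 2 addresses = 6 cells, but only 3 combinations occur.
tab <- table(toy$filename, toy$email_addr)
print(tab)

# Tabulate the cell counts themselves: the "0" entry is the number
# of cells (combinations) that are empty.
freq <- table(tab)
print(freq)
```

Here three of the six cells are empty, so the "0" entry of `freq` is 3.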

> Also, does this tell me the same thing, filtered by Japan?
> > table(table(d$filename, d$email_addr, d$country_residence)[d$country_residence=="Japan"])
>
>   0   1   2   3
> 958   5   2   1

No, it doesn't.

> length(table(d$filename, d$email_addr, d$country_residence))
[1] 4350
> length(d$country_residence)
[1] 63

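You are using an index that is meaningless: the logical vector has one
element per row of d (63), while the table has 4350 cells, so it does
not pick out the cells that belong to Japan.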
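
If the three-way table is what you want, one way to get at the Japan part is to index the third dimension by name. The following is only a sketch on made-up data (the `toy` data frame and its values are hypothetical), not a statement about your actual data:

```r
# Hypothetical toy data; NOT the data discussed in this thread.
toy <- data.frame(
  filename          = c("file1", "file1", "file2", "file3", "file3"),
  email_addr        = c("a@x",   "a@x",   "b@x",   "a@x",   "b@x"),
  country_residence = c("Japan", "Japan", "Japan", "USA",   "USA"),
  stringsAsFactors  = TRUE
)

tab3 <- table(toy$filename, toy$email_addr, toy$country_residence)

# The table has one cell per combination; the data frame has one
# element per row. The two lengths are unrelated.
n_cells <- length(tab3)   # 3 * 2 * 2 cells
n_rows  <- nrow(toy)      # 5 rows

# Select the Japan slice by the name of the third dimension,
# leaving the first two dimensions untouched.
japan <- tab3[, , "Japan"]
japan_freq <- table(japan)
print(japan_freq)
```

The empty first and second index positions in `tab3[, , "Japan"]` mean "all filenames" and "all addresses".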


There's an alternative tabulation function that uses a formula
interface similar to that used in modeling functions; this might be
more transparent for your case:

> count <-
+     xtabs(~filename + email_addr, data = d,
+           subset = country_residence == "Japan")
> xtabs(~count)
count
  0   1   3
284   2   4
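
If the empty combinations are not of interest, one possibility is to drop the zero cells before tabulating. This is a sketch on made-up data (`toy` and the name `nonzero` are hypothetical), assuming the columns are factors so that unused levels are kept by xtabs():

```r
# Hypothetical toy data; NOT the data discussed in this thread.
toy <- data.frame(
  filename          = c("file1", "file1", "file2", "file3", "file3"),
  email_addr        = c("a@x",   "a@x",   "b@x",   "a@x",   "b@x"),
  country_residence = c("Japan", "Japan", "Japan", "USA",   "USA"),
  stringsAsFactors  = TRUE
)

# Cross-tabulate only the rows for Japan.
count <- xtabs(~ filename + email_addr, data = toy,
               subset = country_residence == "Japan")
print(count)

# Keep only the cells that actually occur, then tabulate them.
# Indexing by the logical matrix (count > 0) returns a plain vector.
nonzero <- count[count > 0]
print(table(nonzero))
```

Note that the index here is a logical matrix with the same shape as `count`, so each element of the index corresponds to exactly one cell.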


> How does that differ logically from this?
>
> > table(table(d$filename, d$email_addr)[d$country_residence=="Japan"])
>
>  0  1  2  3
> 51  4  2  1

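This is also using meaningless indexing, for the same reason: the
logical vector has one element per row of d, not one per cell of the
table.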

Note, incidentally, that you are indexing a matrix of dimension 10x29
as if it were a vector of length 290, which is probably not what you
meant to do anyway:

> str(table(d$filename, d$email_addr))
 'table' int [1:10, 1:29] 1 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:10] "file1" "file10" "file2" "file3" ...
  ..$ : chr [1:29] "email1" "email10" "email11" "email12" ...

You need to read help(Extract) carefully and play around with some
simple examples.
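
As a starting point, here is the kind of simple example I mean. The matrix `m` is made up for illustration. It contrasts a single index, which treats the matrix as a vector, with separate row and column indices:

```r
# A small made-up matrix, filled column by column:
#    c1 c2 c3
# r1  1  3  5
# r2  2  4  6
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# One index: the matrix is treated as a vector of length 6, and the
# short logical vector is recycled over all 6 cells.
vec_idx <- m[c(TRUE, FALSE)]

# Two indices: the logical vector selects rows, all columns are kept.
row_idx <- m[c(TRUE, FALSE), ]

# Indexing by dimnames selects a single cell.
cell <- m["r1", "c2"]

print(vec_idx)
print(row_idx)
print(cell)
```

The first two results happen to contain the same values here, but only because the recycled index lines up with the rows of a 2-row matrix; in general a recycled index of the wrong length selects an arbitrary-looking set of cells.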

> I don't understand why that produces different results.  The first one
> adds a third dimension to the table, but limits that third dimension
> to a single element, Japan.  Shouldn't it be the same?  And again,
> what's that zero column?

As before, it counts the empty combinations.

-Deepayan


