[R] Histograms with strings, grouped by repeat count (w/ data)
Deepayan Sarkar
deepayan.sarkar at gmail.com
Tue Jun 19 20:34:03 CEST 2007
On 6/18/07, Matthew Trunnell <trunnell at cognix.net> wrote:
> Aha! So to expand that from the original expression,
>
> > table(table(d$filename, d$email_addr))
>
>   0   1   2   3
> 253  20   8   9
>
> I think that is exactly what I'm looking for. I knew it must be
> simple!!! What does the 0 column represent?
The number of filename:email_addr combinations that don't occur in the
data, i.e. the cells of the cross-table whose count is zero.
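To see where the 0 column comes from, here is a minimal sketch on made-up data (the values are hypothetical; only the column names come from your post). The inner table() has one cell for every possible filename:email_addr pair, and the outer table() counts how many cells hold each value.

```r
# Hypothetical toy data; column names follow the thread, values are invented.
d <- data.frame(
  filename   = c("file1", "file1", "file2", "file3", "file3", "file3"),
  email_addr = c("email1", "email1", "email1", "email2", "email2", "email2")
)

# 3 filenames x 2 addresses = 6 cells, one per possible combination,
# including combinations that never appear in d (those cells are 0).
counts <- table(d$filename, d$email_addr)
print(counts)

# Tabulate the cell counts themselves: how many combinations
# occur 0 times, 1 time, 2 times, and so on.
repeats <- table(counts)
print(repeats)
```

In this toy case three of the six possible combinations never occur, so they are the ones counted under 0.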
> Also, does this tell me the same thing, filtered by Japan?
> > table(table(d$filename, d$email_addr, d$country_residence)[d$country_residence=="Japan"])
>
>   0   1   2   3
> 958   5   2   1
No it doesn't.
> length(table(d$filename, d$email_addr, d$country_residence))
[1] 4350
> length(d$country_residence)
[1] 63
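You are using an index that is meaningless: the table has 4350 cells,
but the logical index has only 63 elements (one per row of d), so its
elements do not correspond to cells of the table.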
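A small sketch of the mismatch, assuming a made-up data frame d with the same column names as yours: the logical vector has one element per row of d, the table has one element per cell, and R silently recycles the shorter index.

```r
# Hypothetical toy data frame using the thread's column names.
d <- data.frame(
  filename          = c("file1", "file2", "file2", "file3"),
  email_addr        = c("email1", "email1", "email2", "email2"),
  country_residence = c("Japan", "Japan", "USA", "USA")
)

tab3 <- table(d$filename, d$email_addr, d$country_residence)
idx  <- d$country_residence == "Japan"

length(tab3)  # number of cells: 3 * 2 * 2
length(idx)   # number of rows of d

# The logical index is recycled along the cells of the table, so the
# selected cells depend only on their position, not on the country
# they belong to.
picked <- tab3[idx]
length(picked)
```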
There's an alternative tabulation function that uses a formula
interface similar to that used in modeling functions; this might be
more transparent for your case:
> count <-
+ xtabs(~filename + email_addr, data = d,
+ subset = country_residence == "Japan")
> xtabs(~count)
count
  0   1   3
284   2   4
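If the formula interface is unfamiliar, subsetting the rows first and then calling table() should give the same counts, provided filename and email_addr are factors (so that levels unused in the subset are kept as empty cells). A sketch on hypothetical data:

```r
# Hypothetical toy data; filename and email_addr are factors on purpose,
# so that levels absent from the Japan rows still produce (empty) cells.
d <- data.frame(
  filename          = factor(c("file1", "file1", "file2", "file3")),
  email_addr        = factor(c("email1", "email1", "email2", "email2")),
  country_residence = c("Japan", "Japan", "USA", "Japan")
)

# Formula interface with a subset, as in the reply above.
count <- xtabs(~ filename + email_addr, data = d,
               subset = country_residence == "Japan")

# Same idea without xtabs(): keep the Japan rows, then cross-tabulate.
japan     <- d[d$country_residence == "Japan", ]
count_tab <- table(japan$filename, japan$email_addr)

# Distribution of repeat counts among all filename:email_addr cells.
repeats <- table(as.vector(count_tab))
print(repeats)
```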
> How does that differ logically from this?
>
> > table(table(d$filename, d$email_addr)[d$country_residence=="Japan"])
>
>  0  1  2  3
> 51  4  2  1
This is also using meaningless indexing.
Note, incidentally, that you are indexing a matrix of dimension 10x29
as if it were a vector of length 290, which is probably not what you
meant to do anyway:
> str(table(d$filename, d$email_addr))
 'table' int [1:10, 1:29] 1 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:10] "file1" "file10" "file2" "file3" ...
  ..$ : chr [1:29] "email1" "email10" "email11" "email12" ...
You need to read help(Extract) carefully and play around with some
simple examples.
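As a starting point for such experiments, the sketch below (hypothetical data again) contrasts a single vector-style index with matrix-style indexing, and shows one way to pull out the Japan slice of a three-way table by its dimnames, which is probably closer to what was intended.

```r
# Hypothetical toy data using the thread's column names.
d <- data.frame(
  filename          = factor(c("file1", "file1", "file2", "file3")),
  email_addr        = factor(c("email1", "email1", "email2", "email2")),
  country_residence = c("Japan", "Japan", "USA", "Japan")
)

tab2 <- table(d$filename, d$email_addr)   # 3 x 2 matrix of counts
tab2[1:3]   # one index: the matrix is treated as a vector (first column here)
tab2[1, ]   # two indices: first row, all columns

tab3 <- table(d$filename, d$email_addr, d$country_residence)

# Index the third dimension by name to get the filename x email_addr
# counts for Japan only.
japan <- tab3[, , "Japan"]
dim(japan)
table(japan)
```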
> I don't understand why that produces different results. The first one
> adds a third dimension to the table, but limits that third dimension
> to a single element, Japan. Shouldn't it be the same? And again,
> what's that zero column?
As before, it counts the empty combinations.
-Deepayan