[R] closeness of codes
jim at bitwrit.com.au
Tue Sep 20 11:15:46 CEST 2011
On 09/19/2011 04:46 PM, Henri-Paul Indiogine wrote:
> I am using the R library RQDA to assign certain codes to paragraphs of
> documents in a collection. Several paragraphs are assigned more than
> 1 code. E.g. often the codes "poverty" and "education" will be
> assigned to the same paragraph. Often also "math" and "career" will
> be given to the same paragraphs. Other codes are never given to the
> same paragraphs.
> I would like to calculate the relationship or "closeness" of certain
> codes. RQDA will generate a cross-codes table. It has the form of an
> upper triangular matrix where the upper triangle has the number of
> cross occurrences of 2 codes at their intersection. The lower
> triangle is filled with NA. The diagonal simply has the number of
> occurrences of the codes by themselves.
> The row names are the names of the codes and the column names are the
> IDs of the codes. E.g.
>         1   2   3   4
> code1   3   0   2   1
> code2  NA   4   1   0
> code3  NA  NA   2   0
> code4  NA  NA  NA   3
> We can see that code1 is associated 2 out of 3 times with code3.
> Code2 is present 1 out of 4 times with code3. Code2 is never assigned
> to the same paragraph as code1 or code4, and so on.
> I am trying to understand how to create some sort of graph or diagram
> to represent this. Should I use a cluster diagram or a network graph?
> Also, what sort of R code could I use?
The intersectDiagram function in the plotrix package displays the
intersections of sets as rectangles with widths (and areas) proportional
to the number of members of each set intersection. This may be a way for
you to represent your codes. For your example, you could proceed like
this. Create a file ("hp.csv") containing your coding data, read it in, and plot it:
hp<-read.csv("hp.csv")  # read the file created above into hp
plotrix::intersectDiagram(hp,main="Combinations of codes")
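As a minimal sketch of the kind of input this call can take, the block below builds a 0/1 membership matrix (one row per paragraph, one column per code) and passes it to intersectDiagram. The values are made up for illustration and are not the contents of hp.csv; they are chosen so that the pairwise counts resemble the cross-codes table quoted above. The names membership and cooccur are hypothetical, and the plot is only drawn in an interactive session with plotrix installed.

```r
# Hypothetical data (an assumption, not the original hp.csv):
# one row per paragraph, one column per code, 1 = code assigned.
membership <- matrix(c(
  1, 0, 1, 0,
  1, 0, 1, 0,
  1, 0, 0, 1,
  0, 1, 0, 0,
  0, 1, 0, 0,
  0, 1, 0, 0,
  0, 1, 1, 0,
  0, 0, 0, 1,
  0, 0, 0, 1
), ncol = 4, byrow = TRUE,
dimnames = list(NULL, c("code1", "code2", "code3", "code4")))

# Pairwise co-occurrence counts recovered from the membership matrix:
# the diagonal is how often each code was used, the off-diagonal
# entries are the number of paragraphs sharing both codes.
cooccur <- crossprod(membership)

# Draw the diagram only when running interactively with plotrix available.
if (interactive() && requireNamespace("plotrix", quietly = TRUE)) {
  plotrix::intersectDiagram(membership, main = "Combinations of codes")
}
```

Each rectangle in the resulting diagram corresponds to one exact combination of codes, so paragraphs with code1 and code3 together are drawn separately from paragraphs with code1 and code4.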
intersectDiagram will also read in other representations of your
original data, which you might like to try.
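If you only have the cross-codes table rather than the paragraph-level data, the "2 out of 3 times" style of proportion can be computed directly from it. The following is a rough sketch in base R, assuming the table has been entered as a matrix with the code names used for both rows and columns (RQDA uses code IDs as column names). The dissimilarity at the end is just one possible choice for a cluster diagram, not something RQDA or plotrix prescribes, and the names cross, full, prop, d and hc are made up for the example.

```r
# Cross-codes table as quoted in the thread: the upper triangle holds
# co-occurrence counts, the diagonal holds how often each code was used.
codes <- paste0("code", 1:4)
cross <- matrix(c(
   3,  0,  2, 1,
  NA,  4,  1, 0,
  NA, NA,  2, 0,
  NA, NA, NA, 3
), nrow = 4, byrow = TRUE, dimnames = list(codes, codes))

# Mirror the upper triangle into the NA-filled lower triangle.
full <- cross
full[lower.tri(full)] <- t(full)[lower.tri(full)]

# Row-wise proportions: the share of the row code's occurrences that
# also carry the column code (code1 with code3 is 2 out of 3).
prop <- full / diag(full)

# One possible dissimilarity (an assumption, not from the thread):
# 1 minus the larger of the two directional proportions.
d <- as.dist(1 - pmax(prop, t(prop)))
hc <- hclust(d, method = "average")
# plot(hc) would then draw the dendrogram.
```

Note that prop is not symmetric: code3 co-occurs with code1 in 2 of its 2 occurrences, while code1 co-occurs with code3 in only 2 of its 3.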