[R] new user question on dataframe comparisons and plots

Conor Robinson conor.robinson at gmail.com
Wed Aug 1 23:12:50 CEST 2007

I'm coming from the scipy community and have been using R on and off for
the past week or so.  I'm still feeling out the language structure,
but so far so good.  I apologize in advance if I pose any obvious
questions, due to my current lack of diction when searching for my
issue, or recognizing it if I did see it.

Question 1, plots:

I have a data frame with 4 columns of type factor, plus a single
column of type logical holding the response data (T or F).  I would
like to plot a 4*4 grid showing all the two-way attribute
interactions, as with plot(data.frame) or pairs(data.frame,
panel=panel.smooth), but showing the response's TRUE and FALSE values
in different colors, or any other built-in graphical analysis that
might be relevant in this case.  I'm sure this is simple since it is a
common procedure; thanks in advance for humoring me.  Also, what is
the correct term for this type of plot?
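
For reference, a minimal sketch of the kind of plot described above.
The grid that pairs() draws is usually called a scatterplot matrix.
The data frame, its column names (f1 to f4, resp) and the colors are
made up for illustration; jittering the factor codes is one possible
choice for keeping overlapping points visible, not the only one.

```r
# Hypothetical example data: four factor columns and one logical response
set.seed(1)
n <- 200
df <- data.frame(
  f1   = factor(sample(letters[1:3], n, replace = TRUE)),
  f2   = factor(sample(letters[1:4], n, replace = TRUE)),
  f3   = factor(sample(LETTERS[1:3], n, replace = TRUE)),
  f4   = factor(sample(LETTERS[1:5], n, replace = TRUE)),
  resp = sample(c(TRUE, FALSE), n, replace = TRUE)
)

# One color per row, chosen by the logical response
cols <- ifelse(df$resp, "red", "blue")

# pairs() wants numeric input, so turn the factors into integer codes,
# then jitter each column so points sharing a level pair do not overlap
codes    <- data.matrix(df[, c("f1", "f2", "f3", "f4")])
jittered <- apply(codes, 2, jitter)

# 4*4 scatterplot matrix, colored by response
pairs(jittered, col = cols, pch = 16)
```

The same col vector can be passed to plot(df[, 1:4], col = cols) if
the default data frame plot is preferred.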

Question 2, data frame analysis:

I have two sub data frames split by whether my logical column is T or
F.  I want to compare the same factor column between the two sub data
frames (there are a few hundred unique possible values for this factor
column, e.g. AAAA - ZZZZ enumerated).  I've used table() on the
attribute column from each sub frame to get counts.

pos <- data.frame(table(df.true$CAT))

AAAA  10

neg <- data.frame(table(df.false$CAT))

AAAA 1000

The TRUE sub frame has fewer unique factor levels than the FALSE sub
frame.  I would like an output data frame in which the first column
holds all the factor levels from the TRUE sub frame and the second
column holds the counts from the TRUE sub frame divided by the counts
for the corresponding levels in the FALSE sub frame, i.e. %response
for each represented level.  It's fine (better even) if all levels are
included and there is just a zero for the levels with no TRUEs.
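
A minimal sketch of the output described above, on made-up data.  It
assumes both sub frames were subset from the same parent data frame,
so the CAT factor keeps the full set of levels in each and table()
reports a zero count for levels that do not occur.  The column name
resp and the result column name ratio are hypothetical.

```r
# Hypothetical parent data frame with a factor and a logical response
df <- data.frame(
  CAT  = factor(c("AAAA", "AAAA", "AAAA", "AAAB", "AAAB", "AAAC")),
  resp = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Sub frames split on the response; factor levels are retained
df.true  <- df[df$resp, ]
df.false <- df[!df$resp, ]

# Counts per level, including zeros for levels absent from a sub frame
pos <- table(df.true$CAT)
neg <- table(df.false$CAT)

# TRUE count / FALSE count per level, matched by level name
result <- data.frame(
  CAT   = names(pos),
  ratio = as.vector(pos) / as.vector(neg[names(pos)])
)
print(result)
# AAAA 0.5, AAAB 0, AAAC Inf
```

Levels with no FALSE rows come out as Inf (or NaN when both counts are
zero), so those may need separate handling.  Dividing by
as.vector(pos + neg) instead would give the fraction of TRUE responses
per level rather than the TRUE/FALSE ratio.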

I've been going off making my own function and running into trouble
with the data frame not being a vector etc etc, but I have a feeling
there is a *much* better way ie built in function, but I've hit my
current level of R understanding.

Thank you,
