[R] new user question on dataframe comparisons and plots
Stephen Tucker
brown_emu at yahoo.com
Thu Aug 2 07:55:14 CEST 2007
Hi Conor,
I hope I interpreted your question correctly. I think for the first one you
are looking for a conditioning plot? I am going to create and use some
nonsensical data - 'iris' comes with R so this should be reproducible on your
machine:
library(lattice)
data(iris)
x <- iris
# make some factors using cut(): Sepal.Length and Sepal.Width get 3 levels each
x[,1:2] <- lapply(x[,1:2],cut,3)
# add column of TRUE FALSE
x <- cbind(x,TF=sample(c(TRUE,FALSE),nrow(x),replace=TRUE))
xyplot(Petal.Width~Petal.Length |   ## these are numeric
       Sepal.Width*Sepal.Length,    ## these are factors
       groups=TF,                   ## TRUE or FALSE
       panel=function(x,y,...) {
         panel.xyplot(x,y,...)
         panel.loess(x,y,...)
       },
       data=x,auto.key=TRUE)
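If the pairs()-style grid from the question is closer to what is wanted, that kind of display is usually called a scatterplot matrix. A minimal sketch in base graphics, assuming a made-up logical response 'TF' like the one in the example above (the colour choices are arbitrary):

```r
# Sketch only: 'TF' is a made-up logical response, not real data
set.seed(42)
d <- iris[, 1:4]
TF <- sample(c(TRUE, FALSE), nrow(d), replace = TRUE)

# one colour per response value: TRUE -> blue, FALSE -> red
cols <- ifelse(TF, "blue", "red")

# scatterplot matrix of all two-way combinations, coloured by the response
pairs(d, col = cols, pch = 19)
```

Because the colour vector has one entry per row, every panel colours the same observation the same way.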
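For your second question, merge() should work even when the two tables
contain different factor levels, provided you specify all=TRUE; levels
missing from one table then show up as NA.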
## get counts for TRUE and FALSE
> y <- tapply(x$Species,INDEX=x$TF,
+ function(x) as.data.frame(table(x)))
## merge results
> (z <- `names<-`(merge(y$`TRUE`,y$`FALSE`,by="x",all=TRUE),
+ c("factor","true","false")))
factor true false
1 versicolor 29 21
2 virginica 23 27
## reshape the data frame
> library(reshape)
> melt(z,id=1)
factor variable value
1 versicolor true 29
2 virginica true 23
3 versicolor false 21
4 virginica false 27
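To get from the merged counts to the TRUE/FALSE ratio asked about in the question, one possible sketch follows. The 'pos' and 'neg' data frames are toy stand-ins built from the few counts quoted in the question (the elided rows are simply left out), and the column names 'CAT' and 'Freq' are assumptions:

```r
# Toy counts standing in for table(df.true$CAT) and table(df.false$CAT)
pos <- data.frame(CAT = c("AAAA", "BASD", "ZAQM"), Freq = c(10, 0, 4))
neg <- data.frame(CAT = c("AAAA", "BASD", "ZAQM", "PPWS"),
                  Freq = c(1000, 3, 9, 10))

# all = TRUE keeps categories that occur in only one table (filled with NA)
both <- merge(pos, neg, by = "CAT", all = TRUE,
              suffixes = c(".true", ".false"))

# a category absent from one table has a count of zero there
both[is.na(both)] <- 0

# TRUE count divided by FALSE count, as described in the question
both$ratio <- both$Freq.true / both$Freq.false
both
```

A category with zero FALSE counts would give Inf or NaN in the ratio column, so those rows may need separate handling. If the unsplit data frame is still available, a two-way table of the category column against the logical column gives both counts in one step.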
Hope this helps. If it doesn't you can post a small (reproducible) piece of
data and we can maybe help you out a little better...
Best regards,
ST
--- Conor Robinson <conor.robinson at gmail.com> wrote:
> I'm coming from the scipy community and have been using R on and off for
> the past week or so. I'm still feeling out the language structure,
> but so far so good. I apologize in advance if I pose any obvious
> questions, due to my current lack of diction when searching for my
> issue, or recognizing it if I did see it.
>
> Question 1, plots:
>
> I have a data frame with 4 type factor columns, also in the data frame
> I have one single, type logical column with the response data (T or
> F). I would like to plot a 4*4 grid showing all the two way attribute
> interactions like with plot(data.frame) or pairs(data.frame,
> panel=panel.smooth), however show the response's True and False as
> different colors, or any other built in graphical analysis that might
> be relevant in this case. I'm sure this is simple since this is a
> common procedure, thanks in advance for humoring me. Also, what is
> the correct term for this type of plot?
>
>
> Question 2, data frame analysis:
>
> I have two sub data frames split by whether my logical column is T or
> F. I want to compare the same factor column between both of the two
> sub data frames (there are a few hundred different unique possibles
> for this factor column eg AAAA - ZZZZ enumerated). I've used table()
> on the attribute columns from each sub frame to get counts.
>
> pos <- data.frame(table(df.true$CAT))
>
> AAAA 10
> BASD 0
> ZAQM 4
> ...
>
> neg <- data.frame(table(df.false$CAT))
>
> AAAA 1000
> BASD 3
> ZAQM 9
> PPWS 10
> ...
>
> The TRUE sub frame has fewer unique factors than the FALSE sub frame, I
> would like an output data frame that is one column all the factors
> from the TRUE sub frame and the second column the counts from the TRUE
> attributes / counts from the corresponding FALSE attributes ie
> %response for each represented factor. It's fine (better even) if all
> factors are included and there is just a zero for the attributes with
> no TRUEs.
>
> I've been going off making my own function and running into trouble
> with the data frame not being a vector etc etc, but I have a feeling
> there is a *much* better way ie built in function, but I've hit my
> current level of R understanding.
>
> Thank you,
> Conor
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>