[R] Rsquared for anova

Sun Apr 17 16:02:24 CEST 2011

( did this msg make it through the lists as rich text? hotmail
didn't seem to think it was plain text?)

Anyway, having come in in the middle of this, it isn't clear to me
whether your issues are with R, with stats, or with both. Usually the
hard-core stats people punt the stats questions to other places, but
both can be addressed somewhat here.
In any case, exploratory work is a good way to learn both, and I
always like looking at new data. If you have one or
a few dependent variables and many independent variables,
it would probably help to visualize a
surface with the response as a function of the input
variables. Then, maybe with the input of prior information or
anecdotes, you will have some idea of what tests or
analyses would make sense.

just some thoughts "for illustration only"

df<-read.table("results_processedCP.txt",header=TRUE)

First, it helps to make sure everything was read in OK by doing a few
quick checks, for example,

str(df)
unique(df$nh1)
unique(df$nh2)
unique(df$nh3)
unique(df$randsize)
unique(df$aweights)
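
The same kind of check can be done for all columns at once. This is a
minimal sketch on made-up data, since I am not reproducing the real file
here: the data frame "demo" and its values are hypothetical, only the
column names follow the thread.

```r
# Synthetic stand-in for the real file; the values are invented for illustration.
set.seed(1)
demo <- data.frame(
  nh1      = rbinom(20, 1, 0.5),
  nh2      = rbinom(20, 1, 0.5),
  aweights = rbinom(20, 1, 0.5),
  tos      = runif(20)
)

# Number of distinct values per column: a 0/1 flag should report at most 2.
n_levels <- sapply(demo, function(col) length(unique(col)))
print(n_levels)

# Number of missing values per column.
n_missing <- colSums(is.na(demo))
print(n_missing)
```

A column that reports more distinct values than you expected, or any
missing values, is worth a look before going further.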

Now personally, lots of binary variables confuse me, so
I munge them all together into one variable, since I expect I can
later identify issues in the following plots. With
this data you can create a composite variable like this
(note that I have not checked any of this for accuracy,
and typos and other problems may render the results useless):

x=df$nh1+2*df$nh2+4*df$nh3+2*df$randsize+32*df$aweights
df2<-cbind(df,x)
str(df2)
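
The idea behind such a composite can be sketched on made-up data. For
0/1 flags, weighting each flag by a different power of two gives every
combination its own integer. The flag names f1, f2 and f3 below are
hypothetical, and interaction() is shown only as an alternative that
does not rely on hand-picked weights.

```r
# All eight combinations of three hypothetical 0/1 flags.
flags <- expand.grid(f1 = 0:1, f2 = 0:1, f3 = 0:1)

# Powers of two as weights: each combination maps to a distinct integer.
code <- flags$f1 + 2 * flags$f2 + 4 * flags$f3
print(cbind(flags, code))

# Alternative: let R build one factor level per observed combination.
combo <- interaction(flags$f1, flags$f2, flags$f3, drop = TRUE)
print(nlevels(combo))
```

If two inputs share a weight, or an input is not 0/1, different
combinations can end up with the same code, so it is worth tabulating
the code against the inputs before trusting it.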

I'm not sure if "time" was an input or an output, but you could
see if there is any obvious trend or periodicity of
time with your newly made-up variable,

plot(df2$time,df2$x)

Apparently x is a num rather than an int; it can be changed for illustration,
but this is probably of no consequence,

xi=as.integer(x)
str(xi)

and then you can add color based on this variable,

min(xi)
c=rainbow(56)
cx=c[xi+1]
str(cx)
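
Indexing a fixed-size palette only works while the largest code plus one
does not exceed the palette length; indexing past the end of a vector in
R returns NA. A sketch of a variant that sizes the palette from the data
is below. The codes in xi_demo are made up for illustration.

```r
# Made-up composite codes standing in for the real ones.
xi_demo <- c(0L, 3L, 3L, 17L, 40L, 55L)

# One palette entry per distinct code, however many there are.
groups       <- factor(xi_demo)
palette_cols <- rainbow(nlevels(groups))

# as.integer() on a factor gives the level index, which is a valid palette index.
point_cols <- palette_cols[as.integer(groups)]
print(point_cols)
```

The resulting vector can be passed as col= to plot() in the same way as
cx above.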

and make color coded scatter plots. Now, if you 
got lucky and guessed right you may see some patterns
that you want to test, 

plot(df2$tos,df2$tws,col=cx)

In this case, I get a cool red-yellow-green line along the bottom (a very
compelling linear-fit question) and scattered magenta (pink-red? LOL) and blue points
everywhere, with a cluster near the origin and nothing in the top right quadrant.
Also note a few blue lines above the red-yellow-green line, but much shorter.

And in fact, presumably you already knew this, as it looks like it was designed
in: if you plot just the red and green points, the linear fit looks perfect,

good=which(df2$x<20)
plot(df2$tos[good],df2$tws[good],col=cx[good])

Now, if you compare the fit of the "good" points with the fit of all points,
it isn't clear that anything like this would emerge from just
looking at summaries of a linear fit,

td=df2$tos[good]
ti=df2$tws[good]
lm(td~ti)
lm(df2$tos~ df2$tws)
summary(lm(td~ti))
summary(lm(df2$tos~ df2$tws))
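
To put numbers on that comparison, the R-squared of each fit can be
pulled out of the summary object. This is a sketch on simulated data,
not on the attached file: the data frame "sim", its grouping column
"grp" and all values are invented, with one group lying on a line and
the other being pure scatter.

```r
set.seed(42)
n   <- 100
tws <- runif(n, 0, 10)
grp <- rep(c("line", "scatter"), each = n / 2)

# "line" points follow tos = 2 * tws plus small noise; "scatter" points are unrelated.
tos <- ifelse(grp == "line", 2 * tws + rnorm(n, sd = 0.1), runif(n, 0, 20))
sim <- data.frame(tos, tws, grp)

fit_all  <- lm(tos ~ tws, data = sim)
fit_line <- lm(tos ~ tws, data = sim, subset = grp == "line")

# R-squared is stored in the summary of an lm fit.
r2_all  <- summary(fit_all)$r.squared
r2_line <- summary(fit_line)$r.squared
print(c(all = r2_all, line = r2_line))
```

The pooled fit gives a single middling number that says nothing about
one subgroup fitting almost perfectly, which is why the colored plot is
worth doing first.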

Now of course "tests" need to be decided on ahead of time, or else
it is easy to go shopping for the answer you want. Anything post hoc
needs to be very complete, and you should at least try to rationalize
test results you don't happen to like (assuming you are trying to understand
the system from which the data were measured rather than justify some
particular outcome).

Date: Sun, 17 Apr 2011 11:34:14 +0200
From: dorien.herremans at ua.ac.be
To: dieter.menne at menne-biomed.de
CC: r-help at r-project.org
Subject: Re: [R] Rsquared for anova

Thanks for your remarks. I've been reading about R for the last two days,
but I don't really get when I should use lm or aov.

I have attached the dataset, feel free to take a look at it.

So far, running it with all the combinations did not take too long and
there seem to be some effects between the parameters. However, 2x2
combinations might suffice.

Thanks for any help, or a pointer to some good documentation,

Dorien

On 16 April 2011 10:13, Dieter Menne <dieter.menne at menne-biomed.de> wrote:

>
> dorien wrote:
> >
> >> fit <- lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,
> > data=expdata))
> > Error: unexpected ',' in "fit <-
> > lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,"
> >
> >
>
> Peter's point is the important one: too many interactions, and even with +
> instead of * you might be running into problems.
>
> But anyway: if you don't let us access
>
>
> /home/dorien/UA/meta-music/optimuse/optimuse1-build-desktop/results/results_processedCP
>
> you cannot expect a better answer which will depend on the structure of the
> data set.
>
> Dieter
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Rsquared-for-anova-tp3452399p3453719.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Dorien Herremans

*Department of Environment, Technology and Technology Management*
Faculty of Applied Economics
University of Antwerp

B.513
Prinsstraat 13
2000 Antwerp
Belgium
+32 3 265 41 25
