[R] help with statistics in R - how to measure the effect of users in groups
Petr PIKAL
petr.pikal at precheza.cz
Tue Oct 11 07:54:16 CEST 2011
Hi
>
> OK. So my original advice and warnings are correct.
>
> However, now there is an additional wrinkle because your response is a
> count, which is not a continuous measurement. For this, you'll need
> glm(..., family = "poisson") instead of lm(...), where the ... is the
> stuff I gave you before. A backup approach, if there aren't too many
> small counts (below about 10, say), is to take the square root of the
> counts and analyze that via lm().
>
> In either approach, your interpretation becomes more difficult -- e.g.
> have you any experience with GLMs (generalized linear models)? Moreover,
> if there are large numbers of users -- e.g. dozens or more (and you may
> have hundreds or thousands) -- of course the interaction will be
> significant, but so what? For this you'll need to re-frame the question.
>
> So given all this and what appears to be your relative ignorance of
> statistics, I strongly recommend that you get local statistical help. Or
> just forget about formal statistical analysis altogether and do some
> sensible plotting.
Which was actually my advice too:
> >>> > > library(ggplot2)
> >>> > > p<-ggplot(test.m, aes(x=variable, y=value, colour=users))
> >>> > > p+geom_point()
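
For reference, here is a minimal sketch of the two count-data approaches Bert describes above, run on the toy data from earlier in this thread. The object names fit.pois and fit.sqrt are only illustrative, and the long-format data frame is built by hand in base R rather than with melt(). With so few observations the output shows the mechanics only. Note also that all these counts are small, which is exactly the case where the square-root backup is less advisable.

```r
# Toy data from earlier in this thread; NA marks a user who is not in that group.
test <- data.frame(
  users  = c("u1", "u2", "u3"),
  Group1 = c(10, 6, 5),
  Group2 = c(5, NA, 2),
  Group3 = c(NA, 4, 3)
)

# Long format built with base R: one row per user x group cell.
test.m <- data.frame(
  users    = rep(test$users, times = 3),
  variable = factor(rep(c("Group1", "Group2", "Group3"), each = 3)),
  value    = c(test$Group1, test$Group2, test$Group3)
)

# Poisson GLM for the counts, group effect only
# (rows with NA are dropped by the default na.action).
fit.pois <- glm(value ~ variable, family = poisson, data = test.m)
summary(fit.pois)

# Backup approach: square-root transform, then ordinary lm().
fit.sqrt <- lm(sqrt(value) ~ variable, data = test.m)
summary(fit.sqrt)
```

The Poisson coefficients are on the log scale, so exp(coef(fit.pois)) gives the Group1 mean count and the multiplicative change for the other groups.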
Regards
Petr
>
> Finally, that's it for me on this. I will offer you no more advice.
>
> -- Bert
>
> On Mon, Oct 10, 2011 at 9:40 AM, gj <gawesh at gmail.com> wrote:
>
> > Hi Bert,
> >
> > The real situation is like what you suggested, user x group interactions.
> > The users can be in more than one group.
> > In fact, the data that I am trying to analyse consist of users, with
> > online forums as groups, and the attribute being measured is the number
> > of posts made by each user in a particular forum.
> >
> > My hypothesis is that the number of posts a user makes to a forum
> > depends on the forum. For example, if the user is in a forum that is
> > active, he contributes more than when he is in a forum that is less
> > active. I guess there will be some users who contribute the same
> > irrespective of the forum.
> >
> > I hope this makes sense.
> >
> > Regards
> > Gawesh
> >
> > On Mon, Oct 10, 2011 at 4:50 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> >
> >> Yes, of course. But then one gets into additional problems with
> >> carryover effects, etc.
> >> Also, one then has a repeated measures problem (User is the
> >> experimental unit) and my previous advice is nonsense.
> >>
> >> Like you, I have no idea what his real situation is.
> >>
> >> -- Bert
> >>
> >>
> >> On Mon, Oct 10, 2011 at 8:39 AM, Anupam <anupamtg at gmail.com> wrote:
> >>
> >>> It is possible to give multiple treatments, one at a time, to the same
> >>> pool of patients. You are correct that interactions may be important in
> >>> this problem. I am only trying to help him frame the problem using an
> >>> analogy.
> >>>
> >>> Anupam.
> >>>
> >>> *From:* Bert Gunter [mailto:gunter.berton at gene.com]
> >>> *Sent:* Monday, October 10, 2011 8:21 PM
> >>> *To:* Anupam
> >>> *Cc:* gj
> >>> *Subject:* Re: [R] help with statistics in R - how to measure the
> >>> effect of users in groups
> >>>
> >>> If that is the case, and each user can appear in only one group, there
> >>> is no group x user interaction, the poster's question was nonsense, and
> >>> one analyzes the group effect only, as originally shown.
> >>>
> >>> -- Bert
> >>>
> >>> On Mon, Oct 10, 2011 at 7:43 AM, Anupam <anupamtg at gmail.com> wrote:
> >>>
> >>> Groups are different treatments given to Users for your Outcome
> >>> (measurement) of interest. Take this idea forward and you will have an
> >>> answer.
> >>>
> >>> Anupam.
> >>> -----Original Message-----
> >>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> >>> On Behalf Of Bert Gunter
> >>> Sent: Monday, October 10, 2011 7:36 PM
> >>> To: gj
> >>> Cc: r-help at r-project.org
> >>> Subject: Re: [R] help with statistics in R - how to measure the effect of
> >>> users in groups
> >>>
> >>> Assuming your data are in a data frame, yourdat, as:
> >>>
> >>> User  Group  Value
> >>> u1    1      10
> >>> u2    2      5
> >>> u3    3      NA
> >>> ...(etc)
> >>>
> >>> where Group is **explicitly coerced to be a factor,** then you want the
> >>> User x Group interaction, obtained from
> >>>
> >>> lm(Value ~ Group * User, data = yourdat)
> >>>
> >>> However, you'll get some kind of warning message if
> >>>
> >>> a) Not all Group x User combinations are present in the data
> >>>
> >>> b) Moreover, no statistics can be calculated if there are no replicates
> >>> of User x Group combinations.
> >>>
> >>> If you do not know why either of these is the case, get local help or
> >>> study any linear models (regression) text or online tutorial, as these
> >>> last issues have nothing to do with R.
> >>>
> >>> -- Bert
> >>>
> >>>
> >>> On Mon, Oct 10, 2011 at 3:48 AM, gj <gawesh at gmail.com> wrote:
> >>>
> >>> > Thanks Petr. I will try it on the real data.
> >>> >
> >>> > But that will only show whether the groups are different or not.
> >>> > Is there any way I can test if the users are different when they are
> >>> > in different groups?
> >>> >
> >>> > Regards
> >>> > Gawesh
> >>> >
> >>> > On Mon, Oct 10, 2011 at 11:17 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> >>> >
> >>> > > >
> >>> > > > Hi Petr,
> >>> > > >
> >>> > > > It's not an equation. It's my mistake; the * are meant to be field
> >>> > > > separators for the example data. I should have just used blank
> >>> > > > spaces, as follows:
> >>> > > >
> >>> > > > users  Group1  Group2  Group3
> >>> > > > u1     10      5       N/A
> >>> > > > u2     6       N/A     4
> >>> > > > u3     5       2       3
> >>> > > >
> >>> > > >
> >>> > > > Regards
> >>> > > > Gawesh
> >>> > >
> >>> > > OK. You should transform your data to long format to use lm
> >>> > >
> >>> > > library(reshape2)  # assumed here; melt() is also in the reshape package
> >>> > > test <- read.table("clipboard", header=TRUE, na.strings="N/A")
> >>> > > test.m <- melt(test)
> >>> > > # melt() reports: Using users as id variables
> >>> > > fit <- lm(value~variable, data=test.m)
> >>> > > summary(fit)
> >>> > >
> >>> > > Call:
> >>> > > lm(formula = value ~ variable, data = test.m)
> >>> > >
> >>> > > Residuals:
> >>> > >    1    2    3    4    6    8    9
> >>> > >  3.0 -1.0 -2.0  1.5 -1.5  0.5 -0.5
> >>> > >
> >>> > > Coefficients:
> >>> > >                Estimate Std. Error t value Pr(>|t|)
> >>> > > (Intercept)       7.000      1.258   5.563  0.00511 **
> >>> > > variableGroup2   -3.500      1.990  -1.759  0.15336
> >>> > > variableGroup3   -3.500      1.990  -1.759  0.15336
> >>> > > ---
> >>> > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>> > >
> >>> > > Residual standard error: 2.179 on 4 degrees of freedom
> >>> > >   (2 observations deleted due to missingness)
> >>> > > Multiple R-squared: 0.525, Adjusted R-squared: 0.2875
> >>> > > F-statistic: 2.211 on 2 and 4 DF, p-value: 0.2256
> >>> > >
> >>> > > No difference among groups, but I am not sure if this is the correct
> >>> > > way to evaluate.
> >>> > >
> >>> > > library(ggplot2)
> >>> > > p<-ggplot(test.m, aes(x=variable, y=value, colour=users))
> >>> > > p+geom_point()
> >>> > >
> >>> > > There is some sign that u3 has the lowest value in each group.
> >>> > > However, there is not enough data to include users in the fit.
> >>> > >
> >>> > > Regards
> >>> > > Petr
> >>> > >
> >>> > >
> >>> > > >
> >>> > > >
> >>> > > > On Mon, Oct 10, 2011 at 9:32 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> >>> > > >
> >>> > > > > Hi
> >>> > > > >
> >>> > > > > I do not understand much about your equations. I think you should
> >>> > > > > look at Practical Regression and Anova Using R by J. Faraway.
> >>> > > > >
> >>> > > > > Having a data frame DF with columns users, groups, results, you
> >>> > > > > could do
> >>> > > > >
> >>> > > > > fit <- lm(results~groups, data = DF)
> >>> > > > >
> >>> > > > > Regards
> >>> > > > > Petr
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > >
> >>> > > > > > Hi,
> >>> > > > > >
> >>> > > > > > I'm a newbie to R. My knowledge of statistics is mostly
> >>> > > > > > self-taught. My problem is how to measure the effect of users
> >>> > > > > > in groups. I can calculate a particular attribute for a user
> >>> > > > > > in a group. But my hypothesis is that the users' attributes
> >>> > > > > > are not independent of each other and that a user's attribute
> >>> > > > > > depends on the group, i.e. that a user's behaviour changes
> >>> > > > > > based on the group.
> >>> > > > > >
> >>> > > > > > Let me give an example:
> >>> > > > > >
> >>> > > > > > users*Group 1*Group 2*Group 3
> >>> > > > > > u1*10*5*n/a
> >>> > > > > > u2*6*n/a*4
> >>> > > > > > u3*5*2*3
> >>> > > > > >
> >>> > > > > > For example, I want to be able to prove that u1's behaviour is
> >>> > > > > > different in Group 1 than in other groups, and that the
> >>> > > > > > particular thing about Group 1 is that users in Group 1 tend
> >>> > > > > > to have a higher value of the attribute under measurement.
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > Hence, can I use R to test my hypothesis? I'm willing to learn,
> >>> > > > > > so if this is very simple, just point me in the direction of
> >>> > > > > > any online resources about it. At the moment, I don't even
> >>> > > > > > know how to define this class of problems. That will be a
> >>> > > > > > start.
> >>> > > > > >
> >>> > > > > > Regards
> >>> > > > > > Gawesh
> >>> > > > > >
> >>> > > > > > [[alternative HTML version deleted]]
> >>> > > > > >
> >>> > > > > > ______________________________________________
> >>> > > > > > R-help at r-project.org mailing list
> >>> > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> > > > > > PLEASE do read the posting guide
> >>> > > > > http://www.R-project.org/posting-guide.html
> >>> > > > > > and provide commented, minimal, self-contained,
reproducible
> >>> code.
> >>> > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> > >
> >>> >
> >>> >
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >> "Men by nature long to get on to the ultimate truths, and will often be
> >> impatient with elementary studies or fight shy of them. If it were
> >> possible to reach the ultimate truths without the elementary studies
> >> usually prefixed to them, these would not be preparatory studies but
> >> superfluous diversions."
> >>
> >> -- Maimonides (1135-1204)
> >>
> >> Bert Gunter
> >> Genentech Nonclinical Biostatistics
> >> 467-7374
> >>
> >> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
> >>
> >>
> >>
> >
>