[R] help with statistics in R - how to measure the effect of users in groups

Petr PIKAL petr.pikal at precheza.cz
Tue Oct 11 07:54:16 CEST 2011


Hi

> 
> OK. So my original advice and warnings are correct.
>
> However, now there is an additional wrinkle because your response is a
> count, which is not a continuous measurement. For this, you'll need
> glm(..., family = "poisson") instead of lm(...), where the ... is the
> stuff I gave you before. A backup approach, if there aren't too many small
> counts (below about 10, say), is to take the square root of the counts and
> analyze that via lm().
>
> In either approach, your interpretation becomes more difficult -- e.g. have
> you any experience with GLMs (generalized linear models)? Moreover, if
> there are large numbers of users -- e.g. more than dozens (and you may have
> hundreds or thousands) -- of course the interaction will be significant, but
> so what? For this you'll need to re-frame the question.
>
> So given all this and what appears to be your relative ignorance of
> statistics, I strongly recommend that you get local statistical help. Or
> just forget about formal statistical analysis altogether and do some
> sensible plotting.

That was actually my advice too:

> >>> > > library(ggplot2)
> >>> > > p<-ggplot(test.m, aes(x=variable, y=value, colour=users))
> >>> > > p+geom_point()
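
To make the quoted glm() suggestion concrete, here is a minimal sketch using
the three-user example data from earlier in this thread. The group-only
formula is an assumption (the example has too few observations to include
users), and the object names fit.pois and fit.sqrt are purely illustrative.

```r
# Example data from this thread, typed in directly in long format
# (the same shape that melt() produces from the wide table).
test.m <- data.frame(
  users    = rep(c("u1", "u2", "u3"), times = 3),
  variable = factor(rep(c("Group1", "Group2", "Group3"), each = 3)),
  value    = c(10, 6, 5,   # Group1: u1, u2, u3
               5, NA, 2,   # Group2: u2 missing
               NA, 4, 3)   # Group3: u1 missing
)

# Poisson GLM for count data; rows with NA are dropped by default.
fit.pois <- glm(value ~ variable, family = poisson, data = test.m)
summary(fit.pois)

# Coefficients are on the log scale. Exponentiating gives the Group1
# mean count (intercept) and the ratios of the other groups to Group1.
exp(coef(fit.pois))

# Backup approach from the quoted advice: square-root transform, then lm().
fit.sqrt <- lm(sqrt(value) ~ variable, data = test.m)
summary(fit.sqrt)
```

With so few observations this only shows the mechanics; the p-values mean
little here, and a User term would need replicated User x Group data.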

Regards
Petr


> 
> Finally, that's it for me on this. I will offer you no more advice.
> 
> -- Bert
> 
> On Mon, Oct 10, 2011 at 9:40 AM, gj <gawesh at gmail.com> wrote:
> 
> > Hi Bert,
> >
> > The real situation is like what you suggested: user x group interactions.
> > The users can be in more than one group.
> > In fact, the data that I am trying to analyse consist of users, online
> > forums as groups, and the attribute being measured is the number of posts
> > made by each user in a particular forum.
> >
> > My hypothesis is that the number of posts a user makes to a forum is
> > dependent on the forum. For example, if the user is in a forum that is
> > active, he contributes more compared to when he is in a forum that is less
> > active. I guess there will be some users who contribute the same
> > irrespective of the forum.
> >
> > I hope this makes sense.
> >
> > Regards
> > Gawesh
> >
> > On Mon, Oct 10, 2011 at 4:50 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> >
> >> Yes, of course. But then one gets into additional problems with carryover
> >> effects, etc.
> >> Also, one then has a repeated measures problem (User is the experimental
> >> unit) and my previous advice is nonsense.
> >>
> >> Like you, I have no idea what his real situation is.
> >>
> >> -- Bert
> >>
> >>
> >> On Mon, Oct 10, 2011 at 8:39 AM, Anupam <anupamtg at gmail.com> wrote:
> >>
> >>> It is possible to give multiple treatments, one at a time, to the same
> >>> pool of patients. You are correct that interactions may be important in
> >>> this problem. I am only trying to help him frame the problem using an
> >>> analogy.
> >>>
> >>> Anupam.
> >>>
> >>> From: Bert Gunter [mailto:gunter.berton at gene.com]
> >>> Sent: Monday, October 10, 2011 8:21 PM
> >>> To: Anupam
> >>> Cc: gj
> >>> Subject: Re: [R] help with statistics in R - how to measure the effect
> >>> of users in groups
> >>>
> >>> If that is the case, and each user can appear in only one group, there is
> >>> no group x user interaction, the poster's question was nonsense, and one
> >>> analyzes the group effect only, as originally shown.
> >>>
> >>> -- Bert
> >>>
> >>> On Mon, Oct 10, 2011 at 7:43 AM, Anupam <anupamtg at gmail.com> wrote:
> >>>
> >>> Groups are different treatments given to Users for your Outcome
> >>> (measurement) of interest. Take this idea forward and you will have an
> >>> answer.
> >>>
> >>> Anupam.
> >>> -----Original Message-----
> >>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> >>> On Behalf Of Bert Gunter
> >>> Sent: Monday, October 10, 2011 7:36 PM
> >>> To: gj
> >>> Cc: r-help at r-project.org
> >>> Subject: Re: [R] help with statistics in R - how to measure the effect of
> >>> users in groups
> >>>
> >>> Assuming your data are in a data frame, yourdat, as:
> >>>
> >>> User   Group   Value
> >>> u1     1       10
> >>> u2     2       5
> >>> u3     3       NA
> >>> ...(etc)
> >>>
> >>> where Group is *explicitly coerced to be a factor*, then you want the
> >>> User x Group interaction, obtained from
> >>>
> >>> lm(Value ~ Group * User, data = yourdat)
> >>>
> >>> However, you'll get some kind of warning message if
> >>>
> >>> a) Not all Group x User combinations are present in the data.
> >>>
> >>> b) Moreover, no statistics can be calculated if there are no replicates
> >>> of User x Group combinations.
> >>>
> >>> If you do not know why either of these is the case, get local help or
> >>> study any linear models (regression) text or online tutorial, as these
> >>> last issues have nothing to do with R.
> >>>
> >>> -- Bert
> >>>
> >>>
> >>> On Mon, Oct 10, 2011 at 3:48 AM, gj <gawesh at gmail.com> wrote:
> >>>
> >>> > Thanks Petr. I will try it on the real data.
> >>> >
> >>> > But that will only show whether the groups are different or not.
> >>> > Is there any way I can test if the users are different when they are
> >>> > in different groups?
> >>> >
> >>> > Regards
> >>> > Gawesh
> >>> >
> >>> > On Mon, Oct 10, 2011 at 11:17 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> >>> >
> >>> > > >
> >>> > > > Hi Petr,
> >>> > > >
> >>> > > > It's not an equation. It's my mistake; the * are meant to be field
> >>> > > > separators for the example data. I should have just used blank
> >>> > > > spaces as follows:
> >>> > > >
> >>> > > > users   Group1   Group2   Group3
> >>> > > > u1        10           5            N/A
> >>> > > > u2         6          N/A          4
> >>> > > > u3         5           2            3
> >>> > > >
> >>> > > >
> >>> > > > Regards
> >>> > > > Gawesh
> >>> > >
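> >>> > > OK. You should transform your data to long format to use lm.
> >>> > >
> >>> > > library(reshape2)  # provides melt(); assumed here, reshape also has it
> >>> > > # read the copied table; "N/A" entries become NA
> >>> > > test <- read.table("clipboard", header = TRUE, na.strings = "N/A")
> >>> > > test.m <- melt(test)
> >>> > > # message printed by melt(): Using users as id variables
> >>> > > fit <- lm(value ~ variable, data = test.m)
> >>> > > summary(fit)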
> >>> > >
> >>> > > Call:
> >>> > > lm(formula = value ~ variable, data = test.m)
> >>> > >
> >>> > > Residuals:
> >>> > >   1    2    3    4    6    8    9
> >>> > >  3.0 -1.0 -2.0  1.5 -1.5  0.5 -0.5
> >>> > >
> >>> > > Coefficients:
> >>> > >               Estimate Std. Error t value Pr(>|t|)
> >>> > > (Intercept)       7.000      1.258   5.563 0.00511 **
> >>> > > variableGroup2   -3.500      1.990  -1.759 0.15336
> >>> > > variableGroup3   -3.500      1.990  -1.759 0.15336
> >>> > > ---
> >>> > > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>> > >
> >>> > > Residual standard error: 2.179 on 4 degrees of freedom
> >>> > >  (2 observations deleted due to missingness)
> >>> > > Multiple R-squared: 0.525,      Adjusted R-squared: 0.2875
> >>> > > F-statistic: 2.211 on 2 and 4 DF,  p-value: 0.2256
> >>> > >
> >>> > > No difference among groups, but I am not sure if this is the correct
> >>> > > way to evaluate.
> >>> > >
> >>> > > library(ggplot2)
> >>> > > p<-ggplot(test.m, aes(x=variable, y=value, colour=users))
> >>> > > p+geom_point()
> >>> > >
> >>> > > There is some sign that u3 has the lowest value in each group.
> >>> > > However, there is not enough data to include users in the fit.
> >>> > >
> >>> > > Regards
> >>> > > Petr
> >>> > >
> >>> > >
> >>> > > >
> >>> > > >
> >>> > > > On Mon, Oct 10, 2011 at 9:32 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> >>> > > >
> >>> > > > > Hi
> >>> > > > >
> >>> > > > > I do not understand much about your equations. I think you should
> >>> > > > > look at Practical Regression and Anova Using R by J. Faraway.
> >>> > > > >
> >>> > > > > With a data frame DF with columns users, groups, results you
> >>> > > > > could do
> >>> > > > >
> >>> > > > > fit <- lm(results~groups, data = DF)
> >>> > > > >
> >>> > > > > Regards
> >>> > > > > Petr
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > >
> >>> > > > > > Hi,
> >>> > > > > >
> >>> > > > > > I'm a newbie to R. My knowledge of statistics is mostly
> >>> > > > > > self-taught. My problem is how to measure the effect of users in
> >>> > > > > > groups. I can calculate a particular attribute for a user in a
> >>> > > > > > group. But my hypothesis is that the users' attributes are not
> >>> > > > > > independent of each other and that a user's attribute depends on
> >>> > > > > > the group, i.e. that a user's behaviour changes based on the
> >>> > > > > > group.
> >>> > > > > >
> >>> > > > > > Let me give an example:
> >>> > > > > >
> >>> > > > > > users*Group 1*Group 2*Group 3
> >>> > > > > > u1*10*5*n/a
> >>> > > > > > u2*6*n/a*4
> >>> > > > > > u3*5*2*3
> >>> > > > > >
> >>> > > > > > For example, I want to be able to prove that u1's behaviour is
> >>> > > > > > different in group 1 than in other groups, and that the particular
> >>> > > > > > thing about Group 1 is that users in Group 1 tend to have a higher
> >>> > > > > > value of the attribute under measurement.
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > Hence, can I use R to test my hypothesis? I'm willing to learn, so
> >>> > > > > > if this is very simple, just point me in the direction of any
> >>> > > > > > online resources about it. At the moment, I don't even know how to
> >>> > > > > > define this class of problems. That will be a start.
> >>> > > > > >
> >>> > > > > > Regards
> >>> > > > > > Gawesh
> >>> > > > > >
> >>> > > > > >    [[alternative HTML version deleted]]
> >>> > > > > >
> >>> > > > > > ______________________________________________
> >>> > > > > > R-help at r-project.org mailing list
> >>> > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> > > > > > PLEASE do read the posting guide
> >>> > > > > > http://www.R-project.org/posting-guide.html
> >>> > > > > > and provide commented, minimal, self-contained, reproducible code.
> >>> > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> > >
> >>> >
> >>> >
> >>> >
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> "Men by nature long to get on to the ultimate truths, and will often be
> >> impatient with elementary studies or fight shy of them. If it were possible
> >> to reach the ultimate truths without the elementary studies usually prefixed
> >> to them, these would not be preparatory studies but superfluous diversions."
> >>
> >> -- Maimonides (1135-1204)
> >>
> >> Bert Gunter
> >> Genentech Nonclinical Biostatistics
> >> 467-7374
> >>
> >> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
> >>
> >>
> >>
> >
> 


