[R] RE: Giving a first good impression of R to Social Scientists

Sat Aug 14 06:49:11 CEST 2004

Dear Roland,

Have you looked at Zelig (http://gking.harvard.edu/zelig)?
Several professors in my department are going to use it to teach
R to political science undergraduates and graduate students this
fall.  We just presented it at the Political Methodology
meeting, and will present it again at the American Political
Science Association meeting, so we hope that other departments
will start to use Zelig as a teaching tool (for applied social
science in general, as an alternative to Stata, etc.).

Although a GUI would be good, students won't learn to use R that
way.  I think that the key to getting them to use the command
line interface is to draw an analogy between R and English (or
another language):  There are rules of syntax, and here they are;
if you get a "syntax error", you should look for the following
common errors; here are some simple examples and demos that you
can follow (and will want to, because you're interested in the
problem); and here are the models in a logical format.
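
As a small illustration of the "common errors" idea, here is a rough
sketch that uses base R's parse() to show what a syntax error looks
like without evaluating anything.  The helper name check_syntax() is
made up for this example; it is not part of R or of Zelig.

```r
# check_syntax() is a hypothetical helper, not part of R.
# It parses a line of code without evaluating it and returns "ok"
# or the text of the parser's error message.
check_syntax <- function(code) {
  tryCatch({
    parse(text = code)
    "ok"
  }, error = function(e) conditionMessage(e))
}

check_syntax("mean(c(1, 2, 3))")   # balanced parentheses
check_syntax("mean(c(1, 2, 3)")    # missing closing parenthesis
check_syntax("x <- 3 4")           # missing operator between 3 and 4
```

Students can then match the message they see against a short list of
usual suspects (unbalanced parentheses, missing commas or operators).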

Social scientists aren't statisticians, but they're pretty
clever.  They probably had to learn at least one foreign
language in university, and they're probably pretty careful
writers in any language, so making R seem like just another
language will make R seem *easy* to use.

Yours,

Olivia Lau

> On Thu, 12 Aug 2004, Rau, Roland wrote:
> >
> > That is why I would like to ask the experts on this list if anyone
> > of you has encountered a similar experience and what you could
> > advise to persuade people quickly that it is worth learning new
> > software?
>
> One problem is that it may not be true.  Unless these people are
> going to be doing their own statistics in the future (which is
> probably true only for a minority) they might actually be better off
> with a point and click interface.  I'm (obviously) not arguing that
> SPSS is a better statistical environment than R, but it is easier to
> learn, and in 10 or 15 weeks they may not get to see the benefits of
> R.
>
>
> -thomas
>
>
>
> ------------------------------
>
> Message: 12
> Date: Thu, 12 Aug 2004 16:24:28 +0100
> From: Barry Rowlingson <B.Rowlingson at lancaster.ac.uk>
> Subject: Re: [R] Giving a first good impression of R to Social
> Scientists
> To: "'r-help at stat.math.ethz.ch'" <r-help at stat.math.ethz.ch>
> Message-ID: <411B8BAC.2050906 at lancaster.ac.uk>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Thomas Lumley wrote:
> > On Thu, 12 Aug 2004, Rau, Roland wrote:
> >
> >>That is why I would like to ask the experts on this list if anyone
> >>of you has encountered a similar experience and what you could
> >>advise to persuade people quickly that it is worth learning new
> >>software?
> >
>
>   The usual way of teaching R seems to be bottom-up. Here's the
> command prompt, type some arithmetic, make some assignments, learn
> about function calls and arguments, write your own functions, write
> your own packages.
>
>   Perhaps a top-down approach might help certain cases. People using
> point-n-click packages tend to use a limited range of analyses. Write
> some functions that do these analyses, or give them wrappers so that
> they get something like:
>
>   > myData = readDataFile("foo.dat")
>     Read 4 variables: Z, Age, Sex, Disease
>
>   > analyseThis(myData, response="Z", covariate="Age")
>
>    Z = 0.36 * Age, Significance level = 0.932
>
>   or whatever. Really spoon feed the things they need to do. Make it
> really easy, foolproof.
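
A rough sketch of what such a wrapper could look like, assuming the
"analysis" is a simple linear regression on a numeric response and a
numeric covariate.  analyseThis() is the hypothetical name from the
message above, and the data are made up; nothing here is part of the
R distribution.

```r
# analyseThis() is a hypothetical wrapper, not part of the R distribution.
# It fits a simple linear regression and prints a one-line summary.
analyseThis <- function(data, response, covariate) {
  stopifnot(is.data.frame(data),
            is.numeric(data[[response]]),
            is.numeric(data[[covariate]]))
  fit   <- lm(reformulate(covariate, response = response), data = data)
  coefs <- summary(fit)$coefficients
  cat(sprintf("%s = %.2f + %.2f * %s, p-value = %.3g\n",
              response, coefs["(Intercept)", "Estimate"],
              coefs[covariate, "Estimate"], covariate,
              coefs[covariate, "Pr(>|t|)"]))
  invisible(fit)   # return the full lm object for anyone who wants more
}

# Made-up data, only to show the call.
set.seed(42)
myData   <- data.frame(Age = 20:59)
myData$Z <- 0.36 * myData$Age + rnorm(40)
fit <- analyseThis(myData, response = "Z", covariate = "Age")
```

Returning the fitted object invisibly keeps the output to one line for
beginners while leaving summary(fit) and plot(fit) available later.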
>
>   Then show them what's behind the analyseThis() function. How it's
> not even part of the R distribution. How easy you made it for a
> beginner to do a complex and novel analysis. Then maybe it'll "click"
> for them, and they'll see how having a programming language behind
> their statistics functions lets them explore in ways not thought
> possible with the point-n-click paradigm. Perhaps they'll start
> editing analyseThis() and write analyseThat(), start thinking for
> themselves.
>
>   Or maybe they'll just stare at you blankly...
>
> Baz
>
>
>
> ------------------------------
>
> Message: 13
> Date: Thu, 12 Aug 2004 08:28:18 -0700 (PDT)
> From: Jason Liao <jg_liao at yahoo.com>
> Subject: [R] truly object oriented programming in R
> To: r-help at stat.math.ethz.ch
> Message-ID: <20040812152818.69617.qmail at web53706.mail.yahoo.com>
> Content-Type: text/plain; charset=us-ascii
>
> Good morning! I recently implemented a KD tree in Java for faster
> kernel density estimation (part of the code follows). It went well.
> Hooking it up to R, however, has proved more difficult. My question
> is: is it possible to implement the algorithm in R? My impression is
> that it is not, as the code requires a complete class-object
> framework that R does not support. But is there an R package or
> something that may make it possible? Thanks in advance for your help.
>
> Java implementation of KD tree:
>
> public class Kdnode {
>
>     // center of the bounding box
>     private double[] center;
>     // maximum distance from center to anywhere within the bounding box
>     private double diameter;
>     // number of source data points in the bounding box
>     private int numOfPoints;
>
>     private Kdnode left, right;
>
>     public Kdnode(double[][] points, int split_dim,
>                   int[][] sortedIndices, double[][] bBox) {
>         // bBox: the bounding box, 1st row the lower bound,
>         // 2nd row the upper bound
>         numOfPoints = points.length;
>         int d = points[0].length;
>
>         center = new double[d];
>         for (int j = 0; j < d; j++)
>             center[j] = (bBox[0][j] + bBox[1][j]) / 2.;
>         diameter = get_diameter(bBox);
>
>         if (numOfPoints == 1) {
>             diameter = 0.;
>             for (int j = 0; j < d; j++) center[j] = points[0][j];
>             left = null;
>             right = null;
>         }
>         else {
>             int middlePoint = sortedIndices[split_dim][numOfPoints/2];
>             double splitValue = points[middlePoint][split_dim];
>
>             middlePoint = sortedIndices[split_dim][numOfPoints/2 - 1];
>             double splitValue_small = points[middlePoint][split_dim];
>
>             int left_size = numOfPoints/2;
>             int right_size = numOfPoints - left_size;
>
>             double[][] leftPoints = new double[left_size][d];
>             double[][] rightPoints = new double[right_size][d];
>
>             int[][] leftSortedIndices = new int[d][left_size];
>             int[][] rightSortedIndices = new int[d][right_size];
>
>             int left_counter = 0, right_counter = 0;
>             int[] splitInfo = new int[numOfPoints];
>
>             for (int i = 0; i < numOfPoints; i++) {
>                 if (points[i][split_dim] < splitValue) {
>                     for (int j = 0; j < d; j++)
>                         leftPoints[left_counter][j] = points[i][j];
>                     splitInfo[i] = right_counter;
>                     left_counter++;
>                 }
>                 else {
>                     for (int j = 0; j < d; j++)
>                         rightPoints[right_counter][j] = points[i][j];
>                     splitInfo[i] = left_counter;
>                     right_counter++;
>                 }
>             }
>
>             // modify appropriately the indices to correspond to the
>             // new lists
>             for (int i = 0; i < d; i++) {
>                 int left_index = 0, right_index = 0;
>                 for (int j = 0; j < numOfPoints; j++) {
>                     if (points[sortedIndices[i][j]][split_dim] < splitValue)
>                         leftSortedIndices[i][left_index++] =
>                             sortedIndices[i][j] - splitInfo[sortedIndices[i][j]];
>                     else
>                         rightSortedIndices[i][right_index++] =
>                             sortedIndices[i][j] - splitInfo[sortedIndices[i][j]];
>                 }
>             }
>
>             // Recursively compute the kdnodes for the points in the
>             // two split spaces
>             double[][] leftBBox = new double[2][];
>             double[][] rightBBox = new double[2][];
>
>             for (int i = 0; i < 2; i++) {
>                 leftBBox[i] = (double[])bBox[i].clone();
>                 rightBBox[i] = (double[])bBox[i].clone();
>             }
>
>             leftBBox[1][split_dim] = splitValue_small;
>             rightBBox[0][split_dim] = splitValue;
>
>             int next_dim = (split_dim + 1) % (d);
>             left = new Kdnode(leftPoints, next_dim, leftSortedIndices,
>                               leftBBox);
>             right = new Kdnode(rightPoints, next_dim, rightSortedIndices,
>                                rightBBox);
>         }
>     }
>
>     public double evaluate(double[] target, double delta,
>                            double bandwidth) throws Exception
>     {
>         double dis_2_center = Common.distance(target, center)/bandwidth;
>         double dm = diameter/bandwidth;
>
>         if (dis_2_center >= 1+dm) return 0.;
>         if (numOfPoints == 1) return Common.K(dis_2_center);
>
>         /*if(dis_2_center<1)
>         {
>             double temp2 = dm*Common.KDeriv(dis_2_center);
>             if(temp2<delta) return Common.K(dis_2_center)*numOfPoints;
>         } */
>
>         return left.evaluate(target, delta, bandwidth)
>              + right.evaluate(target, delta, bandwidth);
>     }
>
>     public double get_diameter(double[][] bBox)
>     {
>         double value = 0., diff;
>         for (int i = 0; i < bBox[0].length; i++)
>         {
>             diff = (bBox[1][i] - bBox[0][i])/2.;
>             value += diff*diff;
>         }
>         return Math.sqrt(value);
>     }
> }
>
> =====
> Jason Liao, http://www.geocities.com/jg_liao
> Dept. of Biostatistics, http://www2.umdnj.edu/bmtrxweb
> University of Medicine and Dentistry of New Jersey
> phone 732-235-5429, School of Public Health office
> phone 732-235-8611, Cancer Institute of New Jersey office
> mobile phone 908-720-4205
>
>
>
> ------------------------------
>
> Message: 14
> Date: Thu, 12 Aug 2004 15:40:52 +0000 (UTC)
> From: Gabor Grothendieck <ggrothendieck at myway.com>
> Subject: Re: [R] truly object oriented programming in R
> To: r-help at stat.math.ethz.ch
> Message-ID: <loom.20040812T173739-400 at post.gmane.org>
> Content-Type: text/plain; charset=us-ascii
>
> Jason Liao <jg_liao <at> yahoo.com> writes:
>
> :
> : Good morning! I recently implemented a KD tree in Java for faster
> : kernel density estimation (part of the code follows). It went well.
> : Hooking it up to R, however, has proved more difficult. My question
> : is: is it possible to implement the algorithm in R? My impression
> : is that it is not, as the code requires a complete class-object
> : framework that R does not support. But is there an R package or
> : something that may make it possible? Thanks in advance for your
> : help.
>
> R comes with the S3 and S4 object systems out-of-the-box and there
> is an addon package R.oo available at:
>
>    http://www.maths.lth.se/help/R/R.classes/
>
> that provides a more conventional OO system.   It's likely that one
> or more of these would satisfy your requirements.
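
For what it's worth, a recursive structure like the Kdnode class can
be expressed with plain lists and S3 dispatch.  The following is only
a sketch under my own assumptions, not a port of the Java code above:
the names kdnode() and count_within() are made up, the split is a
simple median split on sorted coordinates, and instead of evaluating a
kernel it just counts the points within a radius h of a target.

```r
# Build a kd-tree node as a plain list carrying an S3 class.
# `points` is a numeric matrix with one row per data point.
kdnode <- function(points, split_dim = 1L) {
  n <- nrow(points)
  d <- ncol(points)
  node <- list(n = n,
               lower = apply(points, 2, min),   # bounding box, lower corner
               upper = apply(points, 2, max),   # bounding box, upper corner
               left = NULL, right = NULL)
  if (n > 1L) {
    ord      <- order(points[, split_dim])
    half     <- n %/% 2L
    next_dim <- split_dim %% d + 1L             # cycle through dimensions
    node$left  <- kdnode(points[ord[seq_len(half)], , drop = FALSE], next_dim)
    node$right <- kdnode(points[ord[(half + 1L):n], , drop = FALSE], next_dim)
  }
  class(node) <- "kdnode"
  node
}

# S3 generic and method: count the points within distance h of target,
# skipping any subtree whose bounding box is farther away than h.
count_within <- function(node, target, h) UseMethod("count_within")

count_within.kdnode <- function(node, target, h) {
  nearest <- pmin(pmax(target, node$lower), node$upper)
  if (sqrt(sum((target - nearest)^2)) > h) return(0L)
  if (node$n == 1L) return(1L)
  count_within(node$left, target, h) + count_within(node$right, target, h)
}

set.seed(1)
pts  <- matrix(runif(200), ncol = 2)
tree <- kdnode(pts)
count_within(tree, target = c(0.5, 0.5), h = 0.25)
```

The same pruning idea carries over to kernel evaluation: replace the
leaf count with the kernel value at the scaled distance.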
>
>
>
> ------------------------------
>
> Message: 15
> Date: Thu, 12 Aug 2004 17:56:05 +0200
> From: "Kahra Hannu" <kahra at mpsgr.it>
> Subject: RE: [R] linear constraint optim with
> bounds/reparametrization
> To: "Spencer Graves" <spencer.graves at pdf.com>, "Ingmar Visser"
> <i.visser at uva.nl>
> Cc: Thomas Lumley <tlumley at u.washington.edu>,
> R-help at stat.math.ethz.ch
> Message-ID:
> <C9FC71F7E9356F40AFE2ACC2099DE14714963D at MAILSERVER-B.mpsgr.it>
> Content-Type: text/plain; charset="iso-8859-1"
>
> From Spencer Graves:
>
> > However, for an equality constraint, I've had good luck with an
> > objective function that adds something like the following to my
> > objective function: constraintViolationPenalty*(A%*%theta-c)^2,
> > where "constraintViolationPenalty" is passed via "..." in a call
> > to optim.
>
> I applied Spencer's suggestion to a set of eight different
> constrained portfolio optimization problems. It seems to be a usable
> way to solve the portfolio problem when the QP optimizer is not
> applicable. After all, practical portfolio management is more an art
> than a science.
>
> > I may first run optim with a modest value for
> > constraintViolationPenalty then restart it with the output of the
> > initial run as starting values and with a larger value for
> > constraintViolationPenalty.
>
> I wrote a loop that starts with a small value for the penalty and
> stops when the change in the function value caused by increasing the
> penalty is less than epsilon. I found that epsilon = 1e-06 gives a
> reasonable trade-off between accuracy and computational time.
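
The loop described above might look roughly like the sketch below.
Everything specific in it is an assumption made for illustration: the
toy objective (a diagonal covariance matrix S with weights that must
sum to 1), the factor of 10 by which the penalty grows, the cap of 20
restarts, and the analytic gradient.  Only the penalty term, the
restart from the previous solution, and epsilon = 1e-06 come from the
messages quoted here.

```r
# Toy problem (an assumption for illustration): minimise theta' S theta
# subject to the equality constraint A %*% theta = c0 (weights sum to 1).
S  <- diag(c(1, 2, 4))
A  <- matrix(1, nrow = 1, ncol = 3)
c0 <- 1

# Objective plus quadratic penalty on the constraint violation.
objective <- function(theta, penalty) {
  drop(t(theta) %*% S %*% theta) + penalty * sum((A %*% theta - c0)^2)
}
# Analytic gradient of the penalised objective.
gradient <- function(theta, penalty) {
  drop(2 * S %*% theta + 2 * penalty * t(A) %*% (A %*% theta - c0))
}

theta     <- rep(1 / 3, 3)   # starting values
penalty   <- 1               # small initial penalty
epsilon   <- 1e-6
old_value <- Inf

for (step in 1:20) {         # cap on restarts, in case the change never settles
  # `penalty` is passed on to objective() and gradient() via "..."
  fit <- optim(theta, objective, gradient, method = "BFGS",
               penalty = penalty)
  theta <- fit$par           # restart from the previous solution
  if (abs(old_value - fit$value) < epsilon) break
  old_value <- fit$value
  penalty   <- penalty * 10  # tighten the constraint
}

theta
sum(theta)
```

Restarting from the previous solution matters here: each larger
penalty makes the problem more ill-conditioned, and a warm start keeps
the optimizer close to the constraint from the first iteration.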
>
> Spencer, many thanks for your suggestion.
>
> Hannu Kahra
>
>
>
> ------------------------------
>
> Message: 16
> Date: Thu, 12 Aug 2004 17:59:21 +0200
> From: Martin Maechler <maechler at stat.math.ethz.ch>
> Subject: Re: [R] error using daisy() in library(cluster). Bug?
> To: Javier Garcia - CEBAS <rn001 at cebas.csic.es>
> Cc: R-help at stat.math.ethz.ch
> Message-ID: <16667.37849.634789.455341 at gargle.gargle.HOWL>
> Content-Type: text/plain; charset=iso-8859-1
>
> [Reverted back to R-help, after private exchange]
>
> >>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
> >>>>>     on Thu, 12 Aug 2004 17:12:01 +0200 writes:
>
> >>>>> "javier" == javier garcia <- CEBAS <rn001 at cebas.csic.es>>
> >>>>>     on Thu, 12 Aug 2004 16:28:27 +0200 writes:
>
>     javier> Martin; Yes I know that there are variables with
all
>     javier> five values 'NA'. I've left them as they are just
>     javier> because of saving a couple of lines in the script,
>     javier> and because I like to see that they are there,
>     javier> although all values are 'NA'.  I don't expect they
>     javier> are used in the analysis, but are they the source
>     javier> of the problem?
>
>     MM> yes, but only because of "stand = TRUE".
>
>     MM> Yes, one could imagine that it would be good if
>     MM> standardizing these "all NA variables" worked.
>
>     MM> I'll think a bit more about it.  Thank you for the
>     MM> example.
>
> Ok. I've thought (and looked at the R code) a bit longer.
> Also considered the fact (you mentioned) that this worked in R 1.8.0.
> Hence, I'm considering the current behavior a bug.
>
> Here is the patch (apply to cluster/R/daisy.q in the *source*
>  or at the appropriate place in <cluster_installed>/R/cluster ) :
>
> --- daisy.q 2004/06/25 16:17:47 1.17
> +++ daisy.q 2004/08/12 15:23:26
> @@ -78,8 +78,8 @@
>      if(all(type2 == "I")) {
>   if(stand) {
>              x <- scale(x, center = TRUE, scale = FALSE) #-> 0-means
> -            sx <- colMeans(abs(x))
> -            if(any(sx == 0)) {
> +     sx <- colMeans(abs(x), na.rm = TRUE)# can still have NA's
> +     if(0 %in% sx) {
>                  warning(sQuote("x"), " has constant columns ",
>                          pColl(which(sx == 0)), "; these are standardized to 0")
>                  sx[sx == 0] <- 1
>                  sx[sx == 0] <- 1
>
>
> Thank you for helping to find and fix this bug.
> Martin Maechler, ETH Zurich, Switzerland
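
To see why the two changed lines matter, here is a small sketch on a
made-up matrix with one all-NA column.  It only mimics the two lines
touched by the patch and is not the daisy() code itself.

```r
# Made-up data: two ordinary columns and one column that is entirely NA.
x <- cbind(a = 1:5, b = c(2, 4, 4, 6, 9), c = NA_real_)
x <- scale(x, center = TRUE, scale = FALSE)

sx_old <- colMeans(abs(x))                 # NA for the all-NA column
sx_new <- colMeans(abs(x), na.rm = TRUE)   # NaN for the all-NA column

# Old test: comparing NA with 0 gives NA, and any() of FALSE, FALSE, NA
# is NA, so if (old_test) would stop with an error.
old_test <- any(sx_old == 0)

# New test: %in% never returns NA, so it is safe inside if ().
new_test <- 0 %in% sx_new

old_test
new_test
```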
>
>     javier> On Thu 12 Aug 2004 15:11, MM wrote:
>
>     >>> Javier, I could well read your .RData and try your
>     >>> script to produce the same error from daisy().
>     >>>
>     >>> Your dataframe is of dimension 5 x 180 and has many
>     >>> variables that have all five values 'NA' (see below).
>     >>>
>     >>> You can't expect to use these, can you?  Martin
>
>
>
> ------------------------------
>
> Message: 17
> Date: Thu, 12 Aug 2004 16:14:07 +0000 (UTC)
> From: Gabor Grothendieck <ggrothendieck at myway.com>
> Subject: Re: [R] RE: Giving a first good impression of R to Social
> Scientists
> To: r-help at stat.math.ethz.ch
> Message-ID: <loom.20040812T175128-1 at post.gmane.org>
> Content-Type: text/plain; charset=us-ascii
>
> Rau, Roland <Rau <at> demogr.mpg.de> writes:
>
> > Yes, I do know the R-Commander. But I did not want to give them a
> > GUI but rather expose them to the command line after I
> > demonstrated that the steep learning curve in the beginning is
> > worth the effort for the final results.
>
> Note that Rcmdr displays all the underlying generated R code that
> does the analysis as it runs so you are exposed to the command line.
> This might pique the interest of students wishing to learn more while
> giving an easy-to-use and immediately useful environment for those
> who just want to get results in the shortest, most direct fashion.
>
>
>
> ------------------------------
>
> Message: 18
> Date: Thu, 12 Aug 2004 09:25:07 -0700
> From: Seth Falcon <sfalcon at fhcrc.org>
> Subject: Re: [R] Approaches to using RUnit
> To: r-help at stat.math.ethz.ch
> Message-ID: <20040812162505.GA23691 at queenbee.fhcrc.org>
> Content-Type: text/plain; charset=us-ascii
>
> On Tue, Aug 10, 2004 at 04:53:49PM +0200, Klaus Juenemann wrote:
> > If you don't organize your code into packages but source individual
> > R files your approach to source the code at the beginning of a test
> > file looks the right thing to do.
>
> Appears to be working pretty well for me too ;-)
>
> > We mainly use packages and the code we use to test packages A and
> > B, say, looks like
>
> SNIP
>
> > We use the tests subdirectory of a package to store our RUnit tests
> > even though this is not really according to R conventions.
>
> In an off list exchange with A.J. Rossini, we discussed an
> alternative for using RUnit in a package.  The idea was to put the
> runit_*.R files (containing test code) into somePackage/inst/runit/
> and then put a script, say dorunit.R, inside somePackage/tests/ that
> would create the test suites similar to the code you included in
> your mail.  The advantage of this would be that the unit tests would
> run using R CMD check.
>
> In the next week or so I hope to package-ify some code and try this
> out.
>
>
> + seth
>
>
>
> ------------------------------
>
> Message: 19
> Date: Thu, 12 Aug 2004 12:25:03 -0400
> From: "Liaw, Andy" <andy_liaw at merck.com>
> Subject: RE: [R] Giving a first good impression of R to Social
> Scientists
> To: "'Barry Rowlingson'" <B.Rowlingson at lancaster.ac.uk>,
> "'r-help at stat.math.ethz.ch'" <r-help at stat.math.ethz.ch>
> Message-ID:
> <3A822319EB35174CA3714066D590DCD504AF8214 at usrymx25.merck.com>
> Content-Type: text/plain
>
> > From: Barry Rowlingson
> >
> > Thomas Lumley wrote:
> > > On Thu, 12 Aug 2004, Rau, Roland wrote:
> > >
> > >>That is why I would like to ask the experts on this list if
> > >>anyone of you has encountered a similar experience and what you
> > >>could advise to persuade people quickly that it is worth
> > >>learning new software?
> > >
> >
> >   The usual way of teaching R seems to be bottom-up. Here's the
> > command prompt, type some arithmetic, make some assignments, learn
> > about function calls and arguments, write your own functions,
> > write your own packages.
> >
> >   Perhaps a top-down approach might help certain cases. People
> > using point-n-click packages tend to use a limited range of
> > analyses. Write some functions that do these analyses, or give
> > them wrappers so that they get something like:
> >
> >   > myData = readDataFile("foo.dat")
> >     Read 4 variables: Z, Age, Sex, Disease
> >
> >   > analyseThis(myData, response="Z", covariate="Age")
> >
> >    Z = 0.36 * Age, Significance level = 0.932
> >
> >   or whatever. Really spoon feed the things they need to do. Make
> > it really easy, foolproof.
>
> The problem is that the only `fool' it has been `proofed' against is
> the one that the developer(s) had imagined.  One should never
> under-estimate users' ability to out-fool the developers'
> imagination...
>
> Cheers,
> Andy
>
>
> >   Then show them what's behind the analyseThis() function. How
> > it's not even part of the R distribution. How easy you made it for
> > a beginner to do a complex and novel analysis. Then maybe it'll
> > "click" for them, and they'll see how having a programming language
> > behind their statistics functions lets them explore in ways not
> > thought possible with the point-n-click paradigm. Perhaps they'll
> > start editing analyseThis() and write analyseThat(), start thinking
> > for themselves.
> >
> >   Or maybe they'll just stare at you blankly...
> >
> > Baz
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html