[R] Problems with Excel (was SAS or R software)

Marc Schwartz MSchwartz at MedAnalytics.com
Wed Dec 22 00:22:11 CET 2004


On Tue, 2004-12-21 at 16:34 -0400, Kjetil Brinchmann Halvorsen wrote:
> Patrick Burns wrote:
> 
> > I'm in the process of researching problems with Excel.  The references
> > given by Tim and Marc seem to lead to discussions of most of the problems
> > with statistical procedures in Excel.  The executive summary is that if
> > it is in Excel and it looks like statistics, then avoid it.
> >
> > However, there are some other issues as well.  Here are three:
> >
> > 1)  For those of us used to S, Excel gives a non-intuitive result for:
> >
> >           -2^4
> 
> Yes. However, OO Calc gives the same result as Excel (Office 2003). Are 
> they trying to
> heavily to mimic Excel?

Yes. As I mentioned in the thread on Gnumeric over on r-devel, OO.org
seems to focus more on ensuring behavior consistent with Excel to a
fault rather than improving upon it. This includes the so-called "close
to zero adjustments" that I referred to previously in this thread.

This is my opinion alone, but presumably this is driven by the need to
be able to be a "drop in" replacement for Excel. So, as long as under-
informed Excel users are not aware of these issues, they could happily
use Calc without experiencing any differences in behavior and not be
confused (or burned) by unexpected results.

What is interesting, is that if you try to use that same formula in
Gnumeric, it will calculate 16, but if you go back and review the
formula, it will automatically have parens around the "-2" so that the
formula looks like "=(-2)^4" in the cell, instead of the R/S like
negation of the formula (ie. "-(2^4)"). You can explicitly put that
formula into a cell and get -16 in Gnumeric.

> > I wonder how many formulas are in existence that have unintended results
> > due to this.
> >
> > 2)  When numbers in scientific notation are written to ascii (csv or 
> > txt), Excel
> > decides that some of the digits are unnecessary and doesn't write them 
> > to the
> > file.  As far as I can tell the number of significant digits that you 
> > get is arbitrary
> > and capricious.  Apparently Microsoft thinks of this as a feature -- 
> > I'm not sure
> > what they think the up-side is.  If you want all of your digits, make 
> > sure that none
> > of the numbers are displayed in scientific notation.  (Yes, what is 
> > written depends
> > on what is displayed, but it is not WYSIWYG.)
> 
> Tried this to with OO Calc. All were written with 14 significant digits. 
> However, test was very
> fast and needs replcation investing some more time.
> 
> >
> > 3)  In the olden days if there were a blank cell in a range of cells 
> > on which a
> > function operated (a mean perhaps), then Excel decided that the blank 
> > meant
> > zero.  Microsoft changed this so that blank cells are ignored 
> > (apparently a good
> > thing).  However, if you reference a blank cell, it is then counted as 
> > zero.
> >
> > Exercise:  In the first column put numbers in the first few rows, but 
> > leave one
> > cell blank.  below that do the average of the cells (including the 
> > blank).  Now
> > in cell B1 put "=A1"  and copy this formula down column B.  You will 
> > get two
> > different means.
> >
> Yes. Done, but also on OP Calc with *exactly* the same result. 

Same with Gnumeric here unfortunately. It would appear that the behavior
is the result of pasting the blank cell as a "0" instead of as a blank.
Thus, the average() function sees the "0" as a legit value and of course
increases the sample size by 1.

I just filed a bug with the Gnumeric folks on this.

> Does one need to investigate OO Calc in the same way as has been done
> with Excel?

Yes, absolutely.


Marc




More information about the R-help mailing list