[R] the joy of spreadsheets (off-topic)

Ross Boylan ross at biostat.ucsf.edu
Thu Apr 25 03:41:03 CEST 2013


On 4/17/2013 5:18 AM, Kevin Wright wrote:
> On Tue, Apr 16, 2013 at 4:33 PM, Jim Lemon <jim at bitwrit.com.au> wrote:
>
>> On 04/17/2013 03:25 AM, Sarah Goslee wrote:
>> The final point does relate to Excel and any application that hides what is
>> going on to the casual observer. I will treasure this URL to give to anyone
>> who chastises my moaning when I have to perform some task in Excel. It is
>> not an error in the application (although these certainly exist) but a
>> salutary caution to those who think that if a reasonable looking number
>> appears in a cell, it must be the correct answer. I have found not one, but
>> two such errors in the simple calculation of a "birthday age" from the date
>> of birth and date of death.
>>
>> Jim
>>
> So there (maybe) was a bug in Excel.  Maybe hidden from the "casual
> observer".  And since Excel is not R, and we are R snobs, Excel is evil,
> right?  But, wait.  Is it easier for a "casual observer" to detect a flaw
> in the formula in Excel, or to find an incorrect array index in an R
> script?
If the person knows R, or can fake it, I think the error in the R script 
is easier to find.  You have to hunt around an Excel spreadsheet to see 
what the formulae are, and the cell references usually have no inherent 
meaning.  Further, one of the errors they made, not including all the 
data in a range, is very easy to make in Excel but would be very hard to 
make in R.
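
To make the range point concrete, here is a minimal sketch with invented 
numbers (not the data from the paper), written in Python only for 
illustration.  The first average mimics a hand-typed cell range that 
stops short of the last rows; the second averages the whole column by 
name, which is what mean(x) on a full vector does in R.

```python
from statistics import mean

# Invented growth figures (%), for illustration only; not the paper's data.
growth = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": 4.0}
values = list(growth.values())

# Spreadsheet style: a hand-typed range that silently omits the last two rows.
partial = mean(values[0:3])

# Whole-column style: no range to mistype, every row is included.
full = mean(values)

print(partial, full)
```

Both results look like plausible growth rates, so nothing on the screen 
flags the truncated one as wrong.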

As others have noted, the problem was not a bug in Excel the program 
(unless you consider the design a bug) but a bug induced by the use of 
Excel.

I doubt the exclusion of the range was deliberate, although the other 
errors seem to have been.  However, it is likely that if the result had 
not been to their liking the original authors would have rechecked their 
work and discovered the problem.  One of the "errors", equal weighting 
of countries regardless of how many years they spent in a given state, 
is arguably a judgement call. Selective exclusion and inclusion of data 
is also a judgement call, but that strikes me as less defensible.
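
The weighting question can be sketched the same way, again with invented 
numbers and hypothetical country labels.  Under equal country weighting, 
each country contributes one average no matter how many years it spent in 
the category; under country-year weighting, every year counts once.

```python
from statistics import mean

# Invented data: annual growth (%) for the years each hypothetical country
# spent in a single debt category.  Not the paper's data.
years_in_category = {
    "X": [2.0, 2.0, 2.0, 2.0],  # four years in the category
    "Y": [-8.0],                # a single year in the category
}

# Equal weighting of countries: average the per-country averages.
country_weighted = mean(mean(v) for v in years_in_category.values())

# Weighting by country-year: pool every observation, then average.
year_weighted = mean(g for v in years_in_category.values() for g in v)

print(country_weighted, year_weighted)
```

With these toy figures the single bad year for Y drags the 
country-weighted mean to -3.0, while the pooled mean is 0.0.  Neither is 
a mechanical error; they answer different questions, which is why I call 
it a judgement call.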

Someone wrote that the overall finding of a negative relation between 
debt and growth is intact.  First of all, the headline summary was that 
if debt/GDP > 90% you fall off a cliff.  That is not intact; it is 
false.  The remaining relation is quite weak.  And the substantive 
conclusion that high debt *causes* weaker growth reads far more into a 
correlational finding than it can support.  It is pretty hard to sort 
out causal ordering, but some evidence suggests it runs more in the 
reverse direction, from weak growth to high debt: 
http://krugman.blogs.nytimes.com/2013/04/18/correlation-causality-and-casuistry/. 
See Krugman's and DeLong's blogs generally for gleeful commentary, or the 
original critique at 
http://www.peri.umass.edu/236/hash/31e2ff374b6377b2ddec04deaa6388b1/publication/566/.

At any rate, a policy-relevant conclusion would need to be based on a 
much more careful analysis than was done, careful not only in the 
mechanics but in using methods that at least attempted to sort out the 
causal relations.

The irony is that the substantively most trivial mistake is also the 
one that is most clearly an error, while the more important issues are 
at least a little less clear-cut.

Ross


