[R] the joy of spreadsheets (off-topic)
Ross Boylan
ross at biostat.ucsf.edu
Thu Apr 25 03:41:03 CEST 2013
On 4/17/2013 5:18 AM, Kevin Wright wrote:
> On Tue, Apr 16, 2013 at 4:33 PM, Jim Lemon <jim at bitwrit.com.au> wrote:
>
>> On 04/17/2013 03:25 AM, Sarah Goslee wrote:
>> The final point does relate to Excel and any application that hides what is
>> going on to the casual observer. I will treasure this URL to give to anyone
>> who chastises my moaning when I have to perform some task in Excel. It is
>> not an error in the application (although these certainly exist) but a
>> salutary caution to those who think that if a reasonable looking number
>> appears in a cell, it must be the correct answer. I have found not one, but
>> two such errors in the simple calculation of a "birthday age" from the date
>> of birth and date of death.
>>
>> Jim
>>
> So there (maybe) was a bug in Excel. Maybe hidden from the "casual
> observer". And since Excel is not R, and we are R snobs, Excel is evil,
> right? But, wait. Is it easier for a "casual observer" to detect a flaw
> in the formula in Excel, or to find an incorrect array index in an R
> script?
If the person knows R, or can fake it, I think it is easier. You have
to hunt around an Excel spreadsheet to see what the formulae are,
and the cell references usually have no inherent meaning. Further, one
of the errors the authors made, not including all the data in a range, is very
easy to make in Excel but would be very hard to make in R.
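To make the range point concrete, here is a minimal sketch in Python. The figures are hypothetical, chosen only for illustration, and are not the data from the paper under discussion. A spreadsheet formula averages an explicit positional range that can silently stop short of the data. Taking the mean of a whole named object, as `mean(x)` does in R, leaves no range boundary to get wrong.

```python
from statistics import mean

# Hypothetical growth rates keyed by country (illustrative values only).
growth = {"A": 2.1, "B": -0.3, "C": 1.5, "D": 3.0, "E": 0.8}
values = list(growth.values())

# Spreadsheet style: an explicit positional range that stops two rows short,
# analogous to a formula whose cell range misses the last rows of the data.
truncated_mean = mean(values[0:3])

# Whole-object style: every element of the collection is included,
# so there is no range endpoint to mistype.
full_mean = mean(values)

print(f"truncated: {truncated_mean:.2f}, full: {full_mean:.2f}")
```

The two results differ (about 1.10 versus 1.42). Nothing in the truncated version looks wrong unless you inspect the range itself.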
As others have noted, the problem was not a bug in Excel itself
(unless you consider the design a bug) but a bug induced by the use of
Excel.
I doubt the exclusion of the range was deliberate, although the other
errors seem to have been. However, it is likely that if the result had
not been to their liking the original authors would have rechecked their
work and discovered the problem. One of the "errors", equal weighting
of countries regardless of how many years they spent in a given state,
is arguably a judgement call. Selective exclusion and inclusion of data
is also a judgement call, but that strikes me as less defensible.
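The weighting question can be illustrated with a small sketch. The numbers are hypothetical and are not from the paper. Suppose one country spent ten years in the high-debt state and another spent one. Weighting countries equally lets the single year count as much as the ten. Weighting by country-year gives each year equal influence.

```python
from statistics import mean

# Hypothetical annual growth rates while in a given debt state (illustrative only).
episodes = {
    "Country A": [2.5] * 10,  # ten years in the state
    "Country B": [-7.0],      # one year in the state
}

# Equal weighting of countries: average each country's years first,
# then average the per-country means, so each country counts once.
equal_weight = mean(mean(years) for years in episodes.values())

# Weighting by country-year: pool all years, so each year counts once.
year_weight = mean(g for years in episodes.values() for g in years)

print(f"country-weighted: {equal_weight:.2f}, year-weighted: {year_weight:.2f}")
```

Here the country-weighted average is -2.25, but the year-weighted average is about 1.64. The choice of weighting alone flips the sign, which is why it is a judgement call rather than a mechanical error.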
Someone wrote that the overall finding of a negative relation between
debt and growth is intact. First of all, the headline summary was that
if debt/GDP > 90% you fall off a cliff. That is not intact; it is
false. The remaining relation is quite weak. And the substantive
conclusion that high debt *causes* weaker growth is read entirely
into a correlational finding. It is pretty hard to sort out causal
ordering, but some evidence suggests it is more the reverse:
http://krugman.blogs.nytimes.com/2013/04/18/correlation-causality-and-casuistry/.
See Krugman's and DeLong's blogs generally for gleeful commentary, or the
original critique in
http://www.peri.umass.edu/236/hash/31e2ff374b6377b2ddec04deaa6388b1/publication/566/.
At any rate, a policy-relevant conclusion would need to be based on a
much more careful analysis than was done, careful not only in the
mechanics but in using methods that at least attempted to sort out the
causal relations.
The irony is that the substantively most trivial mistake is also the
most clearly an error, while the more important issues are at least a
little less clear-cut.
Ross