[R] Green and Byar (1980) Prostate Cancer Data set from Andrews and Herzberg - Data
David Winsemius
dwinsemius at comcast.net
Wed Mar 25 12:17:57 CET 2009
One further version: this one has a header, uses NA's in place of the
-9999's, and apparently has not deleted any cases with missing data:
http://www.stat.auckland.ac.nz/~wild/764/s764data/prostatic.tab
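
For the other copies, which still carry the -9999 code, the recoding can be sketched roughly as follows. This is only an illustration in Python using the standard library: the sample rows, the column names (patno, age, wt) and the helper names read_with_missing and complete_cases are all made up for the example and are not taken from any of the actual files.

```python
import csv
import io

# Hypothetical sample in the style of a tab-delimited copy that uses -9999
# as its missing-value code; column names and values are invented.
SAMPLE = "patno\tage\twt\n1\t75\t76\n2\t-9999\t116\n3\t69\t-9999\n"


def read_with_missing(text, sentinel="-9999"):
    """Parse tab-delimited text with a header row, mapping the sentinel to None."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {key: (None if value == sentinel else value) for key, value in row.items()}
        for row in reader
    ]


def complete_cases(rows):
    """Keep only the rows that have no missing value in any column."""
    return [row for row in rows if all(v is not None for v in row.values())]


rows = read_with_missing(SAMPLE)
print(len(rows), "rows,", len(complete_cases(rows)), "complete")
```

Comparing the two counts on a real copy would show whether that copy has already been reduced to complete cases.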
--
David Winsemius
On Mar 24, 2009, at 11:51 PM, Ravi Varadhan wrote:
> Fine detective work, David. Now, you can see the reasons for my
> frustration - multiplicity of data sets combined with non-existent
> documentation of the source of data in journal articles (e.g. Kay
> 1986; Lunn and McNeil 1995).
>
> Best,
> Ravi.
>
> ____________________________________________________________________
>
> Ravi Varadhan, Ph.D.
>
>
>> On Mar 24, 2009, at 8:57 PM, Rolf Turner wrote:
>>
>>>
>>> On 25/03/2009, at 12:09 PM, Frank E Harrell Jr wrote:
>>>
>>> <snip>
>>>
>>>>> (2) Scrolling down to ``Byar and Green prostate cancer data''
>>>>> appeared to get me to the right place. But I couldn't see any
>>>>> signs of any ``R binary files''.
>>>>
>>>> Please look again. It's under the heading "R". Unfortunately I used
>>>> .sav suffix for save() files in the old days.
>>>
>>> Ah-ha. Oh me of little faith. I have been hanging around (in
>>> my current work environment) with too many SPSS users, and the
>>> *.sav extension seems to be the standard for SPSS data files.
>>> Whence my corrupted thinking.
>>>
>>>> The .xls file opened with no problem in OpenOffice; has 506 rows.
>>>
>>> Hmmm. When I opened it with Excel on the Mac I got a spreadsheet
>>> with 503 rows --- the first row being the column names,
>>> so there were really 502 rows.
>>
>> The last "patnr" is "506" but there are only 502 lines of data. 471,
>> 473, 475 and 488 are missing.
>>
>> And the CMU Statlib version for 2002 looks the same.
>>
>>
>> The version at this site is missing more than 25 cases:
>>
>>
>> Here are two other copies of the dataset, the first of which appears
>> to have those missing cases:
>> This one has patient numbers:
>>
>>
>> This one has a description of the fields and cites the one above but
>> has not retained the patient numbers and has apparently only kept the
>> 475 cases with complete data.
>>
>>
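
The row counts quoted above (last "patnr" 506, but only 502 lines of data) can be checked by looking for gaps in the patient numbers. A minimal sketch in Python, assuming the numbers are meant to run consecutively from 1; the helper name missing_ids is made up, and the list of numbers is rebuilt here from the description rather than read from the spreadsheet:

```python
def missing_ids(ids, first=1, last=None):
    """Return the IDs in first..last (inclusive) that do not appear in ids."""
    present = set(ids)
    if last is None:
        last = max(present)
    return [i for i in range(first, last + 1) if i not in present]


# Rebuild the situation described in the thread: numbers 1..506 with the
# four reported gaps (assumption: numbering starts at 1 and is consecutive).
absent = {471, 473, 475, 488}
patnr = [i for i in range(1, 507) if i not in absent]

print(len(patnr))          # 502
print(missing_ids(patnr))  # [471, 473, 475, 488]
```

Run against the patient-number column of any of the copies, the same function would list which cases that copy has dropped.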
David Winsemius, MD
Heritage Laboratories
West Hartford, CT