[R] Green and Byar (1980) Prostate Cancer Data set from Andrews and Herzberg - Data

David Winsemius dwinsemius at comcast.net
Wed Mar 25 12:17:57 CET 2009


One further version:  this one has a header row and NA's in place of
the -9999's, and apparently no cases with missing data have been deleted:
http://www.stat.auckland.ac.nz/~wild/764/s764data/prostatic.tab
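
For anyone reading one of the -9999-coded copies into R, a minimal sketch
of the conversion to NA might look like the following. The column names
and values here are made up purely for illustration (they are not taken
from any of the files discussed), and the tab separator is an assumption
based on the .tab extension.

```r
# Toy tab-separated sample; names and values are hypothetical,
# with -9999 standing in for missing values as in the older copies.
raw <- "patno\tage\twt\tsbp\n1\t75\t76\t15\n2\t54\t-9999\t14\n3\t-9999\t99\t13\n"

# na.strings tells read.table() to treat the field "-9999" as NA.
dat <- read.table(text = raw, header = TRUE, sep = "\t",
                  na.strings = "-9999")

nrow(dat)                  # total number of cases read
sum(complete.cases(dat))   # cases with no missing values
colSums(is.na(dat))        # number of NA's per column
```

Comparing nrow() with sum(complete.cases()) is one way to tell whether a
given copy has already had its incomplete cases dropped.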

-- 
David Winsemius
On Mar 24, 2009, at 11:51 PM, Ravi Varadhan wrote:

> Fine detective work, David.  Now, you can see the reasons for my  
> frustration - multiplicity of data sets combined with non-existent  
> documentation of the source of data in journal articles (e.g. Kay  
> 1986; Lunn and McNeil 1995).
>
> Best,
> Ravi.
>
> ____________________________________________________________________
>
> Ravi Varadhan, Ph.D.
>
>
>> On Mar 24, 2009, at 8:57 PM, Rolf Turner wrote:
>>
>>>
>>> On 25/03/2009, at 12:09 PM, Frank E Harrell Jr wrote:
>>>
>>> 	<snip>
>>>
>>>>> (2) Scrolling down to ``Byar and Green prostate cancer data''
>>>>> appeared to get me to the right place.  But I couldn't see any
>>>>> signs of any ``R binary files''.
>>>>
>>>> Please look again.  It's under the heading "R".  Unfortunately I
>>>> used .sav suffix for save() files in the old days.
>>>
>>> 	Ah-ha.  Oh me of little faith.  I have been hanging around (in
>>> 	my current work environment) with too many SPSS users, and the
>>> 	*.sav extension seems to be the standard for SPSS data files.
>>> 	Whence my corrupted thinking.
>>>
>>>> The .xls file opened with no problem in OpenOffice; has 506 rows.
>>>
>>> 	Hmmm.  When I opened it with Excel on the Mac I got a spread
>>> 	sheet with 503 rows --- the first row being the column names,
>>> 	so there were really 502 rows.
>>
>> The last "patnr" is "506" but there are only 502 lines of data. 471,
>> 473, 475 and 488 are missing.
>>
>> And the CMU Statlib version for 2002 looks the same.
>>
>>
>> The version at this site is missing more than 25 cases:
>>
>>
>> Here are two other copies of the dataset the first of which appears
>> to have those missing cases:
>> This one has patient numbers:
>>
>>
>> This one has a description of the fields and cites the one above but
>> has not retained the patient numbers and has apparently only kept the
>> 475 cases with complete data.
>>
>>
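
The gap check described in the quoted message (last "patnr" of 506 but
only 502 lines of data) can be sketched in R as below. The patnr vector
is a hypothetical stand-in built from the numbers quoted above, not read
from any of the actual files.

```r
# Hypothetical stand-in for the patnr column: IDs 1..506 with the
# four patient numbers reported missing removed.
patnr <- setdiff(1:506, c(471, 473, 475, 488))

# IDs that should be present up to the largest one seen, but are not.
missing_ids <- setdiff(seq_len(max(patnr)), patnr)

length(patnr)   # number of data lines
missing_ids     # patient numbers absent from the sequence
```

With a real copy of the data, patnr would be the patient-number column
of the data frame instead of the constructed vector.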

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



