[R] Green and Byar (1980) Prostate Cancer Data set from Andrews and Herzberg - Data

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Mar 25 00:09:53 CET 2009


Rolf Turner wrote:
> 
> On 25/03/2009, at 10:04 AM, Frank E Harrell Jr wrote:
> 
>> Ravi Varadhan wrote:
>>> Hi,
>>>
>>> I am looking for a data set containing the information from a 
>>> randomized trial evaluating the effect of DES (diethylstilbestrol) on 
>>> multiple time-to-event endpoints, prostate cancer, CVD, and other 
>>> causes.  The original source of this data is Green and Byar (1980).  
>>> This is a popular competing risks problem that has subsequently been 
>>> discussed in a number of statistical papers including Kay (1986).
>>>
>>> Does anyone have a digital version of this data set?
>>>
>>> This data is also presented in Andrews, D. F. and Herzberg, A. M. 
>>> (1985). Data.   Does a digital version of all the data sets in A & H 
>>> exist?
>>>
>>> Thanks very much,
>>> Ravi.
>>
>> An R binary dataset is at http://biostat.mc.vanderbilt.edu/Datasets
>>
>> Note that there is something strange about the AP variable with a lot of
>> ties at some value near 1.0.  I have never been able to find any
>> documentation about this problem.  If you find any please let me know.
> 
> Out of idle curiosity I went to have a look at this data set.
> 
> I had problems.
> 
> (1) The given URL didn't work for me; when I clicked on it, I got an 
> error 404.
> But if I went to http://biostat.mc.vanderbilt.edu I found a link to 
> ``Datasets'',
> and clicking on that got me to some data sets.

Sorry, that should have been DataSets, not Datasets.

> 
> (2) Scrolling down to ``Byar and Green prostate cancer data'' appeared 
> to get
> me to the right place.  But I couldn't see any signs of any ``R binary 
> files''.

Please look again.  It's under the heading "R".  Unfortunately I used 
the .sav suffix for save() files in the old days.
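
Since a .sav file here is an ordinary save() image rather than an SPSS 
file, it can be restored with load().  A minimal sketch follows; the 
helper name load_sav and the local file name "prostate.sav" are 
assumptions for illustration, not names taken from the web page:

```r
# Hypothetical helper: restore a save() file into its own environment
# and return the restored objects as a named list.
load_sav <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the object names
  mget(obj_names, envir = env)
}

# "prostate.sav" is an assumed local name for the downloaded file.
if (file.exists("prostate.sav")) {
  objs <- load_sav("prostate.sav")
  print(names(objs))   # names of the objects stored in the file
  str(objs[[1]])       # structure of the first restored object
}
```

Loading into a separate environment avoids silently overwriting objects 
of the same name in the workspace.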

The .xls file opened with no problem in OpenOffice; it has 506 rows.

Frank


> 
> The available formats appear to be *.sav (SPSS?), *.sdd (???), and *.xls.
> 
> (3) I downloaded the prostate.xls file O.K.  But when I tried to read it 
> in with
> the read.xls() function from the gdata package, I got an error to the 
> effect
> 
>  > X <- read.xls("prostate.xls")
> Converting xls file to csv file... Done.
> Reading csv file... Error in read.table(file = file, header = header, 
> sep = sep, quote = quote,  :
>   no lines available in input
> 
> I was able to ``open'' the prostate.xls file with the version of Excel 
> available
> on my Mac, save it as a *.csv file, and then read *that* in with read.csv()
> 
> What am I missing?  *Are* there ``R binary'' files lurking about that I 
> am somehow
> not seeing?  Why won't read.xls() work on this data set?
> 
>     cheers,
> 
>         Rolf Turner
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



