[R] Green and Byar (1980) Prostate Cancer Data set from Andrews and Herzberg - Data
Ravi Varadhan
rvaradhan at jhmi.edu
Wed Mar 25 04:51:51 CET 2009
Fine detective work, David. Now, you can see the reasons for my frustration - multiplicity of data sets combined with non-existent documentation of the source of data in journal articles (e.g. Kay 1986; Lunn and McNeil 1995).
Best,
Ravi.
____________________________________________________________________
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University
Ph. (410) 502-2619
email: rvaradhan at jhmi.edu
----- Original Message -----
From: David Winsemius <dwinsemius at comcast.net>
Date: Tuesday, March 24, 2009 10:54 pm
Subject: Re: [R] Green and Byar (1980) Prostate Cancer Data set from Andrews and Herzberg - Data
To: Rolf Turner <r.turner at auckland.ac.nz>
Cc: R-help Forum <r-help at r-project.org>, Ravi Varadhan <rvaradhan at jhmi.edu>
> On Mar 24, 2009, at 8:57 PM, Rolf Turner wrote:
>
> >
> > On 25/03/2009, at 12:09 PM, Frank E Harrell Jr wrote:
> >
> > <snip>
> >
> >>> (2) Scrolling down to ``Byar and Green prostate cancer data''
> >>> appeared to get me to the right place. But I couldn't see any
> >>> signs of any ``R binary files''.
> >>
> >> Please look again. It's under the heading "R". Unfortunately I used
> >> .sav suffix for save() files in the old days.
> >
> > Ah-ha. Oh me of little faith. I have been hanging around (in
> > my current work environment) with too many SPSS users, and the
> > *.sav extension seems to be the standard for SPSS data files.
> > Whence my corrupted thinking.
> >
> >> The .xls file opened with no problem in OpenOffice; has 506 rows.
> >
> > Hmmm. When I opened it with Excel on the Mac I got a spread
> > sheet with 503 rows --- the first row being the column names,
> > so there were really 502 rows.
>
> The last "patnr" is "506" but there are only 502 lines of data. 471,
> 473, 475 and 488 are missing.
>
> And the CMU Statlib version for 2002 looks the same.
>
>
> The version at this site is missing more than 25 cases:
>
>
> Here are two other copies of the dataset the first of which appears
> to have those missing cases:
> This one has patient numbers:
>
>
> This one has a description of the fields and cites the one above but
> has not retained the patient numbers and has apparently only kept the
> 475 cases with complete data.
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
>
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.
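
The row-count discrepancy discussed in the quoted message (last "patnr" is 506, but only 502 lines of data) is the kind of thing that can be checked in a few lines of R. The sketch below is only an illustration: it assumes the patient numbers are available as a plain vector called patnr, and it builds a hypothetical stand-in for that vector from the gaps reported above rather than reading any of the actual files.

```r
# Hypothetical stand-in for the "patnr" column: 1..506 with the four
# patient numbers reported as absent in the thread removed.
# With a real copy of the data, patnr would instead be taken from the
# loaded data frame (the object and column names there are assumptions).
patnr <- setdiff(1:506, c(471, 473, 475, 488))

# Number of data rows actually present.
n_rows <- length(patnr)

# Patient numbers in 1..max(patnr) that never occur in the data.
missing <- setdiff(seq_len(max(patnr)), patnr)

print(n_rows)   # 502
print(missing)  # 471 473 475 488
```

Running the same two lines against each copy of the data set would make it easy to see which versions have lost cases, provided the copy in question has retained the patient numbers.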