[R] Generating Patient Data

David Winsemius dwinsemius at comcast.net
Tue Jul 1 21:18:02 CEST 2014


On Jun 25, 2014, at 1:49 PM, David Winsemius wrote:

> 
> On Jun 24, 2014, at 11:18 PM, Abhinaba Roy wrote:
> 
>> Hi David,
>> 
>> I was thinking something like this:
>> 
>> ID   Disease
>> 1     A
>> 2     B
>> 3     A
>> 1    C
>> 2    D
>> 5    A
>> 4    B
>> 3    D
>> 2    A
>> ..    ..
>> 
>> How can this be done?
> 
> do.call(rbind,  lapply( 1:20, function(pt) { 
>        data.frame( patient=pt, 
>                    disease= sample( c('A','B','C','D','E','F'), pmin(2+rpois(1, 2), 6))  )}) )

If you were doing this repeatedly, I suppose you might gain some time efficiency by generating the rpois draws once, as a single vector of the same length as your patient IDs, rather than calling rpois() separately for each patient.
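
A minimal sketch of that idea, assuming the 1000 patients and six diseases from the original question. The names diseases, n_pat, n_dis and pts are only illustrative, and the seed is arbitrary:

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
diseases <- c("A", "B", "C", "D", "E", "F")
n_pat    <- 1000                   # number of patients (from the original question)

# One vectorised rpois() call: between 2 and 6 diseases per patient
n_dis <- pmin(2 + rpois(n_pat, 2), length(diseases))

# Repeat each ID once per disease; sample without replacement within a patient
pts <- data.frame(
  ID      = rep(seq_len(n_pat), times = n_dis),
  Disease = unlist(lapply(n_dis, function(n) sample(diseases, n))),
  stringsAsFactors = FALSE
)
```

The rows come out sorted by ID. If you want them interleaved as in your example, something like pts[sample(nrow(pts)), ] should shuffle them.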
> 
> -- 
> David.
>> 
>> 
>> On Wed, Jun 25, 2014 at 11:34 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>> 
>> On Jun 24, 2014, at 10:14 PM, Abhinaba Roy wrote:
>> 
>>> Dear R helpers,
>>> 
>>> I want to generate data for say 1000 patients (i.e., 1000 unique IDs)
>>> having suffered from various diseases in the past (say diseases
>>> A,B,C,D,E,F). The only condition imposed is that each patient should've
>>> suffered from *at least* two diseases. So my data frame will have two
>>> columns 'ID' and 'Disease'.
>>> 
>>> I want to do a basket analysis with this data, where ID will be the
>>> identifier and we will establish rules based on the 'Disease' column.
>>> 
>>> How can I generate this type of data in R?
>>> 
>> 
>> Perhaps something along these lines for 20 cases:
>> 
>>> data.frame(patient=1:20, disease = sapply(pmin(2+rpois(20, 2), 6), function(n) paste0( sample( c('A','B','C','D','E','F'), n), collapse="+" ) )
>> + )
>>   patient     disease
>> 1        1         F+D
>> 2        2     F+A+D+E
>> 3        3     F+D+C+E
>> 4        4     B+D+C+A
>> 5        5     D+A+F+C
>> 6        6       E+A+D
>> 7        7 E+F+B+C+A+D
>> 8        8   A+B+C+D+E
>> 9        9     B+E+C+F
>> 10      10         C+A
>> 11      11 B+A+D+E+C+F
>> 12      12         B+C
>> 13      13     A+D+B+E
>> 14      14 D+C+E+F+B+A
>> 15      15   C+F+D+E+A
>> 16      16       A+C+B
>> 17      17     C+D+B+E
>> 18      18         A+B
>> 19      19   C+B+D+E+F
>> 20      20       D+C+F
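
If the collapsed "+" form above has already been built, it can probably be unpacked into one row per patient and disease with strsplit(). This is only a sketch on three of the rows shown above; the names wide, parts and long are made up for illustration:

```r
# Three rows copied from the printed output above
wide <- data.frame(patient = c(1, 2, 6),
                   disease = c("F+D", "F+A+D+E", "E+A+D"),
                   stringsAsFactors = FALSE)

# fixed = TRUE so that "+" is treated literally, not as a regex quantifier
parts <- strsplit(wide$disease, "+", fixed = TRUE)

# One row per (patient, disease) pair
long <- data.frame(ID      = rep(wide$patient, times = lengths(parts)),
                   Disease = unlist(parts),
                   stringsAsFactors = FALSE)
```

Note that lengths() needs a reasonably recent R; sapply(parts, length) does the same job on older versions.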
>> 
>>> --
>>> Regards
>>> Abhinaba Roy
>>> 
>>>      [[alternative HTML version deleted]]
>> 
>> You should read the Posting Guide and learn to post in plain text.
>>> 
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
>> --
>> David Winsemius
>> Alameda, CA, USA
>> 
>> 
>> 
>> 
>> -- 
>> Regards
>> Abhinaba Roy
>> Statistician
>> Radix Analytics Pvt. Ltd
>> Ahmedabad
>> 
> 
> David Winsemius
> Alameda, CA, USA

David Winsemius
Alameda, CA, USA


