[R] [r] How to pick colums from a ragged array?

Wed Oct 24 20:42:22 CEST 2012

Hello,

I just realized that the function getRepLogical marks the second row of a 
duplicated pair for removal, not the first (or, counting from the end, not 
the last). The first tapply should be

     dte <- tapply(x[,2], x[,1],
                   FUN = function(x) duplicated(fun(x, 2), fromLast = TRUE))

in order to remove the first (or last).
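
Looking at it again, I believe the DG test (the third tapply) needs the same
fromLast = TRUE, otherwise the first row would be flagged even when both
diagnoses agree. Here is a minimal sketch of the same rule written with ave()
instead of tapply(). The name flagAmbiguous is made up for this example, and
it assumes the rows are already sorted by DATE within each ID.

```r
# Test data from Stuart's message (sorted by ID, then DATE)
ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323,
        547,794,814,814,814,814,814,814,841,841,841,841,841,
        841,841,841,841,910,910,910,910,910,910,999,1019,1019,
        1019)
DATE <- c(20060821,20061207,20080102,20090904,20040205,20040205,20051111,
          20060111,20071119,20080107,20080407,20080521,20080521,20041005,
          20070905,20020814,20021125,20040429,20040429,20071205,20071205,
          20050421,20050421,20060428,20060602,20060816,20061025,20061129,
          20070112,20070514,19870508,20040205,20040205,20080521,20080521,
          20091224,20050503,19870508,19870508,19880330)
DG <- c(1,2,1,1,4,4,3,2,3,2,1,2,3,2,1,2,2,2,2,2,2,1,2,1,1,1,1,1,1,4,
        3,3,3,4,3,2,2,2,1,1)
id.d <- data.frame(ID, DATE, DG)

# Hypothetical helper: flag the first (or last) row of an ID when its two
# earliest (or latest) rows share a DATE but differ in DG.
flagAmbiguous <- function(x, first = TRUE) {
  fun <- if (first) head else tail
  flag <- function(idx) {
    out <- logical(length(idx))
    pair <- fun(idx, 2)                      # row numbers of the two rows to compare
    if (length(pair) == 2 &&
        x[pair[1], 2] == x[pair[2], 2] &&    # same DATE
        x[pair[1], 3] != x[pair[2], 3]) {    # different DG
      out[if (first) 1 else length(idx)] <- TRUE
    }
    out
  }
  # ave() hands each ID its row numbers and returns the flags in row order
  as.logical(ave(seq_len(nrow(x)), x[, 1], FUN = flag))
}

rm1 <- flagAmbiguous(id.d)
rm2 <- flagAmbiguous(id.d, first = FALSE)
id.d$INCLUDE <- !(rm1 | rm2)

id.d[rm1, ]
id.d[rm2, ]
```

On the test data this flags rows 22 and 38 (first) and row 13 (last). IDs 167
and 814 are kept because their duplicated dates carry the same DG.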

Rui Barradas
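
P.S. On the question in the subject line (getting the DATEs out of the ragged
array by column), here is a rough sketch with toy values, not the real data.
The names first2 and dup.first are made up for the example, and the dates
need not be pre-sorted because of the sort() call.

```r
ID   <- c(58, 58, 58, 167, 167, 547, 1019, 1019, 1019)
DATE <- c(20060821, 20061207, 20080102, 20040205, 20040205,
          20041005, 19870508, 19870508, 19880330)

rag.a <- split(DATE, ID)     # ragged array: the DATEs of every ID

# Indexing past the end of a vector gives NA, so an ID with a single
# DATE gets NA in the second column.
first2 <- t(sapply(rag.a, function(d) sort(d)[1:2]))
colnames(first2) <- c("first", "second")

# IDs whose two earliest DATEs coincide (which() drops the NA cases)
dup.first <- rownames(first2)[which(first2[, "first"] == first2[, "second"])]
dup.first
```

The result is a matrix with one row per ID, so the first and second DATE can
be compared as ordinary columns.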
On 24-10-2012 18:41, Rui Barradas wrote:
> Hello,
>
> Using one of Arun's ideas from a few posts ago, this new function returns a 
> logical index into id.d of the rows that should be _removed_, hence the 
> names rm1 and rm2, I think.
>
>
>
> getRepLogical <- function(x, first = TRUE){
>     fun <- if(first) head else tail
>     dte <- tapply(x[,2], x[,1], FUN = function(x) duplicated(fun(x, 2)))
>     len <- tapply(x[,2], x[,1], FUN = length)
>     lst <- lapply(seq_along(dte), function(i) c(dte[[i]], rep(FALSE, 
> if(len[[i]] > 2) len[[i]] - 2 else 0)))
>     lst <- if(first) lst else lapply(lst, rev)
>     i1 <- unlist(lst)
>     dg <- tapply(x[,3], x[,1], FUN = function(x) !duplicated(fun(x, 2)))
>     lst <- lapply(seq_along(dte), function(i) c(dg[[i]], rep(FALSE, 
> if(len[[i]] > 2) len[[i]] - 2 else 0)))
>     lst <- if(first) lst else lapply(lst, rev)
>     i2 <- unlist(lst)
>     i1 & i2
> }
>
> rm1 <- getRepLogical(id.d)
> rm2 <- getRepLogical(id.d, first = FALSE)
>
> id.d[rm1, ]
> id.d[rm2, ]
>
> id.d$INCLUDE <- !(rm1 | rm2)
>
>
> Hope this helps,
>
> Rui Barradas
> On 24-10-2012 16:41, Stuart Leask wrote:
>> (And, considering  the real application, the functions ideally should 
>> probably output a variable INCLUDE, the same length as the original 
>> data, with TRUE and FALSE for whether or not that row should be 
>> included...)
>>
>> -----Original Message-----
>> From: Leask Stuart
>> Sent: 24 October 2012 16:25
>> To: arun (smartpink111 at yahoo.com); 'PIKAL Petr'; Rui Barradas 
>> (ruipbarradas at sapo.pt)
>> Subject: RE: [r] How to pick colums from a ragged array?
>>
>> Arun, Petr, Rui, many thanks for your help, and the functions you 
>> have written.
>>
>> You'll recall I wanted to remove these first (or last) duplicates, 
>> because they represented instances where two different diagnoses (in 
>> this case, variable DG, value 1, 2, 3, 4 or 5) had been recorded on 
>> the same day - so I can't say which was 'first' (or 'last').
>>
>> Your functions have revealed something I wasn't expecting: In some 
>> cases, the diagnoses recorded on the duplicated DATEs are the same!
>> This is a surprise to me, but probably reflects someone going to two 
>> different departments in a clinic, and both departments submit data. 
>> I have to say that crazy things like this are often a feature of real 
>> data, which I'm sure you've come across yourselves.
>>
>> Of course, I don't want to remove records in which I can determine an 
>> unambiguous 'first diagnosis'.
>>
>> You have all put in so much effort on my behalf, I'm ashamed to ask, 
>> but I wonder if any of the functions you've written could do this 
>> with a little more indexing and the 'duplicated' function.
>> So the function should only exclude an ID if, having identified a first 
>> (or last) DATE duplicate, the DGs for these two dates are different.
>>
>> Test dataset:
>>
>> ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323
>> ,547,794,814,814,814,814,814,814,841,841,841,841,841
>> ,841,841,841,841,910,910,910,910,910,910,999,1019,1019
>> ,1019)
>>
>> DATE <-
>> c(20060821,20061207,20080102,20090904,20040205,20040205,20051111
>> ,20060111,20071119,20080107,20080407,20080521,20080521,20041005
>> ,20070905,20020814,20021125,20040429,20040429,20071205,20071205
>> ,20050421,20050421,20060428,20060602,20060816,20061025,20061129
>>   ,20070112,20070514, 19870508,20040205,20040205, 20080521,20080521
>>   ,20091224,20050503,19870508,19870508,19880330)
>>
>> DG<-
>> c(1,2,1,1,4,4,3,2,3,2,1,2,3,2,1,2,2,2,2,2,2,1,2,1,1,1,1,1,1,4,3,3,3,4,3,2,2,2,1,1) 
>>
>>
>> id.d<-data.frame(ID,DATE,DG)
>> id.d
>>
>> # Considering Rui's getRepeat function:
>>
>> g.r<-getRepeat(id.d)    # defaults to first = TRUE;
>>                         # use getRepeat(id.d, first = FALSE) to get the last ones
>> g.rr<-do.call(rbind, g.r) # put the data into a matrix
>>
>> # I can remove the date duplicates with:
>> g.rr[rep(!duplicated(g.rr)[(1:(dim(g.rr)[1]/2))*2],each=2),]
>>
>> I'm not sure how to add this to your suggestions, Arun & Petr...
>>
>>
>> Stuart
>>
>>
>> -----Original Message-----
>> From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
>> Sent: 23 October 2012 15:24
>> To: Stuart Leask
>> Subject: RE: [r] How to pick colums from a ragged array?
>>
>> Hi
>>
>> I assumed that id.d is data frame
>>
>> id.d <- data.frame (ID,DATE )
>>
>> and
>>
>> fff(id.d)
>>
>> works for me
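
A small inline sketch of the difference, with toy values rather than the real
data (m, df, pieces_m and pieces_df are names made up for this example). As
far as I understand it, split() treats a matrix as a plain vector, so the
pieces have no dimensions and x[1, 2] fails, while split() on a data frame
returns smaller data frames.

```r
ID   <- c(1, 1, 2)
DATE <- c(20040205, 20040205, 20050503)

m  <- cbind(ID, DATE)        # matrix, as in the failing run
df <- data.frame(ID, DATE)   # data frame, as Petr assumed

pieces_m  <- split(m, m[, 1])     # list of plain numeric vectors
pieces_df <- split(df, df[, 1])   # list of data frames

is.null(dim(pieces_m[[1]]))       # the matrix pieces have lost their dimensions
is.data.frame(pieces_df[[1]])

# Two-index subsetting fails on a vector ...
res <- try(pieces_m[[1]][1, 2], silent = TRUE)
inherits(res, "try-error")

# ... but works on a data frame piece (this is what testfirst does)
same.first <- pieces_df[[1]][1, 2] == pieces_df[[1]][2, 2]
same.first
```
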
>>
>> Petr
>>
>>
>>> -----Original Message-----
>>> From: Stuart Leask [mailto:Stuart.Leask at nottingham.ac.uk]
>>> Sent: Tuesday, October 23, 2012 3:13 PM
>>> To: PIKAL Petr
>>> Subject: RE: [r] How to pick colums from a ragged array?
>>>
>>> Hi Petr.
>>> I see what you mean it should do, but when I run it I get an error
>>> (see below).
>>> Stuart
>>>
>>>
>>>> ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323
>>> + ,547,794,814,814,814,814,814,814,841,841,841,841,841
>>> + ,841,841,841,841,910,910,910,910,910,910,999,1019,1019
>>> + ,1019)
>>>> DATE <-
>>> + c(20060821,20061207,20080102,20090904,20040205,20040205,20051111
>>> + ,20060111,20071119,20080107,20080407,20080521,20080711,20041005
>>> + ,20070905,20020814,20021125,20040429,20040429,20071205,20080227
>>> + ,20050421,20050421,20060428,20060602,20060816,20061025,20061129
>>> +  ,20070112,20070514, 19870508,20040205,20040205, 20091120,20091210
>>> +  ,20091224,20050503,19870508,19870508,19880330)
>>>>   id.d <- cbind (ID,DATE )
>>>> fff<-function(data, first=TRUE, remove=FALSE) {
>>> +
>>> + testfirst <- function(x) x[1,2]==x[2,2]
>>> + testlast <- function(x) x[nrow(x),2]==x[nrow(x)-1,2]
>>> +
>>> + if(first) sel <- as.numeric(names(which(unlist(sapply(split(data,
>>> + data[,1]), testfirst))))) else sel <-
>>> + as.numeric(names(which(unlist(sapply(split(data, data[,1]),
>>> + testlast)))))
>>> +
>>> + if (remove) data[!data[,1] %in% sel,] else data[data[,1] %in% sel,]
>>> + }
>>>> fff(id.d)
>>> Error in x[1, 2] : incorrect number of dimensions
>>> -----Original Message-----
>>> From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
>>> Sent: 23 October 2012 13:51
>>> To: Stuart Leask; r-help at r-project.org
>>> Subject: RE: [r] How to pick colums from a ragged array?
>>>
>>> Hi
>>>
>>>> -----Original Message-----
>>>> From: Stuart Leask [mailto:Stuart.Leask at nottingham.ac.uk]
>>>> Sent: Tuesday, October 23, 2012 2:29 PM
>>>> To: PIKAL Petr; r-help at r-project.org
>>>> Subject: RE: [r] How to pick colums from a ragged array?
>>>>
>>>> Hi there.
>>>>
>>>> Not sure I follow what you are doing.
>>>>
>>>> I want a list of all the IDs that have duplicate DATE entries, only
>>>> when the DATE is the earliest (or last) date for that ID.
>>> And that is what the function (with 3 small modifications) does
>>>
>>>
>>> fff<-function(data, first=TRUE, remove=FALSE) {
>>>
>>> testfirst <- function(x) x[1,2]==x[2,2]
>>> testlast <- function(x) x[nrow(x),2]==x[nrow(x)-1,2]
>>>
>>> if(first) sel <- as.numeric(names(which(unlist(sapply(split(data,
>>> data[,1]), testfirst))))) else sel <-
>>> as.numeric(names(which(unlist(sapply(split(data, data[,1]),
>>> testlast)))))
>>>
>>> if (remove) data[!data[,1] %in% sel,] else data[data[,1] %in% sel,] }
>>>
>>> See the result of your refined data
>>>
>>> fff(id.d)
>>>       ID       DATE
>>> 5   167 2004-02-05
>>> 6   167 2004-02-05
>>> 22  841 2005-04-21
>>> 23  841 2005-04-21
>>> 24  841 2006-04-28
>>> 25  841 2006-06-02
>>> 26  841 2006-08-16
>>> 27  841 2006-10-25
>>> 28  841 2006-11-29
>>> 29  841 2007-01-12
>>> 30  841 2007-05-14
>>> 38 1019 1987-05-08
>>> 39 1019 1987-05-08
>>> 40 1019 1988-03-30
>>>> fff(id.d, first=F)
>>>     ID       DATE
>>> 5 167 2004-02-05
>>> 6 167 2004-02-05
>>>> fff(id.d, remove=T)
>>>      ID       DATE
>>> 1   58 2006-08-21
>>> 2   58 2006-12-07
>>> 3   58 2008-01-02
>>> 4   58 2009-09-04
>>> 7  323 2005-11-11
>>> 8  323 2006-01-11
>>> 9  323 2007-11-19
>>> 10 323 2008-01-07
>>> 11 323 2008-04-07
>>> 12 323 2008-05-21
>>> 13 323 2008-07-11
>>> 14 547 2004-10-05
>>> 15 794 2007-09-05
>>> 16 814 2002-08-14
>>> 17 814 2002-11-25
>>> 18 814 2004-04-29
>>> 19 814 2004-04-29
>>> 20 814 2007-12-05
>>> 21 814 2008-02-27
>>> 31 910 1987-05-08
>>> 32 910 2004-02-05
>>> 33 910 2004-02-05
>>> 34 910 2009-11-20
>>> 35 910 2009-12-10
>>> 36 910 2009-12-24
>>> 37 999 2005-05-03
>>> You can take the fff function apart to see what result comes from each
>>> piece of it, e.g.
>>>
>>> sapply(split(id.d, id.d[,1]), testlast)
>>>
>>> Regards
>>> Petr
>>>
>>>> I have refined my test dataset, to include some tests (e.g. 910 has
>>>> the same dup as 1019, but for 910 it's not the earliest date):
>>>>
>>>>
>>>> ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323
>>>> ,547,794,814,814,814,814,814,814,841,841,841,841,841
>>>> ,841,841,841,841,910,910,910,910,910,910,999,1019,1019
>>>> ,1019)
>>>>
>>>> DATE <-
>>>> c(20060821,20061207,20080102,20090904,20040205,20040205,20051111
>>>> ,20060111,20071119,20080107,20080407,20080521,20080711,20041005
>>>> ,20070905,20020814,20021125,20040429,20040429,20071205,20080227
>>>> ,20050421,20050421,20060428,20060602,20060816,20061025,20061129
>>>>   ,20070112,20070514, 19870508,20040205,20040205, 20091120,20091210
>>>>   ,20091224,20050503,19870508,19870508,19880330)
>>>>
>>>> Correct output:
>>>> "167"  "841"  "1019"
>>>>
>>>> Stuart
>>>>
>>>> -----Original Message-----
>>>> From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
>>>> Sent: 23 October 2012 13:15
>>>> To: Stuart Leask; r-help at r-project.org
>>>> Subject: RE: [r] How to pick colums from a ragged array?
>>>>
>>>> Hi
>>>>
>>>> Rui's answer led me to a more elaborate solution, which still
>>>> needs the data frame to be ordered by date
>>>>
>>>> fff<-function(data, first=TRUE, remove=FALSE) {
>>>>
>>>> testfirst <- function(x) x[1,2]==x[2,2]
>>>> testlast <- function(x) x[length(x),2]==x[length(x)-1,2]
>>>>
>>>> if(first) sel <- as.numeric(names(which(sapply(split(data,
>>>> data[,1]),
>>>> testfirst)))) else sel <- as.numeric(names(which(sapply(split(data,
>>>> data[,1]), testlast))))
>>>>
>>>> if (remove) data[data[,1]!=sel,] else data[data[,1]==sel,] }
>>>>
>>>>
>>>>> fff(id.d)
>>>>      ID     DATE
>>>> 31 910 20091105
>>>> 32 910 20091105
>>>> 33 910 20091117
>>>> 34 910 20091119
>>>> 35 910 20091120
>>>> 36 910 20091210
>>>> 37 910 20091224
>>>> 38 910 20091224
>>>>
>>>>> fff(id.d, remove=T)
>>>>       ID     DATE
>>>> 1    58 20060821
>>>> 2    58 20061207
>>>> 3    58 20080102
>>>> 4    58 20090904
>>>> 5   167 20040205
>>>> 6   167 20040323
>>>> 7   323 20051111
>>>> 8   323 20060111
>>>> 9   323 20071119
>>>> 10  323 20080107
>>>> 11  323 20080407
>>>> 12  323 20080521
>>>> 13  323 20080711
>>>> 14  547 20041005
>>>> 15  794 20070905
>>>> 16  814 20020814
>>>> 17  814 20021125
>>>> 18  814 20040429
>>>> 19  814 20040429
>>>> 20  814 20071205
>>>> 21  814 20080227
>>>> 22  841 20050421
>>>> 23  841 20060130
>>>> 24  841 20060428
>>>> 25  841 20060602
>>>> 26  841 20060816
>>>> 27  841 20061025
>>>> 28  841 20061129
>>>> 29  841 20070112
>>>> 30  841 20070514
>>>> 39  999 20050503
>>>> 40 1019 19870508
>>>> 41 1019 19880223
>>>> 42 1019 19880330
>>>> 43 1019 19880330
>>>> Regards
>>>> Petr
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>>>>> project.org] On Behalf Of PIKAL Petr
>>>>> Sent: Tuesday, October 23, 2012 1:49 PM
>>>>> To: Stuart Leask; r-help at r-project.org
>>>>> Subject: Re: [R] [r] How to pick colums from a ragged array?
>>>>>
>>>>> Hi
>>>>>
>>>>> I did not check your code and rather followed your explanation. BTW,
>>>>> thanks for test data.
>>>>>
>>>>> small change in data frame to make DATE as Date class
>>>>>
>>>>> datum<-as.Date(as.character(DATE), format="%Y%m%d")
>>>>> id.d <- data.frame(ID,datum )
>>>>>
>>>>> ordering by date
>>>>>
>>>>> id.d<-id.d[order(id.d$datum),]
>>>>>
>>>>>
>>>>> two functions to test if first two dates are the same or last two
>>>>> dates are the same
>>>>>
>>>>> testfirst <- function(x) x[1,2]==x[2,2]
>>>>> testlast <- function(x) x[length(x),2]==x[length(x)-1,2]
>>>>>
>>>>> change one last date in the data frame to be the same as previous
>>>>>
>>>>> id.d[35,2]<-id.d[36,2]
>>>>>
>>>>> and here are results
>>>>>
>>>>> sapply(split(id.d, id.d$ID), testlast)
>>>>>     58   167   323   547   794   814   841   910   999 1019
>>>>> FALSE FALSE FALSE    NA    NA FALSE FALSE  TRUE    NA FALSE
>>>>>
>>>>>> sapply(split(id.d, id.d$ID), testfirst)
>>>>>     58   167   323   547   794   814   841   910   999 1019
>>>>> FALSE FALSE FALSE    NA    NA FALSE FALSE FALSE    NA FALSE
>>>>>
>>>>> Now you can select ID which is true and remove it from your data
>>>>>
>>>>> which(sapply(split(id.d, id.d$ID), testlast))
>>>>>
>>>>> and use it for your data frame to subset/remove
>>>>>
>>>>>> id.d$ID == as.numeric(names(which(sapply(split(id.d, id.d$ID), testlast))))
>>>>>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>>>> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>>>> [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
>>>>> [37]  TRUE  TRUE  TRUE  TRUE
>>>>>
>>>>> However I am not sure if this is exactly what you want.
>>>>>
>>>>> Regards
>>>>> Petr
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>>>>>> project.org] On Behalf Of Stuart Leask
>>>>>> Sent: Tuesday, October 23, 2012 11:38 AM
>>>>>> To: r-help at r-project.org
>>>>>> Subject: [R] [r] How to pick colums from a ragged array?
>>>>>>
>>>>>> I have a large dataset (~1 million rows) of three variables: ID
>>>>>> (patient's name), DATE (of appointment) and DIAGNOSIS (given on that
>>>>>> date).
>>>>>> Patients may have been assigned more than one diagnosis at any one
>>>>>> appointment - leading to two rows, same ID and DATE but
>>>>>> different DIAGNOSIS.
>>>>>> The diagnoses may change between appointments.
>>>>>>
>>>>>> I want to subset the data in two ways:
>>>>>>
>>>>>> -          define groups of patients by the first diagnosis given
>>>>>>
>>>>>> -          define groups of patients by the last diagnosis given.
>>>>>>
>>>>>> The problem:
>>>>>> Unfortunately, a small number of patients have been given more
>>>>>> than one diagnosis at their first (or last) appointment. These
>>>>>> individuals I need to identify and remove, as it's not possible to
>>>>>> say uniquely what their first (or last) diagnosis was. So I need
>>>>>> to identify and remove these individuals which have pairs of
>>>>>> rows with the same ID and (lowest or highest) DATE. The size of the
>>>>>> dataset precludes the option of doing this by eye.
>>>>>>
>>>>>> I suspect there is a very elegant way of doing this in R.
>>>>>>
>>>>>> This is what I've come up with:
>>>>>>
>>>>>>
>>>>>> -          Sort by DATE then ID
>>>>>>
>>>>>> -          Make a ragged array of DATE by ID
>>>>>>
>>>>>> -          Remove IDs that only occur once.
>>>>>>
>>>>>> -          Subtract the first and second DATEs. Remove IDs for which
>>>>>> this = zero, as this will only be true for IDs for which the
>>>>>> appointment is recorded twice (because there were two diagnoses
>>>>>> recorded on this date).
>>>>>>
>>>>>> -          (Then do the same to get the 'last appointment' duplicates,
>>>>>> by reversing the initial sort by DATE.)
>>>>>>
>>>>>> I am stuck at the 'Subtract dates' step: I would like to get the
>>>>>> data out of the ragged array by columns (so e.g. I end up with a
>>>>>> matrix of ID, 1st DATE, 2nd DATE). But I can't get the dates out
>>>>>> by column from the ragged array.
>>>>>>
>>>>>> I hope someone can help. My ugly code is below, with some data for
>>>>>> testing.
>>>>>>
>>>>>>
>>>>>> Stuart
>>>>>>
>>>>>>
>>>>>> Dr Stuart John Leask DM FRCPsych MB BChir MA
>>>>>> Clinical Senior Lecturer and Honorary Consultant Psychiatrist
>>>>>> Institute of Mental Health, Innovation Park
>>>>>> Triumph Road, Nottingham, Notts. NG7 2TU. UK
>>>>>> Tel. +44 115 82 30419
>>>>>> stuart.leask at nottingham.ac.uk
>>>>>> Google 'Dr Stuart Leask'
>>>>>>
>>>>>>
>>>>>> ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323
>>>>>> ,547,794,814,814,814,814,814,814,841,841,841,841,841
>>>>>> ,841,841,841,841,910,910,910,910,910,910,999,1019,1019
>>>>>> ,1019)
>>>>>>
>>>>>> DATE <-
>>>>>> c(20060821,20061207,20080102,20090904,20040205,20040323,20051111
>>>>>> ,20060111,20071119,20080107,20080407,20080521,20080711,20041005
>>>>>> ,20070905,20020814,20021125,20040429,20040429,20071205,20080227
>>>>>> ,20050421,20060130,20060428,20060602,20060816,20061025,20061129
>>>>>> ,20070112,20070514,20091105,20091117,20091119,20091120,20091210
>>>>>> ,20091224,20050503,19870508,19880223,19880330)
>>>>>>
>>>>>> id.d <- cbind (ID,DATE )
>>>>>> # create ragged array, 1-n DATES for every NAME
>>>>>> rag.a  <-  split ( id.d [ ,2 ], id.d [ ,1])
>>>>>>
>>>>>> # Inelegant attempt to remove IDs that only have one entry:
>>>>>>
>>>>>> # add up the dates per row
>>>>>> rag.s <-tapply  (id.d [ ,2], id.d [ ,1], sum)
>>>>>> # Since DATE is in 'year mo da', if there's only one date, sum will be
>>>>>> # less than 21000000:
>>>>>> rag.t <- rag.s [ rag.s > 21000000 ]
>>>>>> # all the IDs with >1 date
>>>>>> multi.dates <- rownames ( rag.t )
>>>>>> # rag.am only has IDs with > 1 Date
>>>>>> rag.am <- rag.a [ multi.dates ]
>>>>>>
>>>>>>
>>>>>> # But now I'm stuck.
>>>>>> # Each row of the array is rag.am$ID.
>>>>>> # So I can't pick columns of DATEs from the ragged array.
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.