[R] How to remove rows based on frequency of factor and then difference date scores
David Winsemius
dwinsemius at comcast.net
Tue Aug 24 19:53:30 CEST 2010
On Aug 24, 2010, at 1:19 PM, Chris Beeley wrote:
> Hello-
>
> A basic question which has nonetheless floored me entirely. I have a
> dataset which looks like this:
>
> Type ID Date Value
> A 1 16/09/2020 8
> A 1 23/09/2010 9
> B 3 18/8/2010 7
> B 1 13/5/2010 6
>
> There are two Types, which correspond to different individuals in
> different conditions, and loads of ID labels (1:50) corresponding to
> the different individuals in each condition, and measurements at
> different times (from 1 to 10 measurements) for each individual.
>
> I want to perform the following operations:
>
> 1) Delete all individuals for whom only one measurement is available.
> In the dataset above, you can see that I want to delete the row Type B
> ID 3, and Type B ID 1, but without deleting the Type A ID 1 data
> because there is more than one measurement for Type A ID 1 (but not
> for Type B ID1)
>
> 2) Produce difference scores for each of the Dates, so each individual
> (Type A ID1 and all the others for whom more than one measurement
> exists) starts at Date "1" and goes up in integers according to how
> many days have elapsed.
>
> I just know there's some incredibly cunning R-ish way of doing this
> but after many hours of fiddling I have had to admit defeat.
Not sure about terribly cunning. Let's assume your dataframe was read
in with stringsAsFactors=FALSE and is called txt.df:
> txt.df$dt2 <- as.Date(txt.df$Date, format="%d/%m/%Y")
> txt.df
  Type ID       Date Value        dt2
1    A  1 16/09/2020     8 2020-09-16
2    A  1 23/09/2010     9 2010-09-23
3    B  3  18/8/2010     7 2010-08-18
4    B  1  13/5/2010     6 2010-05-13
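For anyone who wants to reproduce the session, the four example rows can be rebuilt along these lines. This is only a sketch: the values are copied from the question as posted (including the 2020 year in the first row), and read.table(text=...) is just one convenient way to get them into a data frame.

```r
# Rebuild the four example rows from the question, values copied verbatim.
txt.df <- read.table(text = "
Type ID Date Value
A 1 16/09/2020 8
A 1 23/09/2010 9
B 3 18/8/2010 7
B 1 13/5/2010 6
", header = TRUE, stringsAsFactors = FALSE)

# Parse the day/month/year strings into Date objects.
txt.df$dt2 <- as.Date(txt.df$Date, format = "%d/%m/%Y")
str(txt.df)
```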
> txt.df$nn <- ave(txt.df$ID,txt.df$ID, FUN=length)
> txt.df
  Type ID       Date Value        dt2 nn
1    A  1 16/09/2020     8 2020-09-16  3
2    A  1 23/09/2010     9 2010-09-23  3
3    B  3  18/8/2010     7 2010-08-18  1
4    B  1  13/5/2010     6 2010-05-13  3
> txt.df[ txt.df$nn > 1, ]
  Type ID       Date Value        dt2 nn
1    A  1 16/09/2020     8 2020-09-16  3
2    A  1 23/09/2010     9 2010-09-23  3
4    B  1  13/5/2010     6 2010-05-13  3
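# Task #1 accomplished, at least when counting by ID alone: row 4 (Type B,
# ID 1) is kept because ID 1 occurs three times across both Types.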
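Since the question treats Type A ID 1 and Type B ID 1 as different individuals, a variant that counts rows per Type/ID pair may be closer to what was asked. A minimal sketch, assuming the same four rows; the names grp and kept are made up for illustration:

```r
txt.df <- data.frame(
  Type  = c("A", "A", "B", "B"),
  ID    = c(1L, 1L, 3L, 1L),
  Date  = c("16/09/2020", "23/09/2010", "18/8/2010", "13/5/2010"),
  Value = c(8, 9, 7, 6),
  stringsAsFactors = FALSE
)

# One grouping key per Type/ID pair; drop = TRUE discards unused combinations.
grp <- interaction(txt.df$Type, txt.df$ID, drop = TRUE)

# Number of measurements for each individual within its Type.
txt.df$nn <- ave(seq_len(nrow(txt.df)), grp, FUN = length)

# Logical indexing keeps working even when no row has nn <= 1, whereas
# negative indexing with an empty which() result would return zero rows.
kept <- txt.df[txt.df$nn > 1, ]
kept
```

With the example data this keeps only the two Type A, ID 1 rows, which matches the deletions described in the question.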
> tapply(txt.df$dt2, txt.df$ID, function(x) x[1] -x)
$`1`
Time differences in days
[1] 0 3646 3779
$`3`
Time difference of 0 days
> unlist( tapply(txt.df$dt2, txt.df$ID, function(x) x[1] -x) )
  11   12   13    3
   0 3646 3779    0
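# unlist(tapply(...)) returns its values in group order (ID 1, 1, 1, 3), not
# row order, so assigning it directly would mislabel rows 3 and 4.
# ave() returns the same differences in the original row order:
> txt.df$diffdays <- ave(as.numeric(txt.df$dt2), txt.df$ID, FUN=function(x) x[1] - x)
> txt.df
  Type ID       Date Value        dt2 nn diffdays
1    A  1 16/09/2020     8 2020-09-16  3        0
2    A  1 23/09/2010     9 2010-09-23  3     3646
3    B  3  18/8/2010     7 2010-08-18  1        0
4    B  1  13/5/2010     6 2010-05-13  3     3779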
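For the second task as the question words it (each individual starting at day 1 and counting up by days elapsed), one possible sketch applies ave() to the numeric dates, grouped by Type and ID. The column name day and the helper grp are made up for illustration, and treating each individual's earliest date as day 1 is my reading of the question rather than something it states outright.

```r
txt.df <- data.frame(
  Type  = c("A", "A", "B", "B"),
  ID    = c(1L, 1L, 3L, 1L),
  Date  = c("16/09/2020", "23/09/2010", "18/8/2010", "13/5/2010"),
  Value = c(8, 9, 7, 6),
  stringsAsFactors = FALSE
)
txt.df$dt2 <- as.Date(txt.df$Date, format = "%d/%m/%Y")

# One grouping key per Type/ID pair, with unused combinations dropped.
grp <- interaction(txt.df$Type, txt.df$ID, drop = TRUE)

# Days since each individual's earliest measurement, with that date as day 1.
# ave() returns values in the original row order, so the column lines up
# with the rows it was computed from.
txt.df$day <- ave(as.numeric(txt.df$dt2), grp,
                  FUN = function(x) x - min(x) + 1)
txt.df
```

Using min(x) rather than x[1] means the result does not depend on the rows already being sorted by date within each individual.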
>
>
> I would be very grateful for any words of advice.
>
> Many thanks,
> Chris Beeley,
> Institute of Mental Health, UK
David Winsemius, MD
West Hartford, CT