[R] Countvariable for id by date
jim holtman
jholtman at gmail.com
Thu Aug 9 13:53:52 CEST 2007
This should do what you want:
> x <- read.table(textConnection("id;dg1;dg2;date;
+ 1;F28;;1997-11-04;
+ 1;F20;F702;1998-11-09;
+ 1;F20;;1997-12-03;
+ 1;F208;;2001-03-18;
+ 2;F32;;1999-03-07;
+ 2;F29;F32;2000-01-06;
+ 2;F32;;2003-07-05;
+ 2;F323;F2800;2000-02-05;"), header=TRUE, sep=";", as.is=TRUE)
> # convert dates
> x$dateP <- unclass(as.POSIXct(x$date))
> # matches for F20
> F20 <- grep("F20", paste(x$dg1, x$dg2))
> # matches for F21 - F29
> F21 <- grep("F2[1-9]", paste(x$dg1, x$dg2))
> # rank the matching rows by date (over all ids); non-matching rows stay NA
> x$F20 <- x$F21 <- NA
> x$F20[F20] <- rank(x$dateP[F20])
> x$F21[F21] <- rank(x$dateP[F21])
> x
  id  dg1   dg2       date  X      dateP F21 F20
1  1  F28       1997-11-04 NA  878601600   1  NA
2  1  F20  F702 1998-11-09 NA  910569600  NA   2
3  1  F20       1997-12-03 NA  881107200  NA   1
4  1 F208       2001-03-18 NA  984873600  NA   3
5  2  F32       1999-03-07 NA  920764800  NA  NA
6  2  F29   F32 2000-01-06 NA  947116800   2  NA
7  2  F32       2003-07-05 NA 1057363200  NA  NA
8  2 F323 F2800 2000-02-05 NA  949708800   3  NA
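
Note that rank() above is applied across the whole data frame, so the numbering does not restart for each id (row 6 gets 2 in F21, where the requested output has 1). If the count should restart within each id, something along these lines might work. It is only a sketch: the helper name count_by_date, the anchored patterns "^F20" and "^F2[1-9]", and the switch to as.Date() are assumptions for illustration, and the trailing ";" is dropped from the sample data so that no empty X column is created.

```r
# Same sample data as above, without the trailing ";" on each line
x <- read.table(text = "id;dg1;dg2;date
1;F28;;1997-11-04
1;F20;F702;1998-11-09
1;F20;;1997-12-03
1;F208;;2001-03-18
2;F32;;1999-03-07
2;F29;F32;2000-01-06
2;F32;;2003-07-05
2;F323;F2800;2000-02-05", header = TRUE, sep = ";", as.is = TRUE)

# Dates as day numbers so they can be ranked
d <- as.numeric(as.Date(x$date))

# Hypothetical helper (not from the original reply): number the rows
# flagged by `hit` in date order within each id; other rows stay NA.
count_by_date <- function(hit, id, d) {
  out <- rep(NA_real_, length(hit))
  if (any(hit)) {
    out[hit] <- ave(d[hit], id[hit],
                    FUN = function(v) rank(v, ties.method = "first"))
  }
  out
}

# A row qualifies if either diagnosis column starts with the code
is_F20   <- grepl("^F20", x$dg1) | grepl("^F20", x$dg2)
is_F2129 <- grepl("^F2[1-9]", x$dg1) | grepl("^F2[1-9]", x$dg2)

x$countF20   <- count_by_date(is_F20, x$id, d)
x$countF2129 <- count_by_date(is_F2129, x$id, d)
x
```

On the sample data this should give countF20 = NA, 2, 1, 3, NA, NA, NA, NA and countF2129 = 1, NA, NA, NA, NA, 1, NA, 2, which is the layout asked for in the question. ties.method = "first" keeps the counts as whole numbers when an id has two qualifying rows on the same date.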
On 8/9/07, David Gyllenberg <david.gyllenberg at yahoo.com> wrote:
> Best R-users,
>
> Here's a newbie question. I have tried to find an answer to this via help and the "ave(x,factor(),FUN=function(y) rank (z,tie='first')"-function, but without success.
>
> I have a dataframe (~8000 observations, registerdata) with four columns: id, dg1, dg2 and date(YYYY-MM-DD) of interest:
>
> id;dg1;dg2;date;
> 1;F28;;1997-11-04;
> 1;F20;F702;1998-11-09;
> 1;F20;;1997-12-03;
> 1;F208;;2001-03-18;
> 2;F32;;1999-03-07;
> 2;F29;F32;2000-01-06;
> 2;F32;;2003-07-05;
> 2;F323;F2800;2000-02-05;
> ...
>
> I would like to have two additional columns:
> 1. "countF20": a "countvariable" giving the observation's position in date order within its id, among the observations that fulfil the following logical expression: dg1 = F20* OR dg2 = F20*,
> where * means F201,F202... F2001,F2002...F20001,F20002...
> 2. "countF2129": another "countvariable" giving the observation's position in date order within its id, among the observations that fulfil the following logical expression: dg1 = F21*-F29* OR dg2 = F21*-F29*,
> where F21*-F29* means F21*, F22*...F29* and
> where * means F211,F212... F2101,F2102...F21001,F21002...
>
> ... so the dataframe would look like this, where 1 is the first observation for the id with the right condition, 2 is the second etc.:
>
> id;dg1;dg2;date;countF20;countF2129;
> 1;F28;;1997-11-04;;1;
> 1;F20;F702;1998-11-09;2;;
> 1;F20;;1997-12-03;1;;
> 1;F208;;2001-03-18;3;;
> 2;F32;;1999-03-07;;;
> 2;F29;F32;2000-01-06;;1;
> 2;F32;;2003-07-05;;;
> 2;F323;F2800;2000-02-05;;2;
> ...
>
> Do you know a convenient way to create this kind of "countvariable"? Thank you in advance!
>
> / David (david.gyllenberg at yahoo.com)
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?