[R] problem of data manipulation

Bert Gunter gunter.berton at gene.com
Mon Jan 18 20:54:02 CET 2010


One way to do it:

1. Convert your date column to the Date class using the as.Date() function.
This allows you to do the necessary arithmetic on the dates below.
dt <- as.Date(a[,4],"%d/%m/%Y")
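
As a quick check of what the conversion buys you, here is a small sketch
using three of the date strings from the original post (the name d is only
for illustration):

```r
# Date strings from the original post, in day/month/year order
d <- as.Date(c("11/02/2000", "15/02/2000", "23/02/2000"), "%d/%m/%Y")

# Subtracting Dates gives a difftime measured in days: 0, 4 and 12 here
d - min(d)

# The difference can be compared directly with a number of days
d - min(d) < 7
```

Only the last date is 7 or more days after the earliest one, so the
comparison is FALSE for it alone.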

2. Create a factor out of your first three columns whose levels are in the
same order as the unique rows. Something like the following should do it:
fac <- do.call(paste,a[,-4])
fac <- factor(fac, levels=unique(fac))

This allows you to choose the groups of rows whose dates you wish to compare
and maintain their correct order in the data frame.
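
To see why the levels argument matters, here is a sketch on the sample data
from the original post (rebuilt here with named columns so that the block
runs on its own):

```r
# Sample data from the original post
a <- data.frame(
  var1 = c("s", "c", "c", "n", "n", "n"),
  var2 = c(rep(1, 3), rep(2, 3)),
  var3 = c(rep(2, 3), rep(1, 3)),
  var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
           "11/02/2000", "15/02/2000", "23/02/2000")
)

# One label per row, built by pasting the first three columns together
fac <- do.call(paste, a[, -4])

# Levels in order of first appearance: "s 1 2", "c 1 2", "n 2 1"
fac <- factor(fac, levels = unique(fac))
levels(fac)

# Without the levels argument the levels would be sorted instead
levels(factor(as.character(fac)))
```

With sorted levels, the per-group results would come back in a different
order from the rows of the data frame.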

3. Then use tapply() to flag, within each group, the rows dated less than 7
days after the group's earliest date:
a[unlist(tapply(dt,fac,function(x)x-min(x) < 7)),]

(unlist is needed to remove the list structure and concatenate the logical
indices to obtain the subscripting vector).
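
Putting the three steps together on the sample data gives the sketch below.
One caveat that I am adding myself: unlist(tapply(...)) returns the flags
group by group, so they line up with the rows only when the rows of each
group are contiguous, as they are in this example. The ave() variant at the
end is an alternative that is not part of the suggestion above; it returns
its result in the original row order. The names keep and keep2 are only for
illustration.

```r
# Sample data from the original post
a <- data.frame(
  var1 = c("s", "c", "c", "n", "n", "n"),
  var2 = c(rep(1, 3), rep(2, 3)),
  var3 = c(rep(2, 3), rep(1, 3)),
  var4 = c("01/01/1999", "10/02/2000", "13/02/2000",
           "11/02/2000", "15/02/2000", "23/02/2000")
)

# Step 1: dates
dt <- as.Date(a[, 4], "%d/%m/%Y")

# Step 2: grouping factor with levels in order of first appearance
fac <- do.call(paste, a[, -4])
fac <- factor(fac, levels = unique(fac))

# Step 3: flag rows less than 7 days after their group's earliest date
keep <- unlist(tapply(dt, fac, function(x) x - min(x) < 7))
a[keep, ]

# Alternative (assumption, not from the original reply): ave() keeps the
# per-group result aligned with the original rows even if groups are
# interleaved
keep2 <- ave(as.numeric(dt), fac, FUN = function(x) x - min(x)) < 7
a[keep2, ]
```

Both subsets keep rows 1 to 5 and drop row 6, which matches the result
asked for in the original post.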

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of rusers.sh
Sent: Monday, January 18, 2010 10:40 AM
To: r-help at r-project.org
Subject: [R] problem of data manipulation

Hello,
  See my problem below.
a<-data.frame(c("s","c","c","n","n","n"),c(rep(1,3),rep(2,3)),
  c(rep(2,3),rep(1,3)),
  c("01/01/1999","10/02/2000","13/02/2000","11/02/2000","15/02/2000","23/02/2000"))
colnames(a)<-c("var1","var2","var3","var4")
> a
  var1 var2 var3       var4
1    s    1    2 01/01/1999
2    c    1    2 10/02/2000
3    c    1    2 13/02/2000
4    n    2    1 11/02/2000
5    n    2    1 15/02/2000
6    n    2    1 23/02/2000

  I want to select the observations whose difference in "var4" is less than
7 days among the cases with the same values of var1, var2 and var3.
  The observations that share the same var1, var2 and var3 are part 1 (obs2
and obs3) and part 2 (obs4, obs5 and obs6).
  For obs2 and obs3, the date difference is less than 7, so we do not need
to delete either of them.
  For obs4, obs5 and obs6, obs6 should be deleted because its date is more
than 7 days later than that of obs4.
  So the final dataset should contain obs1, obs2, obs3, obs4 and obs5.
  I have a lot of observations in my dataset, so I hope to do this
automatically.  Any ideas on this?
  Thanks.
-- 
-----------------
Jane Chang
Queen's
