[R] Manipulating DataSets

Peter Dalgaard p.dalgaard at biostat.ku.dk
Thu May 29 22:08:48 CEST 2008


Neil Gupta wrote:
> Hello R-Users,
>
> I am new to R and trying my best however I need help with this simple task.
> I have a dataset, YM1207.
>        X.Symbol       Date     Time Exchange TickType ReferenceNumber Price Size
> 12491 3:YMZ7.EC 12/03/2007 08:32:50       EC        B        85985770 13379    7
> 12492 3:YMZ7.EC 12/03/2007 08:32:50       EC        A        85985771 13380    4
> 12493 3:YMZ7.EC 12/03/2007 08:32:50       EC        T        85985845 13379    1
> 12494 3:YMZ7.EC 12/03/2007 08:32:50       EC        B        85985846 13379    7
> 12495 3:YMZ7.EC 12/03/2007 08:32:50       EC        A        85985847 13380    4
> 12496 3:YMZ7.EC 12/03/2007 08:32:50       EC        B        85986222 13379    6
> 12497 3:YMZ7.EC 12/03/2007 08:32:50       EC        A        85986223 13380    4
>
> I want to insert a column called NPrice which takes a pair of B,A and
> calculates its average Price. And then input that number in the B row and A
> row in the new column NPrice. Each B, A is separated by +1 on the Reference
> Number. I want to skip T entries. T's do not come in between corresponding Bs
> and As. The other columns are not of interest. I would really appreciate it
> if I can get some help on this or refer me to a source that may.
>
>   
I think this is a case where what you really need to do is to become 
aware of the tools you have in the toolbox. E.g., I already showed you 
one way to do it if the T's were absent:

N <- nrow(YM1207)
ix <- gl(N/2,2)   # pair index: 1 1 2 2 ...
YM1207$NPrice <- ave(YM1207$Price, ix)

(OK, I forgot $Price last time...)
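
In case gl() and ave() are unfamiliar, here is a minimal sketch of what the two calls do on a made-up vector. The names ix_demo, prices_demo and pair_means are only for illustration and are not part of your data:

```r
# gl(n, k) builds a factor with n levels, each repeated k times,
# so gl(3, 2) labels consecutive pairs: 1 1 2 2 3 3
ix_demo <- gl(3, 2)

# Made-up prices, two per pair (illustrative values only)
prices_demo <- c(10, 12, 20, 24, 30, 36)

# ave() replaces each value by the mean of its group (mean is the default FUN)
pair_means <- ave(prices_demo, ix_demo)
print(pair_means)   # 11 11 22 22 33 33
```

The result has the same length as the input, which is why it can be assigned straight into a new column.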

so how about making them disappear using

isAB <- YM1207$TickType %in% c("A","B")
ABprice <- YM1207$Price[isAB]

then do as before

N <- length(ABprice)
ix <- gl(N/2,2)
NPrice <- ave(ABprice, ix)

and put it back using

YM1207$NPrice <- NA
YM1207$NPrice[isAB] <- NPrice
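
Put together, the filter-then-average steps might look like the sketch below. It rebuilds the seven rows from the question as a toy data frame (only the columns that matter here) and assumes that, once the T rows are dropped, what remains is strictly B,A pairs in order; the even-length check is an added safeguard, not something the steps above require.

```r
# Toy copy of the seven rows from the question (only the columns used here)
YM1207 <- data.frame(
  TickType        = c("B", "A", "T", "B", "A", "B", "A"),
  ReferenceNumber = c(85985770, 85985771, 85985845, 85985846,
                      85985847, 85986222, 85986223),
  Price           = c(13379, 13380, 13379, 13379, 13380, 13379, 13380),
  stringsAsFactors = FALSE
)

isAB    <- YM1207$TickType %in% c("A", "B")  # TRUE for A/B rows, FALSE for T rows
ABprice <- YM1207$Price[isAB]                # prices with the T rows dropped

# Assumption: the remaining rows are complete B,A pairs, so the count is even
stopifnot(length(ABprice) %% 2 == 0)

ix     <- gl(length(ABprice) / 2, 2)         # pair index: 1 1 2 2 3 3
NPrice <- ave(ABprice, ix)                   # pair mean, repeated for both rows

YM1207$NPrice <- NA                          # T rows keep NA
YM1207$NPrice[isAB] <- NPrice
print(YM1207)
```

On these rows every B,A pair averages to 13379.5 and the single T row is left as NA.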

There are several ways to do this sort of thing. Another variation, 
closer to your original suggestion, would be to do

isA <- YM1207$TickType == "A"
isB <- YM1207$TickType == "B"
nPrice <- (YM1207$Price[isA]+YM1207$Price[isB])/2
YM1207$NPrice <- NA
YM1207$NPrice[isA] <- YM1207$NPrice[isB] <- nPrice

(you probably don't really need the NA assignment, but strange things 
can happen when you make subassignments into non-existing columns)
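
This variation relies on the i-th A row belonging with the i-th B row. Since you say each B,A pair differs by +1 in ReferenceNumber, that can be checked before averaging. The sketch below does so on a toy copy of your seven rows; the stopifnot() check is an added safeguard and assumes ReferenceNumber is stored as a number, not as text.

```r
# Toy copy of the seven rows from the question (only the columns used here)
YM1207 <- data.frame(
  TickType        = c("B", "A", "T", "B", "A", "B", "A"),
  ReferenceNumber = c(85985770, 85985771, 85985845, 85985846,
                      85985847, 85986222, 85986223),
  Price           = c(13379, 13380, 13379, 13379, 13380, 13379, 13380),
  stringsAsFactors = FALSE
)

isA <- YM1207$TickType == "A"
isB <- YM1207$TickType == "B"

# Safeguard: same number of As and Bs, and each A follows its B by exactly 1
stopifnot(sum(isA) == sum(isB),
          all(YM1207$ReferenceNumber[isA] - YM1207$ReferenceNumber[isB] == 1))

nPrice <- (YM1207$Price[isA] + YM1207$Price[isB]) / 2  # one mean per pair

YM1207$NPrice <- NA
# Chained assignment: the B rows are filled first, then the A rows
YM1207$NPrice[isA] <- YM1207$NPrice[isB] <- nPrice
print(YM1207)
```

If the check fails, the As and Bs are not lined up the way the averaging assumes, and the pairing would need to be done by ReferenceNumber instead of by position.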

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907


