[R] Manipulating DataSets
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Thu May 29 22:08:48 CEST 2008
Neil Gupta wrote:
> Hello R-Users,
>
> I am new to R and trying my best however I need help with this simple task.
> I have a dataset, YM1207.
>       X.Symbol  Date       Time     Exchange TickType ReferenceNumber Price Size
> 12491 3:YMZ7.EC 12/03/2007 08:32:50 EC       B        85985770        13379 7
> 12492 3:YMZ7.EC 12/03/2007 08:32:50 EC       A        85985771        13380 4
> 12493 3:YMZ7.EC 12/03/2007 08:32:50 EC       T        85985845        13379 1
> 12494 3:YMZ7.EC 12/03/2007 08:32:50 EC       B        85985846        13379 7
> 12495 3:YMZ7.EC 12/03/2007 08:32:50 EC       A        85985847        13380 4
> 12496 3:YMZ7.EC 12/03/2007 08:32:50 EC       B        85986222        13379 6
> 12497 3:YMZ7.EC 12/03/2007 08:32:50 EC       A        85986223        13380 4
>
> I want to insert a column called NPrice which takes a pair of B,A and
> calculates its average Price. And then input that number in the B row and A
> row in the new column NPrice. Each B, A is separated by +1 on the Reference
> Number. I want to skip T entries. T's do not come in between corresponding Bs
> and As. The other columns are not of interest. I would really appreciate it
> if I can get some help on this or refer me to a source that may.
>
>
I think this is a case where what you really need to do is to become
aware of the tools you have in the toolbox. E.g., I already showed you
one way to do it if the T's were absent:
N <- nrow(YM1207)
ix <- gl(N/2,2)
YM1207$NPrice <- ave(YM1207$Price, ix)
(OK, I forgot $Price last time...)
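To see what gl() and ave() contribute here, a minimal sketch on a toy vector may help. The prices below are made up for illustration and the names price and pair_mean are not from the original post; the sketch assumes the rows already alternate B, A with no T rows in between.

```r
# Hypothetical prices for three consecutive B/A pairs (illustration only)
price <- c(13379, 13380, 13379, 13380, 13379, 13381)

N  <- length(price)
ix <- gl(N/2, 2)            # factor with levels 1..N/2, each repeated twice

# ave() replaces every element by the mean of its group (FUN = mean by default)
pair_mean <- ave(price, ix)

print(ix)
print(pair_mean)
```

Each consecutive pair of rows shares one level of ix, so both rows of a pair receive the same average.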
So how about making the T rows disappear using
isAB <- YM1207$TickType %in% c("A","B")
ABprice <- YM1207$Price[isAB]
then do as before
N <- length(ABprice)
ix <- gl(N/2,2)
NPrice <- ave(ABprice, ix)
and put it back using
YM1207$NPrice <- NA
YM1207$NPrice[isAB] <- NPrice
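As a rough end-to-end check, the same idea can be run on the seven rows quoted in the question. This is only a sketch: the data frame is rebuilt by hand with just the columns that matter, the column is spelled Price as in the quoted header, and the stopifnot() guard on an even row count is an added assumption, not part of the original recipe.

```r
# Rows copied from the question; only the columns used here are kept
YM1207 <- data.frame(
  TickType        = c("B", "A", "T", "B", "A", "B", "A"),
  ReferenceNumber = c(85985770, 85985771, 85985845, 85985846,
                      85985847, 85986222, 85986223),
  Price           = c(13379, 13380, 13379, 13379, 13380, 13379, 13380),
  stringsAsFactors = FALSE
)

isAB    <- YM1207$TickType %in% c("A", "B")   # TRUE for quote rows, FALSE for T
ABprice <- YM1207$Price[isAB]

N <- length(ABprice)
stopifnot(N %% 2 == 0)       # added guard: every B needs a matching A
ix <- gl(N/2, 2)

YM1207$NPrice <- NA                           # T rows stay NA
YM1207$NPrice[isAB] <- ave(ABprice, ix)

print(YM1207)
```

The T row keeps NA in NPrice, and each B/A pair gets the mean of its two prices.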
There are several ways to do this sort of thing. Another variation,
closer to your original suggestion, would be to do
isA <- YM1207$TickType == "A"
isB <- YM1207$TickType == "B"
nPrice <- (YM1207$Price[isA]+YM1207$Price[isB])/2
YM1207$NPrice <- NA
YM1207$NPrice[isA] <- YM1207$NPrice[isB] <- nPrice
(you probably don't really need the NA assignment, but strange things
can happen when you make subassignments into non-existing columns)
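This variation silently relies on the k-th A belonging with the k-th B. A small sketch with an explicit check of that assumption follows; the toy data frame and its values are hypothetical, and the ReferenceNumber test simply encodes the "+1" rule described in the question.

```r
# Hypothetical toy data: two B/A pairs with a T row in between
YM1207 <- data.frame(
  TickType        = c("B", "A", "T", "B", "A"),
  ReferenceNumber = c(100, 101, 150, 200, 201),
  Price           = c(13379, 13380, 13379, 13381, 13384),
  stringsAsFactors = FALSE
)

isA <- YM1207$TickType == "A"
isB <- YM1207$TickType == "B"

# Added sanity check: same number of As and Bs, and each A follows its B by +1
stopifnot(sum(isA) == sum(isB),
          all(YM1207$ReferenceNumber[isA] - YM1207$ReferenceNumber[isB] == 1))

nPrice <- (YM1207$Price[isA] + YM1207$Price[isB]) / 2

YM1207$NPrice <- NA
YM1207$NPrice[isA] <- YM1207$NPrice[isB] <- nPrice

print(YM1207)
```

If the check fails, the pairing by position is not safe and the rows would need to be matched on ReferenceNumber first.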
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907