[R] problem with "unique" function

peter dalgaard pdalgd at gmail.com
Fri Jul 28 18:15:18 CEST 2017


Most likely, previous computations have ended up giving slightly different values that all print as, say, 0.13333; unique() compares the full double-precision values, not the printed ones. A pragmatic way out is to round to, say, 5 digits before applying unique(). In this particular case, it seems that all numbers are multiples of 1/30, so another idea would be to multiply by 30, round, and divide by 30.
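
A minimal sketch of both ideas, on made-up toy values rather than the actual data (the objects x, x30, z, zs and prob below are assumptions for illustration; the real z and prob would take their place). The last part rebuilds the probability table from the snapped values using matrix indexing instead of the loop, assuming each row of z is a distinct (z1, z2, z3) triple:

```r
# Toy values: the same multiple of 1/30 reached by different arithmetic
# routes (made up for illustration, not the poster's data).
x <- c(4/30, 0.1 + 1/30, 2/15, 1 - 26/30)

print(x)             # all display alike at default print precision
length(unique(x))    # can exceed 1 because of floating-point error

# Idea 1: round to 5 digits before unique()
length(unique(round(x, 5)))

# Idea 2: snap to the 1/30 grid (assumes all values are multiples of 1/30)
x30 <- round(x * 30) / 30
length(unique(x30))

# Applying idea 2 to a toy joint distribution (hypothetical z and prob)
z <- rbind(c(4/30,       -1, -1),
           c(0.1 + 1/30, -1,  0),
           c(-4/30,       0, -1),
           c(1 - 26/30,   0,  0))
prob <- c(0.1, 0.2, 0.3, 0.4)

zs <- round(z * 30) / 30              # snap every column to the grid
z1 <- sort(unique(zs[, 1]))
z2 <- sort(unique(zs[, 2]))
z3 <- sort(unique(zs[, 3]))

P <- array(0, dim = c(length(z1), length(z2), length(z3)),
           dimnames = list(A = z1, H = z2, M = z3))

# One (i, j, k) index row per observation; a 3-column matrix indexes
# individual cells of a 3-d array.
idx <- cbind(match(zs[, 1], z1), match(zs[, 2], z2), match(zs[, 3], z3))
P[idx] <- prob

sum(P)   # should agree with sum(prob) up to floating-point error
```

Indexing by position via match() also avoids comparing the numeric values against the character dimnames, which is where nearly equal values can end up selecting more than one cell.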

-pd

> On 28 Jul 2017, at 17:17 , li li <hannah.hlx at gmail.com> wrote:
> 
> I have the joint distribution of three discrete random variables z1, z2 and
> z3 which is captured by "z"
> and "prob" as described below.
> 
> For example, the probability for z1=-0.46667, z2=-1 and z3=-1 is 2.752e-13.
> Also, the probabilities add up to 1.
> 
> > head(z)
>            z1      z2      z3
> [1,] -0.46667 -1.0000 -1.0000
> [2,] -0.33333 -0.9333 -0.9333
> [3,] -0.20000 -0.8667 -0.8667
> [4,] -0.06667 -0.8000 -0.8000
> [5,]  0.06667 -0.7333 -0.7333
> [6,]  0.20000 -0.6667 -0.6667
> > prob[1:5]
> [1] 2.752e-13 3.210e-12 1.348e-11 2.656e-11 2.656e-11
> > sum(prob)
> [1] 1
> 
> 
> I want to put the distribution into a joint probability table. I use the
> following code, but the probabilities no longer add up to 1.
> 
> > z1 <- sort(unique(z[,1])); z2 <- sort(unique(z[,2]));  z3 <- sort(unique(z[,3]))
> > P <- array(0, dim=c(length(z1),length(z2),length(z3)), dimnames=list(A=z1, H=z2, M=z3))
> > for (i in 1:(dim(z)[1])){
> +     ind <- z[i,]
> +     P[dimnames(P)$A==ind[1], dimnames(P)$H==ind[2], dimnames(P)$M==ind[3]] <- prob[i]
> + }
> > sum(P)
> [1] 37.5
> 
> 
> The problem is that when we look at z1 as below, there are a lot of repeated values.
> 
> > unique(z[,1])
>  [1] -0.46667 -0.33333 -0.20000 -0.06667  0.06667  0.20000  0.33333  0.46667 -0.53333 -0.40000
> [11] -0.26667 -0.13333  0.00000  0.13333  0.26667  0.40000  0.53333 -0.60000 -0.06667  0.06667
> [21]  0.60000 -0.66667 -0.13333  0.13333  0.66667 -0.73333 -0.60000 -0.20000  0.20000  0.60000
> [31]  0.73333 -0.80000 -0.53333 -0.26667  0.26667  0.53333  0.80000 -0.86667 -0.46667 -0.33333
> [41]  0.33333  0.46667  0.86667 -0.93333 -0.40000  0.40000  0.93333 -1.00000 -0.46667 -0.33333
> [51]  0.33333  0.46667  1.00000 -0.53333 -0.26667  0.26667  0.53333 -0.20000  0.20000 -0.66667
> [61] -0.13333  0.13333  0.66667 -0.73333 -0.06667  0.06667  0.73333 -0.20000  0.20000 -0.13333
> [71]  0.13333 -0.06667  0.06667 -0.06667  0.06667
> 
> 
> 
> Is there a way to fix this? Any ideas or suggestions? Thanks very much!!
> 
>   Hanna
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


