[R] two difficult loop

Jim Lemon drjimlemon at gmail.com
Mon Jun 13 01:06:46 CEST 2016


Hi Greg,
You've got a problem that you don't seem to have identified. Your
five-digit "reg" field in the "map" data frame can define at most 100000
unique values. With more than 27 million rows, this means that each value
will be repeated about 270 times. Unless there are constraints you haven't
mentioned, we would expect that in about half of those cases (135 for each
value), the values in each "ref" row will be in the reverse order and the
spans may overlap. I notice that you may
have tried to get around this by sorting the "map" data frame, but
then the order of the rows is different, and the number of rows
"between" any two values changes. Apart from this, it is almost
certain that the number of values of "p > 0.85" in the multiple runs
between each set of "ref" values will be different. It is possible to
perform both tasks that you mention, but only the second will yield a
unique or tied value for all of the cases. So your result data frame
will have an unspecified number of values for each row in "ref" for
the first task.
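
To make the ambiguity concrete, here is a minimal sketch using the sample
rows from the question. It rests on assumptions that are not in the original
post: a "span" is taken to be the rows of "map", in their original order,
between the first occurrence of reg1 and the first occurrence of reg2
(whichever comes first), task 1 is read as counting rows with p > 0.85, and
the helper name span_rows is made up for illustration. The "rate" column is
left out because the sketch does not use it.

```r
# Sample rows copied from the question; reg is kept as character so that
# leading zeros (e.g. "06169") are not lost.
map <- data.frame(
  reg = c("10276", "71608", "29220", "99542", "26441", "95082", "36169",
          "55572", "65255", "51960", "55652", "63933", "35170", "06491",
          "85508", "86666", "04758", "06169", "63933", "41838"),
  p = c(0.700, 0.830, 0.430, 0.220, 0.880, 0.090, 0.480, 0.500, 0.300,
        0.970, 0.388, 0.250, 0.720, 0.370, 0.470, 0.580, 0.810, 0.440,
        0.750, 0.960),
  stringsAsFactors = FALSE
)

# Only the ref rows whose values both occur in the sample of map.
ref <- data.frame(
  reg1 = c("29220", "26441", "06169", "85508"),
  reg2 = c("63933", "41838", "10276", "95082"),
  stringsAsFactors = FALSE
)

# The problem: a reg value can occur more than once, so "the" row for a
# given value is not unique.
dup_counts <- table(map$reg)
dup_regs <- names(dup_counts[dup_counts > 1])
dup_pos <- which(map$reg == "63933")

# Hypothetical helper: row indices between the FIRST occurrence of each
# value (an assumption), in whichever order the two values appear.
span_rows <- function(reg1, reg2, map) {
  i <- match(reg1, map$reg)
  j <- match(reg2, map$reg)
  if (is.na(i) || is.na(j)) return(integer(0))
  seq(min(i, j), max(i, j))
}

spans <- Map(span_rows, ref$reg1, ref$reg2, MoreArgs = list(map = map))

# Task 1 (as read here): number of rows in each span with p > 0.85.
ref$n <- vapply(spans, function(idx) sum(map$p[idx] > 0.85),
                integer(1), USE.NAMES = FALSE)

# Task 2: minimum p in each span (NA if a value is missing from map).
ref$min_p <- vapply(spans,
                    function(idx) if (length(idx)) min(map$p[idx]) else NA_real_,
                    numeric(1), USE.NAMES = FALSE)

print(ref)
```

Choosing the last occurrence instead of the first, or sorting "map" before
looking up the positions, would define different spans, which is why the
rule for picking rows has to be settled before the counts mean anything.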

Jim


On Mon, Jun 13, 2016 at 6:14 AM, greg holly <mak.hholly at gmail.com> wrote:
> Dear all;
>
>
>
> I have two data sets, map and ref. A small part of each data set
> is given below. Data map has more than 27 million rows and data ref has
> about 560 rows. Basically, I need to run two different tasks. My R code
> for these tasks is given below, but it does not work properly.
>
> I sincerely appreciate your help.
>
>
> Regards,
>
> Greg
>
>
>
> Task 1)
>
> For example, the first and second columns for row 1 in data ref are 29220
> and 63933. So I need R code that first looks at the first row in ref
> (29220 and 63933), then sums the "map$rate" column and gives the number
> of rows that are >0.85. Then it should do the same for the second,
> third, ... rows in ref. At the end I would like a table like the one
> given below (the results I need). Please note that all values specified
> in the ref data file exist in the map$reg column.
>
>
>
> Task 2)
>
> Again, for example, the first and second columns for row 1 in data ref
> are 29220 and 63933. So I need R code that gives the minimum map$p for
> the 29220-63933 interval in the map file. Then do the same for the
> second, third, ... rows in ref.
>
>
>
>
> #my attempt for the first question
>
> temp<-map[order(map$reg, map$p),]
> count<-1
> temp<-unique(temp$reg
> for(i in 1:length(ref) {
>   for(j in 1:length(ref)
>   {
>     temp1<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,] & temp[cumsum(temp$rate)>0.70,])
>     count=count+1
>   }
> }
>
> #my attempt for the second question
>
> temp<-map[order(map$reg, map$p),]
> count<-1
> temp<-unique(temp$reg
> for(i in 1:length(ref) {
>   for(j in 1:length(ref)
>   {
>     temp2<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,])
>     output<-temp2[temp2$p==min(temp2$p),]
>   }
> }
>
>
>
> Data sets
>
>
>   Data= map
>
>   reg    p      rate
>   10276  0.700  3.867e-18
>   71608  0.830  4.542e-16
>   29220  0.430  1.948e-15
>   99542  0.220  1.084e-15
>   26441  0.880  9.675e-14
>   95082  0.090  7.349e-13
>   36169  0.480  9.715e-13
>   55572  0.500  9.071e-12
>   65255  0.300  1.688e-11
>   51960  0.970  1.163e-10
>   55652  0.388  3.750e-10
>   63933  0.250  9.128e-10
>   35170  0.720  7.355e-09
>   06491  0.370  1.634e-08
>   85508  0.470  1.057e-07
>   86666  0.580  7.862e-07
>   04758  0.810  9.501e-07
>   06169  0.440  1.104e-06
>   63933  0.750  2.624e-06
>   41838  0.960  8.119e-06
>
>
>   data=ref
>
>   reg1   reg2
>   29220  63933
>   26441  41838
>   06169  10276
>   74806  92643
>   73732  82451
>   86042  93502
>   85508  95082
>
>
>
>   the results I need
>
>   reg1   reg2   n
>   29220  63933  12
>   26441  41838  78
>   06169  10276  125
>   74806  92643  11
>   73732  82451  47
>   86042  93502  98
>   85508  95082  219


