[R] two difficult loop

greg holly mak.hholly at gmail.com
Mon Jun 13 04:35:36 CEST 2016


Hi Bert,

I really appreciate this. I will need to check your code for task 2
tomorrow at my office on the real data, because a technical issue is
preventing me from connecting remotely. I am sure it will work well.

I am sorry that I was not able to explain my first question clearly.
Basically:

The values in ref represent regions of a chromosome. I need to select these
regions in map (all region values in ref exist in the first column of map,
map$reg), then sum the column map$rate cumulatively and count the number of
rows needed for the sum to exceed 0.85. For example, consider the first row
of ref: 29220 and 63933. After sorting map by its first column, I sum
map$rate only over the rows between 29220 and 63933 in the sorted map and
cut off once the sum is >0.85. Then I count how many rows of the sorted map
it took to get >0.85. For example, suppose there are 38 rows between 29220
and 63933 in the sorted map$reg, and summing only the first 12 of them
gives >0.85. Then my answer is 12 for 29220 - 63933 in ref.
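
A minimal sketch of this counting rule, on made-up numbers rather than the
real data: sort map by reg, keep the rows whose reg lies between reg1 and
reg2 (endpoints included, which is an assumption), take the running sum of
rate, and report the position of the first row where it exceeds 0.85. The
helper name rows_to_cutoff and the toy values are hypothetical; returning NA
when the sum never passes the cutoff is also an assumption.

```r
# Hypothetical toy data; column names (reg, rate, reg1, reg2) follow the thread.
map <- data.frame(
  reg  = c(40, 10, 60, 20, 50, 30),
  rate = c(0.20, 0.30, 0.45, 0.40, 0.50, 0.10)
)
ref <- data.frame(reg1 = c(20, 10, 50), reg2 = c(50, 30, 60))

# Number of rows, taken in reg order inside [reg1, reg2], needed before the
# running sum of rate first exceeds the cutoff; NA if it never does.
rows_to_cutoff <- function(reg1, reg2, map_sorted, cutoff = 0.85) {
  inside  <- map_sorted$reg >= reg1 & map_sorted$reg <= reg2
  running <- cumsum(map_sorted$rate[inside])
  hit     <- which(running > cutoff)
  if (length(hit) == 0L) NA_integer_ else hit[1L]
}

# Sort once, then apply the rule to every row of ref.
map_sorted <- map[order(map$reg), ]
ref$n <- mapply(rows_to_cutoff, ref$reg1, ref$reg2,
                MoreArgs = list(map_sorted = map_sorted))
print(ref)
```

With these toy values ref$n comes out as 4, NA and 2. This only illustrates
the rule; it rescans map for every row of ref.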

Thanks a lot for your patience.

Cheers,
Greg

> On Sun, Jun 12, 2016 at 6:36 PM, Bert Gunter <bgunter.4567 at gmail.com>
> wrote:
>
>> Greg:
>>
>> I was not able to understand your task 1. Perhaps others can.
>>
>> My understanding of your task 2 is that for each row of ref, you wish
>> to find all rows of map such that the reg values in those rows fall
>> between the reg1 and reg2 values in ref (inclusive; change <= to < if
>> you don't want the endpoints), and then you want the minimum map$p
>> value of all those rows. If that is correct, I believe this will do
>> it (but caution: untested, as you failed to provide data in a
>> convenient form, e.g. using dput()):
>>
>> # for each row i of ref: smallest p among map rows with reg in [reg1, reg2]
>> task2 <- with(map, vapply(seq_len(nrow(ref)), function(i)
>>   min(p[ref[i, 1] <= reg & reg <= ref[i, 2]]), 0))
>>
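
As a quick check, the one-liner above can be run on the sample rows posted
in the original message. The data frames below are transcribed from that
sample, with reg read as numeric (so 06169 becomes 6169, an assumption about
how the real file is read), and only the first three rows of ref are used.

```r
# Sample rows from the original post; reg is assumed to be numeric.
map <- data.frame(
  reg = c(10276, 71608, 29220, 99542, 26441, 95082, 36169, 55572, 65255, 51960,
          55652, 63933, 35170, 6491, 85508, 86666, 4758, 6169, 63933, 41838),
  p   = c(0.700, 0.830, 0.430, 0.220, 0.880, 0.090, 0.480, 0.500, 0.300, 0.970,
          0.388, 0.250, 0.720, 0.370, 0.470, 0.580, 0.810, 0.440, 0.750, 0.960)
)
ref <- data.frame(reg1 = c(29220, 26441, 6169),
                  reg2 = c(63933, 41838, 10276))

# Minimum p over the map rows whose reg lies in [reg1, reg2], per row of ref.
task2 <- with(map, vapply(seq_len(nrow(ref)), function(i)
  min(p[ref[i, 1] <= reg & reg <= ref[i, 2]]), 0))
print(cbind(ref, min_p = task2))
```

On this sample the result is 0.25, 0.43 and 0.37. An interval with no
matching rows, such as 73732 - 82451 in the sample, would make min() return
Inf with a warning.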
>>
>> If my understanding is incorrect, please ignore both the above and the
>> following:
>>
>>
>> The "solution" I have given above seems inefficient, so others may be
>> able to significantly improve it if you find that it takes too long.
>> OTOH, my understanding of your specification is that you need to
>> search for all rows in map data frame that meet the criterion for each
>> row of ref, and without further information, I don't know how to do
>> this without just repeating the search 560 times.
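
One way to avoid rescanning all of map for every row of ref, sketched here
on made-up numbers: sort map by reg once, so that each interval becomes a
contiguous block of rows, and locate the block ends with findInterval() on
the sorted vector. This is an alternative technique, not the code from this
message; the toy data and variable names are hypothetical, the left.open
argument needs R 3.3.0 or later, and returning NA for an empty interval is
an assumption.

```r
# Hypothetical toy data; column names (reg, p, reg1, reg2) follow the thread.
map <- data.frame(
  reg = c(40, 10, 60, 20, 50, 30),
  p   = c(0.20, 0.70, 0.90, 0.43, 0.05, 0.88)
)
ref <- data.frame(reg1 = c(20, 10, 70), reg2 = c(50, 30, 80))

# Sort once so that every [reg1, reg2] interval is a contiguous run of rows.
ord   <- order(map$reg)
reg_s <- map$reg[ord]
p_s   <- map$p[ord]

# Index of the first sorted row with reg >= reg1 ...
first <- findInterval(ref$reg1, reg_s, left.open = TRUE) + 1L
# ... and of the last sorted row with reg <= reg2.
last  <- findInterval(ref$reg2, reg_s)

# Minimum p inside each block; NA when no row falls in the interval.
min_p <- vapply(seq_len(nrow(ref)), function(i) {
  if (first[i] > last[i]) NA_real_ else min(p_s[first[i]:last[i]])
}, numeric(1))
print(cbind(ref, min_p))
```

Taking the minimum is still linear in the length of each block, so the main
saving is that map$reg is no longer compared in full against every row of
ref.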
>>
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Sun, Jun 12, 2016 at 1:14 PM, greg holly <mak.hholly at gmail.com> wrote:
>> > Dear all;
>> >
>> >
>> >
>> > I have two data sets, map and ref. A small part of each data set is
>> > given below. map has more than 27 million rows and ref has about 560
>> > rows. Basically, I need to run two different tasks. My R code for
>> > these tasks is given below, but it does not work properly.
>> >
>> > I sincerely appreciate your help.
>> >
>> >
>> > Regards,
>> >
>> > Greg
>> >
>> >
>> >
>> > Task 1)
>> >
>> > For example, the first and second columns of row 1 in ref are 29220
>> > and 63933. So I need to write R code that first looks at the first row
>> > of ref (29220 and 63933), then sums the column map$rate and gives the
>> > number of rows that give >0.85. Then it should do the same for the
>> > second, third, ... rows of ref. At the end I would like a table like
>> > the one given below (the results I need). Please notice that all values
>> > specified in ref exist in the map$reg column.
>> >
>> >
>> >
>> > Task 2)
>> >
>> > Again, for example, the first and second columns of row 1 in ref are
>> > 29220 and 63933. So I need to write R code that gives the minimum map$p
>> > for the 29220 - 63933 interval in map, and then does the same for the
>> > second, third, ... rows of ref.
>> >
>> >
>> >
>> >
>> > #my attempt for the first question
>> >
>> > temp<-map[order(map$reg, map$p),]
>> > count<-1
>> > temp<-unique(temp$reg
>> > for(i in 1:length(ref) {
>> >   for(j in 1:length(ref)
>> >   {
>> > temp1<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,]
>> > & temp[cumsum(temp$rate)>0.70,])
>> > count=count+1
>> >     }
>> > }
>> >
>> > #my attempt for the second question
>> >
>> > temp<-map[order(map$reg, map$p),]
>> > count<-1
>> > temp<-unique(temp$reg
>> > for(i in 1:length(ref) {
>> >   for(j in 1:length(ref)
>> >   {
>> > temp2<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,])
>> > output<-temp2[temp2$p==min(temp2$p),]
>> >     }
>> > }
>> >
>> >
>> >
>> > Data sets
>> >
>> > Data = map
>> >
>> >   reg    p      rate
>> >   10276  0.700  3.867e-18
>> >   71608  0.830  4.542e-16
>> >   29220  0.430  1.948e-15
>> >   99542  0.220  1.084e-15
>> >   26441  0.880  9.675e-14
>> >   95082  0.090  7.349e-13
>> >   36169  0.480  9.715e-13
>> >   55572  0.500  9.071e-12
>> >   65255  0.300  1.688e-11
>> >   51960  0.970  1.163e-10
>> >   55652  0.388  3.750e-10
>> >   63933  0.250  9.128e-10
>> >   35170  0.720  7.355e-09
>> >   06491  0.370  1.634e-08
>> >   85508  0.470  1.057e-07
>> >   86666  0.580  7.862e-07
>> >   04758  0.810  9.501e-07
>> >   06169  0.440  1.104e-06
>> >   63933  0.750  2.624e-06
>> >   41838  0.960  8.119e-06
>> >
>> > Data = ref
>> >
>> >   reg1   reg2
>> >   29220  63933
>> >   26441  41838
>> >   06169  10276
>> >   74806  92643
>> >   73732  82451
>> >   86042  93502
>> >   85508  95082
>> >
>> > The results I need
>> >
>> >   reg1   reg2   n
>> >   29220  63933   12
>> >   26441  41838   78
>> >   06169  10276  125
>> >   74806  92643   11
>> >   73732  82451   47
>> >   86042  93502   98
>> >   85508  95082  219
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.