[R] two difficult loop
greg holly
mak.hholly at gmail.com
Mon Jun 13 04:35:36 CEST 2016
Hi Bert,
I really appreciate this. I need to check your code for task 2 tomorrow at my
office on the real data, as I am having difficulty (because of a technical
issue) connecting remotely. I am sure it will work well.
I am sorry that I was not able to explain my first question clearly. Basically,
the values in the ref data represent regions of a chromosome. I need to select
these regions in map (all region values in ref exist in the first column of
map, map$reg), then take a running sum of the column map$rate and count the
number of rows it takes for that sum to exceed 0.85. For example, consider the
first row of ref: 29220 and 63933. After sorting map by its first column, I sum
map$rate only over the rows between 29220 and 63933 in the sorted map, and cut
off where the sum is > 0.85. Then I count how many rows of the sorted map it
took to get > 0.85. For example, suppose there are 38 rows between 29220 and
63933 in the sorted map$reg, and summing only the first 12 of them gives
> 0.85. Then my answer is 12 for 29220 - 63933 in ref.
Thanks a lot for your patience.
Cheers,
Greg
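The task 1 description above can be sketched in R as follows. This is only a minimal sketch under stated assumptions: the toy map and ref values are made up for illustration, count_to_cutoff() is a hypothetical helper name, the interval endpoints are treated as inclusive, and NA is returned when the running sum never exceeds the cutoff.

```r
# Toy data; the values are invented for illustration only.
map <- data.frame(
  reg  = c(10, 20, 30, 40, 50, 60),
  rate = c(0.50, 0.30, 0.10, 0.40, 0.20, 0.90)
)
ref <- data.frame(reg1 = c(10, 30), reg2 = c(40, 60))

# Hypothetical helper: number of rows (in increasing reg order) between
# reg1 and reg2, inclusive, needed before the running sum of rate first
# exceeds `cutoff`. Returns NA if the sum never exceeds the cutoff.
count_to_cutoff <- function(reg1, reg2, map, cutoff = 0.85) {
  inside <- map[map$reg >= reg1 & map$reg <= reg2, ]
  inside <- inside[order(inside$reg), ]
  hit <- which(cumsum(inside$rate) > cutoff)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

# Apply the helper to every row of ref.
ref$n <- vapply(
  seq_len(nrow(ref)),
  function(i) count_to_cutoff(ref$reg1[i], ref$reg2[i], map),
  integer(1)
)
ref
```

On the toy data this gives n = 3 for the interval 10 - 40 and n = 4 for 30 - 60. Each call scans all of map, so on the real data it may be worth sorting map once beforehand.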
On Sun, Jun 12, 2016 at 10:35 PM, greg holly <mak.hholly at gmail.com> wrote:
> On Sun, Jun 12, 2016 at 6:36 PM, Bert Gunter <bgunter.4567 at gmail.com>
> wrote:
>
>> Greg:
>>
>> I was not able to understand your task 1. Perhaps others can.
>>
>> My understanding of your task 2 is that for each row of ref, you wish
>> to find all rows of map such that the reg values in those rows fall
>> between the reg1 and reg2 values in ref (inclusive; change <= to < if
>> you don't want the endpoints), and then you want the minimum map$p
>> value of all those rows. If that is correct, I believe this will do
>> it (but caution: untested, as you failed to provide data in a
>> convenient form, e.g. using dput()):
>>
>> task2 <- with(map,vapply(seq_len(nrow(ref)),function(i)
>> min(p[ref[i,1]<=reg & reg <= ref[i,2] ]),0))
>>
>>
>> If my understanding is incorrect, please ignore both the above and the
>> following:
>>
>>
>> The "solution" I have given above seems inefficient, so others may be
>> able to significantly improve it if you find that it takes too long.
>> OTOH, my understanding of your specification is that you need to
>> search for all rows in map data frame that meet the criterion for each
>> row of ref, and without further information, I don't know how to do
>> this without just repeating the search 560 times.
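One possible way to avoid rescanning all of map for every row of ref is to sort map once and locate each interval by binary search. The sketch below is an assumption-laden illustration, not part of the original reply: the toy data are made up, endpoints are inclusive, NA is returned for an interval with no matching rows, and it relies on the left.open argument of findInterval(), which I believe needs R 3.3.0 or later.

```r
# Toy data; the values are invented for illustration only.
map <- data.frame(
  reg = c(50, 10, 30, 20, 60, 40),
  p   = c(0.9, 0.7, 0.2, 0.8, 0.1, 0.5)
)
ref <- data.frame(reg1 = c(10, 40, 70), reg2 = c(30, 60, 80))

# Sort map once by reg.
ord        <- order(map$reg)
reg_sorted <- map$reg[ord]
p_sorted   <- map$p[ord]

# For sorted x, findInterval(v, x) counts the elements <= v, and
# findInterval(v, x, left.open = TRUE) counts the elements < v.
first <- findInterval(ref$reg1, reg_sorted, left.open = TRUE) + 1L
last  <- findInterval(ref$reg2, reg_sorted)

# Minimum p over the contiguous slice of sorted rows for each interval.
ref$min_p <- vapply(
  seq_len(nrow(ref)),
  function(i) {
    if (first[i] > last[i]) NA_real_ else min(p_sorted[first[i]:last[i]])
  },
  numeric(1)
)
ref
```

After the one-time sort, each of the 560 lookups only touches the rows inside its own interval rather than the whole of map.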
>>
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Sun, Jun 12, 2016 at 1:14 PM, greg holly <mak.hholly at gmail.com> wrote:
>> > Dear all;
>> >
>> >
>> >
>> > I have two data sets, map and ref. A small part of each data set is
>> > given below. map has more than 27 million rows and ref has about 560
>> > rows. Basically, I need to run two different tasks. My R code for these
>> > tasks is given below, but it does not work properly.
>> >
>> > I sincerely appreciate your help.
>> >
>> >
>> > Regards,
>> >
>> > Greg
>> >
>> >
>> >
>> > Task 1)
>> >
>> > For example, the first and second columns of row 1 in ref are 29220 and
>> > 63933. So I need to write R code that first looks at the first row of
>> > ref (29220 and 63933), then sums the column map$rate and gives the
>> > number of rows at which the sum is > 0.85, and then does the same for
>> > the second, third, ... rows of ref. At the end I would like a table like
>> > the one given below (the results I need). Please notice that all values
>> > specified in the ref data file exist in the map$reg column.
>> >
>> >
>> >
>> > Task 2)
>> >
>> > Again, for example, the first and second columns of row 1 in ref are
>> > 29220 and 63933. So I need to write R code that gives the minimum map$p
>> > for the 29220 - 63933 interval in the map file, and then does the same
>> > for the second, third, ... rows of ref.
>> >
>> >
>> >
>> >
>> > #my attempt for the first question
>> > temp<-map[order(map$reg, map$p),]
>> > count<-1
>> > temp<-unique(temp$reg
>> > for(i in 1:length(ref) {
>> > for(j in 1:length(ref)
>> > {
>> > temp1<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,]
>> > & temp[cumsum(temp$rate)>0.70,])
>> > count=count+1
>> > }
>> > }
>> >
>> >
>> > #my attempt for the second question
>> > temp<-map[order(map$reg, map$p),]
>> > count<-1
>> > temp<-unique(temp$reg
>> > for(i in 1:length(ref) {
>> > for(j in 1:length(ref)
>> > {
>> > temp2<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,])
>> > output<-temp2[temp2$p==min(temp2$p),]
>> > }
>> > }
>> >
>> >
>> >
>> > Data sets
>> >
>> > Data = map
>> >
>> >   reg      p       rate
>> > 10276  0.700  3.867e-18
>> > 71608  0.830  4.542e-16
>> > 29220  0.430  1.948e-15
>> > 99542  0.220  1.084e-15
>> > 26441  0.880  9.675e-14
>> > 95082  0.090  7.349e-13
>> > 36169  0.480  9.715e-13
>> > 55572  0.500  9.071e-12
>> > 65255  0.300  1.688e-11
>> > 51960  0.970  1.163e-10
>> > 55652  0.388  3.750e-10
>> > 63933  0.250  9.128e-10
>> > 35170  0.720  7.355e-09
>> > 06491  0.370  1.634e-08
>> > 85508  0.470  1.057e-07
>> > 86666  0.580  7.862e-07
>> > 04758  0.810  9.501e-07
>> > 06169  0.440  1.104e-06
>> > 63933  0.750  2.624e-06
>> > 41838  0.960  8.119e-06
>> >
>> > Data = ref
>> >
>> >  reg1   reg2
>> > 29220  63933
>> > 26441  41838
>> > 06169  10276
>> > 74806  92643
>> > 73732  82451
>> > 86042  93502
>> > 85508  95082
>> >
>> > The results I need
>> >
>> >  reg1   reg2    n
>> > 29220  63933   12
>> > 26441  41838   78
>> > 06169  10276  125
>> > 74806  92643   11
>> > 73732  82451   47
>> > 86042  93502   98
>> > 85508  95082  219
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>