[R] two difficult loop
greg holly
mak.hholly at gmail.com
Mon Jun 13 16:29:07 CEST 2016
Hi Jim;
I do apologize if bothering. I have run on the real data and here is the
error message I got:
Thanks,
Greg
start end
Error in regrange[1]:regrange[2] : result would be too long a vector
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
On Mon, Jun 13, 2016 at 10:28 AM, greg holly <mak.hholly at gmail.com> wrote:
> Hi Jim;
>
> I do apologize if bothering. I have run on the real data and here is the
> error message I got:
>
> Thanks,
>
> Greg
>
> start end
> Error in regrange[1]:regrange[2] : result would be too long a vector
> In addition: Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
>
>
> On Mon, Jun 13, 2016 at 3:19 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
>
>> Hi Greg,
>> Okay, I have a better idea now of what you want. The problem of
>> multiple matches is still there, but here is a start:
>>
>> # this data frame actually contains all the values in ref in the "reg"
>> field
>> map<-read.table(text="reg p rate
>> 10276 0.700 3.867e-18
>> 71608 0.830 4.542e-16
>> 29220 0.430 1.948e-15
>> 99542 0.220 1.084e-15
>> 26441 0.880 9.675e-14
>> 95082 0.090 7.349e-13
>> 36169 0.480 9.715e-13
>> 55572 0.500 9.071e-12
>> 65255 0.300 1.688e-11
>> 51960 0.970 1.163e-10
>> 55652 0.388 3.750e-10
>> 63933 0.250 9.128e-10
>> 35170 0.720 7.355e-09
>> 06491 0.370 1.634e-08
>> 85508 0.470 1.057e-07
>> 86666 0.580 7.862e-07
>> 04758 0.810 9.501e-07
>> 06169 0.440 1.104e-06
>> 63933 0.750 2.624e-06
>> 41838 0.960 8.119e-06
>> 74806 0.810 9.501e-07
>> 92643 0.470 1.057e-07
>> 73732 0.090 7.349e-13
>> 82451 0.960 8.119e-06
>> 86042 0.480 9.715e-13
>> 93502 0.500 9.071e-12
>> 85508 0.370 1.634e-08
>> 95082 0.830 4.542e-16",
>> header=TRUE)
>> # same as in your example
>> ref<-read.table(text="reg1 reg2
>> 29220 63933
>> 26441 41838
>> 06169 10276
>> 74806 92643
>> 73732 82451
>> 86042 93502
>> 85508 95082",
>> header=TRUE)
>> # sort the "map" data frame
>> map2<-map[order(map$reg),]
>> # get a field for the counts
>> ref$n<-NA
>> # and a field for the minimum p values
>> ref$min_p<-NA
>> # get the number of rows in "ref"
>> nref<-dim(ref)[1]
>> for(i in 1:nref) {
>> start<-which(map2$reg==ref$reg1[i])
>> end<-which(map2$reg==ref$reg2[i])
>> cat("start",start,"end",end,"\n")
>> # get the range of matches
>> regrange<-range(c(start,end))
>> # convert this to a sequence spanning all matches
>> allreg<-regrange[1]:regrange[2]
>> ref$n[i]<-sum(map2$p[allreg] > 0.85)
>> ref$min_p[i]<-min(map2$p[allreg])
>> }
>>
>> This example uses the span from the first match of "reg1" to the last
>> match of "reg2". This may not be what you want, so let me know if
>> there are further constraints.
>>
>> Jim
>>
>> On Mon, Jun 13, 2016 at 12:35 PM, greg holly <mak.hholly at gmail.com>
>> wrote:
>> > Hi Bert;
>> >
>> > I do appreciate for this. I need check your codes on task2 tomorrow at
>> my
>> > office on the real data as I have difficulty (because a technical
>> issue) to
>> > remote connection. I am sure it will work well.
>> >
>> > I am sorry that I was not able to explain my first question. Basically
>> >
>> > Values in ref data represent the region of chromosome. I need choose
>> these
>> > regions in map (all regions values in ref data are exist in map data in
>> the
>> > first column -column map$reg). And then summing up the column "map$rate
>> and
>> > count the numbers that gives >0.85. For example, consider the first
>> row in
>> > data ref. They are 29220 and 63933. After sorting the first column in
>> > map then summing column "map$rate" only between 29220 to 63933 in
>> sorted
>> > map and cut off at >0.85. Then count how many rows in sorted map gives
>> >>0.85. For example consider there are 38 rows between 29220 in 63933
>> in sorted
>> > map$reg and only summing first 12 of them gives>0.85. Then my answer is
>> > going to be 12 for 29220 - 63933 in ref.
>> >
>> > Thanks I lot for your patience.
>> >
>> > Cheers,
>> > Greg
>> >
>> > On Sun, Jun 12, 2016 at 10:35 PM, greg holly <mak.hholly at gmail.com>
>> wrote:
>> >
>> >> Hi Bert;
>> >>
>> >> I do appreciate for this. I need check your codes on task2 tomorrow at
>> my
>> >> office on the real data as I have difficulty (because a technical
>> issue) to
>> >> remote connection. I am sure it will work well.
>> >>
>> >> I am sorry that I was not able to explain my first question. Basically
>> >>
>> >> Values in ref data represent the region of chromosome. I need choose
>> these
>> >> regions in map (all regions values in ref data are exist in map data
>> in the
>> >> first column -column map$reg). And then summing up the column
>> "map$rate and
>> >> count the numbers that gives >0.85. For example, consider the first
>> row in
>> >> data ref. They are 29220 and 63933. After sorting the first column
>> in
>> >> map then summing column "map$rate" only between 29220 to 63933 in
>> >> sorted map and cut off at >0.85. Then count how many rows in sorted map
>> >> gives >0.85. For example consider there are 38 rows between 29220 in
>> >> 63933 in sorted map$reg and only summing first 12 of them gives>0.85.
>> >> Then my answer is going to be 12 for 29220 - 63933 in ref.
>> >>
>> >> Thanks I lot for your patience.
>> >>
>> >> Cheers,
>> >> Greg
>> >>
>> >> On Sun, Jun 12, 2016 at 6:36 PM, Bert Gunter <bgunter.4567 at gmail.com>
>> >> wrote:
>> >>
>> >>> Greg:
>> >>>
>> >>> I was not able to understand your task 1. Perhaps others can.
>> >>>
>> >>> My understanding of your task 2 is that for each row of ref, you wish
>> >>> to find all rows,of map such that the reg values in those rows fall
>> >>> between the reg1 and reg2 values in ref (inclusive change <= to < if
>> >>> you don't want the endpoints), and then you want the minimum map$p
>> >>> values of all those rows. If that is correct, I believe this will do
>> >>> it (but caution, untested, as you failed to provide data in a
>> >>> convenient form, e.g. using dput() )
>> >>>
>> >>> task2 <- with(map,vapply(seq_len(nrow(ref)),function(i)
>> >>> min(p[ref[i,1]<=reg & reg <= ref[i,2] ]),0))
>> >>>
>> >>>
>> >>> If my understanding is incorrect, please ignore both the above and the
>> >>> following:
>> >>>
>> >>>
>> >>> The "solution" I have given above seems inefficient, so others may be
>> >>> able to significantly improve it if you find that it takes too long.
>> >>> OTOH, my understanding of your specification is that you need to
>> >>> search for all rows in map data frame that meet the criterion for each
>> >>> row of ref, and without further information, I don't know how to do
>> >>> this without just repeating the search 560 times.
>> >>>
>> >>>
>> >>> Cheers,
>> >>> Bert
>> >>>
>> >>>
>> >>> Bert Gunter
>> >>>
>> >>> "The trouble with having an open mind is that people keep coming along
>> >>> and sticking things into it."
>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >>>
>> >>>
>> >>> On Sun, Jun 12, 2016 at 1:14 PM, greg holly <mak.hholly at gmail.com>
>> wrote:
>> >>> > Dear all;
>> >>> >
>> >>> >
>> >>> >
>> >>> > I have two data sets, data=map and data=ref). A small part of each
>> data
>> >>> set
>> >>> > are given below. Data map has more than 27 million and data ref has
>> >>> about
>> >>> > 560 rows. Basically I need run two different task. My R codes for
>> these
>> >>> > task are given below but they do not work properly.
>> >>> >
>> >>> > I sincerely do appreciate your helps.
>> >>> >
>> >>> >
>> >>> > Regards,
>> >>> >
>> >>> > Greg
>> >>> >
>> >>> >
>> >>> >
>> >>> > Task 1)
>> >>> >
>> >>> > For example, the first and second columns for row 1 in data ref are
>> >>> 29220
>> >>> > 63933. So I need write an R code normally first look the first row
>> in
>> >>> ref
>> >>> > (which they are 29220 and 63933) than summing the column of
>> "map$rate"
>> >>> and
>> >>> > give the number of rows that >0.85. Then do the same for the second,
>> >>> > third....in ref. At the end I would like a table gave below (the
>> >>> results I
>> >>> > need). Please notice the all value specified in ref data file are
>> exist
>> >>> in
>> >>> > map$reg column.
>> >>> >
>> >>> >
>> >>> >
>> >>> > Task2)
>> >>> >
>> >>> > Again example, the first and second columns for row 1 in data ref
>> are
>> >>> 29220
>> >>> > 63933. So I need write an R code give the minimum map$p for the
>> 29220
>> >>> > -63933 intervals in map file. Than
>> >>> >
>> >>> > do the same for the second, third....in ref.
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > #my attempt for the first question
>> >>> >
>> >>> > temp<-map[order(map$reg, map$p),]
>> >>> >
>> >>> > count<-1
>> >>> >
>> >>> > temp<-unique(temp$reg
>> >>> >
>> >>> > for(i in 1:length(ref) {
>> >>> >
>> >>> > for(j in 1:length(ref)
>> >>> >
>> >>> > {
>> >>> >
>> >>> > temp1<-if (temp[pos[i]==ref[ref$reg1,] &
>> (temp[pos[j]==ref[ref$reg2,]
>> >>> > & temp[cumsum(temp$rate)
>> >>> >>0.70,])
>> >>> >
>> >>> > count=count+1
>> >>> >
>> >>> > }
>> >>> >
>> >>> > }
>> >>> >
>> >>> > #my attempt for the second question
>> >>> >
>> >>> >
>> >>> >
>> >>> > temp<-map[order(map$reg, map$p),]
>> >>> >
>> >>> > count<-1
>> >>> >
>> >>> > temp<-unique(temp$reg
>> >>> >
>> >>> > for(i in 1:length(ref) {
>> >>> >
>> >>> > for(j in 1:length(ref)
>> >>> >
>> >>> > {
>> >>> >
>> >>> > temp2<-if (temp[pos[i]==ref[ref$reg1,] &
>> (temp[pos[j]==ref[ref$reg2,])
>> >>> >
>> >>> > output<-temp2[temp2$p==min(temp2$p),]
>> >>> >
>> >>> > }
>> >>> >
>> >>> > }
>> >>> >
>> >>> >
>> >>> >
>> >>> > Data sets
>> >>> >
>> >>> >
>> >>> > Data= map
>> >>> >
>> >>> > reg p rate
>> >>> >
>> >>> > 10276 0.700 3.867e-18
>> >>> >
>> >>> > 71608 0.830 4.542e-16
>> >>> >
>> >>> > 29220 0.430 1.948e-15
>> >>> >
>> >>> > 99542 0.220 1.084e-15
>> >>> >
>> >>> > 26441 0.880 9.675e-14
>> >>> >
>> >>> > 95082 0.090 7.349e-13
>> >>> >
>> >>> > 36169 0.480 9.715e-13
>> >>> >
>> >>> > 55572 0.500 9.071e-12
>> >>> >
>> >>> > 65255 0.300 1.688e-11
>> >>> >
>> >>> > 51960 0.970 1.163e-10
>> >>> >
>> >>> > 55652 0.388 3.750e-10
>> >>> >
>> >>> > 63933 0.250 9.128e-10
>> >>> >
>> >>> > 35170 0.720 7.355e-09
>> >>> >
>> >>> > 06491 0.370 1.634e-08
>> >>> >
>> >>> > 85508 0.470 1.057e-07
>> >>> >
>> >>> > 86666 0.580 7.862e-07
>> >>> >
>> >>> > 04758 0.810 9.501e-07
>> >>> >
>> >>> > 06169 0.440 1.104e-06
>> >>> >
>> >>> > 63933 0.750 2.624e-06
>> >>> >
>> >>> > 41838 0.960 8.119e-06
>> >>> >
>> >>> >
>> >>> > data=ref
>> >>> >
>> >>> > reg1 reg2
>> >>> >
>> >>> > 29220 63933
>> >>> >
>> >>> > 26441 41838
>> >>> >
>> >>> > 06169 10276
>> >>> >
>> >>> > 74806 92643
>> >>> >
>> >>> > 73732 82451
>> >>> >
>> >>> > 86042 93502
>> >>> >
>> >>> > 85508 95082
>> >>> >
>> >>> >
>> >>> >
>> >>> > the results I need
>> >>> >
>> >>> > reg1 reg2 n
>> >>> >
>> >>> > 29220 63933 12
>> >>> >
>> >>> > 26441 41838 78
>> >>> >
>> >>> > 06169 10276 125
>> >>> >
>> >>> > 74806 92643 11
>> >>> >
>> >>> > 73732 82451 47
>> >>> >
>> >>> > 86042 93502 98
>> >>> >
>> >>> > 85508 95082 219
>> >>> >
>> >>> > [[alternative HTML version deleted]]
>> >>> >
>> >>> > ______________________________________________
>> >>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> > PLEASE do read the posting guide
>> >>> http://www.R-project.org/posting-guide.html
>> >>> > and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >>
>> >>
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list