[R] two difficult loop

Tue Jun 14 14:04:58 CEST 2016

Thanks a lot Jim. I am struggling to solve the problem.I do appreciate for
your helps and advice.

Greg

On Tue, Jun 14, 2016 at 3:39 AM, Jim Lemon <drjimlemon at gmail.com> wrote:

> Hi Greg,
> This is obviously a problem with the data. The first error indicates
> that the sequence of integers regrange[1]:regrange[2] is too long to
> be allocated. Most likely this is because one or both of the endpoints
> are infinite. Maybe if you can find where the NAs are you can fix it.
>
> Jim
>
> On Tue, Jun 14, 2016 at 12:29 AM, greg holly <mak.hholly at gmail.com> wrote:
> > Hi Jim;
> >
> > I do apologize if bothering. I have run on the real data and here is the
> > error message I got:
> >
> > Thanks,
> >
> > Greg
> >
> > start  end
> > Error in regrange[1]:regrange[2] : result would be too long a vector
> > In addition: Warning messages:
> > 1: In min(x) : no non-missing arguments to min; returning Inf
> > 2: In max(x) : no non-missing arguments to max; returning -Inf
> >
> > On Mon, Jun 13, 2016 at 10:28 AM, greg holly <mak.hholly at gmail.com>
> wrote:
> >>
> >> Hi Jim;
> >>
> >> I do apologize if bothering. I have run on the real data and here is the
> >> error message I got:
> >>
> >> Thanks,
> >>
> >> Greg
> >>
> >> start  end
> >> Error in regrange[1]:regrange[2] : result would be too long a vector
> >> In addition: Warning messages:
> >> 1: In min(x) : no non-missing arguments to min; returning Inf
> >> 2: In max(x) : no non-missing arguments to max; returning -Inf
> >>
> >>
> >> On Mon, Jun 13, 2016 at 3:19 AM, Jim Lemon <drjimlemon at gmail.com>
> wrote:
> >>>
> >>> Hi Greg,
> >>> Okay, I have a better idea now of what you want. The problem of
> >>> multiple matches is still there, but here is a start:
> >>>
> >>> # this data frame actually contains all the values in ref in the "reg"
> >>> field
> >>> map<-read.table(text="reg p rate
> >>>  10276 0.700  3.867e-18
> >>>  71608 0.830  4.542e-16
> >>>  29220 0.430  1.948e-15
> >>>  99542 0.220  1.084e-15
> >>>  26441 0.880  9.675e-14
> >>>  95082 0.090  7.349e-13
> >>>  36169 0.480  9.715e-13
> >>>  55572 0.500  9.071e-12
> >>>  65255 0.300  1.688e-11
> >>>  51960 0.970  1.163e-10
> >>>  55652 0.388  3.750e-10
> >>>  63933 0.250  9.128e-10
> >>>  35170 0.720  7.355e-09
> >>>  06491 0.370  1.634e-08
> >>>  85508 0.470  1.057e-07
> >>>  86666 0.580  7.862e-07
> >>>  04758 0.810  9.501e-07
> >>>  06169 0.440  1.104e-06
> >>>  63933 0.750  2.624e-06
> >>>  41838 0.960  8.119e-06
> >>>  74806 0.810  9.501e-07
> >>>  92643 0.470  1.057e-07
> >>>  73732 0.090  7.349e-13
> >>>  82451 0.960  8.119e-06
> >>>  86042 0.480  9.715e-13
> >>>  93502 0.500  9.071e-12
> >>>  85508 0.370  1.634e-08
> >>>  95082 0.830  4.542e-16",
> >>>  header=TRUE)
> >>> # same as in your example
> >>> ref<-read.table(text="reg1 reg2
> >>>  29220     63933
> >>>  26441     41838
> >>>  06169     10276
> >>>  74806     92643
> >>>  73732     82451
> >>>  86042     93502
> >>>  85508     95082",
> >>>  header=TRUE)
> >>> # sort the "map" data frame
> >>> map2<-map[order(map$reg),]
> >>> # get a field for the counts
> >>> ref$n<-NA
> >>> # and a field for the minimum p values
> >>> ref$min_p<-NA
> >>> # get the number of rows in "ref"
> >>> nref<-dim(ref)[1]
> >>> for(i in 1:nref) {
> >>>  start<-which(map2$reg==ref$reg1[i])
> >>>  end<-which(map2$reg==ref$reg2[i])
> >>>  cat("start",start,"end",end,"\n")
> >>>  # get the range of matches
> >>>  regrange<-range(c(start,end))
> >>>  # convert this to a sequence spanning all matches
> >>>  allreg<-regrange[1]:regrange[2]
> >>>  ref$n[i]<-sum(map2$p[allreg] > 0.85)
> >>>  ref$min_p[i]<-min(map2$p[allreg])
> >>> }
> >>>
> >>> This example uses the span from the first match of "reg1" to the last
> >>> match of "reg2". This may not be what you want, so let me know if
> >>> there are further constraints.
> >>>
> >>> Jim
> >>>
> >>> On Mon, Jun 13, 2016 at 12:35 PM, greg holly <mak.hholly at gmail.com>
> >>> wrote:
> >>> > Hi Bert;
> >>> >
> >>> > I do appreciate for this. I need check your codes on task2 tomorrow
> at
> >>> > my
> >>> > office on the real data as I have difficulty (because a technical
> >>> > issue) to
> >>> > remote connection. I am sure it will work well.
> >>> >
> >>> > I am sorry that I was not able to explain my first question.
> Basically
> >>> >
> >>> > Values in ref data represent the region of chromosome. I need choose
> >>> > these
> >>> > regions in map (all regions values in ref data are exist in map data
> in
> >>> > the
> >>> > first column -column map$reg). And then summing up the column
> "map$rate
> >>> > and
> >>> > count the numbers that gives >0.85. For example, consider  the first
> >>> > row in
> >>> > data ref. They are 29220   and  63933. After sorting the first column
> >>> > in
> >>> > map then summing column "map$rate" only between 29220   to  63933 in
> >>> > sorted
> >>> > map and cut off at >0.85. Then count how many rows in sorted map
> gives
> >>> >>0.85. For example consider there are 38 rows between 29220   in
> 63933
> >>> >> in sorted
> >>> > map$reg and only summing first 12 of them  gives>0.85. Then my answer
> >>> > is
> >>> > going to be 12 for 29220   -  63933 in ref.
> >>> >
> >>> > Thanks I lot for your patience.
> >>> >
> >>> > Cheers,
> >>> > Greg
> >>> >
> >>> > On Sun, Jun 12, 2016 at 10:35 PM, greg holly <mak.hholly at gmail.com>
> >>> > wrote:
> >>> >
> >>> >> Hi Bert;
> >>> >>
> >>> >> I do appreciate for this. I need check your codes on task2 tomorrow
> at
> >>> >> my
> >>> >> office on the real data as I have difficulty (because a technical
> >>> >> issue) to
> >>> >> remote connection. I am sure it will work well.
> >>> >>
> >>> >> I am sorry that I was not able to explain my first question.
> Basically
> >>> >>
> >>> >> Values in ref data represent the region of chromosome. I need choose
> >>> >> these
> >>> >> regions in map (all regions values in ref data are exist in map data
> >>> >> in the
> >>> >> first column -column map$reg). And then summing up the column
> >>> >> "map$rate and
> >>> >> count the numbers that gives >0.85. For example, consider  the first
> >>> >> row in
> >>> >> data ref. They are 29220   and  63933. After sorting the first
> column
> >>> >> in
> >>> >> map then summing column "map$rate" only between 29220   to  63933 in
> >>> >> sorted map and cut off at >0.85. Then count how many rows in sorted
> >>> >> map
> >>> >> gives >0.85. For example consider there are 38 rows between 29220
>  in
> >>> >>  63933 in sorted map$reg and only summing first 12 of them
> >>> >> gives>0.85.
> >>> >> Then my answer is going to be 12 for 29220   -  63933 in ref.
> >>> >>
> >>> >> Thanks I lot for your patience.
> >>> >>
> >>> >> Cheers,
> >>> >> Greg
> >>> >>
> >>> >> On Sun, Jun 12, 2016 at 6:36 PM, Bert Gunter <
> bgunter.4567 at gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >>> Greg:
> >>> >>>
> >>> >>> I was not able to understand your task 1. Perhaps others can.
> >>> >>>
> >>> >>> My understanding of your task 2 is that for each row of ref, you
> wish
> >>> >>> to find all rows,of map such that the reg values in those rows fall
> >>> >>> between the reg1 and reg2 values in ref (inclusive change <= to <
> if
> >>> >>> you don't want the endpoints), and then you want the minimum map$p
> >>> >>> values of all those rows. If that is correct, I believe this will
> do
> >>> >>> it (but caution, untested, as you failed to provide data in a
> >>> >>> convenient form, e.g. using dput() )
> >>> >>>
> >>> >>> task2 <- with(map,vapply(seq_len(nrow(ref)),function(i)
> >>> >>> min(p[ref[i,1]<=reg & reg <= ref[i,2] ]),0))
> >>> >>>
> >>> >>>
> >>> >>> If my understanding is incorrect, please ignore both the above and
> >>> >>> the
> >>> >>> following:
> >>> >>>
> >>> >>>
> >>> >>> The "solution" I have given above seems inefficient, so others may
> be
> >>> >>> able to significantly improve it if you find that it takes too
> long.
> >>> >>> OTOH, my understanding of your specification is that you need to
> >>> >>> search for all rows in map data frame that meet the criterion for
> >>> >>> each
> >>> >>> row of ref, and without further information, I don't know how to do
> >>> >>> this without just repeating the search 560 times.
> >>> >>>
> >>> >>>
> >>> >>> Cheers,
> >>> >>> Bert
> >>> >>>
> >>> >>>
> >>> >>> Bert Gunter
> >>> >>>
> >>> >>> "The trouble with having an open mind is that people keep coming
> >>> >>> along
> >>> >>> and sticking things into it."
> >>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>> >>>
> >>> >>>
> >>> >>> On Sun, Jun 12, 2016 at 1:14 PM, greg holly <mak.hholly at gmail.com>
> >>> >>> wrote:
> >>> >>> > Dear all;
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > I have two data sets, data=map and data=ref). A small part of
> each
> >>> >>> > data
> >>> >>> set
> >>> >>> > are given below. Data map has more than 27 million and data ref
> has
> >>> >>> about
> >>> >>> > 560 rows. Basically I need run two different task. My R codes for
> >>> >>> > these
> >>> >>> > task are given below but they do not work properly.
> >>> >>> >
> >>> >>> > I sincerely do appreciate your helps.
> >>> >>> >
> >>> >>> >
> >>> >>> > Regards,
> >>> >>> >
> >>> >>> > Greg
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > Task 1)
> >>> >>> >
> >>> >>> > For example, the first and second columns for row 1 in data ref
> are
> >>> >>> 29220
> >>> >>> > 63933. So I need write an R code normally first look the first
> row
> >>> >>> > in
> >>> >>> ref
> >>> >>> > (which they are 29220 and 63933) than summing the column of
> >>> >>> > "map$rate"
> >>> >>> and
> >>> >>> > give the number of rows that >0.85. Then do the same for the
> >>> >>> > second,
> >>> >>> > third....in ref. At the end I would like a table gave below (the
> >>> >>> results I
> >>> >>> > need). Please notice the all value specified in ref data file are
> >>> >>> > exist
> >>> >>> in
> >>> >>> > map$reg column.
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > Task2)
> >>> >>> >
> >>> >>> > Again example, the first and second columns for row 1 in data ref
> >>> >>> > are
> >>> >>> 29220
> >>> >>> > 63933. So I need write an R code give the minimum map$p for the
> >>> >>> > 29220
> >>> >>> > -63933 intervals in map file. Than
> >>> >>> >
> >>> >>> > do the same for the second, third....in ref.
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > #my attempt for the first question
> >>> >>> >
> >>> >>> > temp<-map[order(map$reg, map$p),]
> >>> >>> >
> >>> >>> > count<-1
> >>> >>> >
> >>> >>> > temp<-unique(temp$reg
> >>> >>> >
> >>> >>> > for(i in 1:length(ref) {
> >>> >>> >
> >>> >>> >   for(j in 1:length(ref)
> >>> >>> >
> >>> >>> >   {
> >>> >>> >
> >>> >>> > temp1<-if (temp[pos[i]==ref[ref$reg1,] &
> >>> >>> > (temp[pos[j]==ref[ref$reg2,]
> >>> >>> > & temp[cumsum(temp$rate)
> >>> >>> >>0.70,])
> >>> >>> >
> >>> >>> > count=count+1
> >>> >>> >
> >>> >>> >     }
> >>> >>> >
> >>> >>> > }
> >>> >>> >
> >>> >>> > #my attempt for the second question
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > temp<-map[order(map$reg, map$p),]
> >>> >>> >
> >>> >>> > count<-1
> >>> >>> >
> >>> >>> > temp<-unique(temp$reg
> >>> >>> >
> >>> >>> > for(i in 1:length(ref) {
> >>> >>> >
> >>> >>> >   for(j in 1:length(ref)
> >>> >>> >
> >>> >>> >   {
> >>> >>> >
> >>> >>> > temp2<-if (temp[pos[i]==ref[ref$reg1,] &
> >>> >>> > (temp[pos[j]==ref[ref$reg2,])
> >>> >>> >
> >>> >>> > output<-temp2[temp2$p==min(temp2$p),]
> >>> >>> >
> >>> >>> >     }
> >>> >>> >
> >>> >>> > }
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > Data sets
> >>> >>> >
> >>> >>> >
> >>> >>> >   Data= map
> >>> >>> >
> >>> >>> >   reg   p      rate
> >>> >>> >
> >>> >>> >  10276 0.700  3.867e-18
> >>> >>> >
> >>> >>> >  71608 0.830  4.542e-16
> >>> >>> >
> >>> >>> >  29220 0.430  1.948e-15
> >>> >>> >
> >>> >>> >  99542 0.220  1.084e-15
> >>> >>> >
> >>> >>> >  26441 0.880  9.675e-14
> >>> >>> >
> >>> >>> >  95082 0.090  7.349e-13
> >>> >>> >
> >>> >>> >  36169 0.480  9.715e-13
> >>> >>> >
> >>> >>> >  55572 0.500  9.071e-12
> >>> >>> >
> >>> >>> >  65255 0.300  1.688e-11
> >>> >>> >
> >>> >>> >  51960 0.970  1.163e-10
> >>> >>> >
> >>> >>> >  55652 0.388  3.750e-10
> >>> >>> >
> >>> >>> >  63933 0.250  9.128e-10
> >>> >>> >
> >>> >>> >  35170 0.720  7.355e-09
> >>> >>> >
> >>> >>> >  06491 0.370  1.634e-08
> >>> >>> >
> >>> >>> >  85508 0.470  1.057e-07
> >>> >>> >
> >>> >>> >  86666 0.580  7.862e-07
> >>> >>> >
> >>> >>> >  04758 0.810  9.501e-07
> >>> >>> >
> >>> >>> >  06169 0.440  1.104e-06
> >>> >>> >
> >>> >>> >  63933 0.750  2.624e-06
> >>> >>> >
> >>> >>> >  41838 0.960  8.119e-06
> >>> >>> >
> >>> >>> >
> >>> >>> >  data=ref
> >>> >>> >
> >>> >>> >   reg1         reg2
> >>> >>> >
> >>> >>> >   29220     63933
> >>> >>> >
> >>> >>> >   26441     41838
> >>> >>> >
> >>> >>> >   06169     10276
> >>> >>> >
> >>> >>> >   74806     92643
> >>> >>> >
> >>> >>> >   73732     82451
> >>> >>> >
> >>> >>> >   86042     93502
> >>> >>> >
> >>> >>> >   85508     95082
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> >        the results I need
> >>> >>> >
> >>> >>> >      reg1      reg2 n
> >>> >>> >
> >>> >>> >    29220   63933  12
> >>> >>> >
> >>> >>> >    26441   41838   78
> >>> >>> >
> >>> >>> >    06169 10276  125
> >>> >>> >
> >>> >>> >    74806 92643   11
> >>> >>> >
> >>> >>> >    73732 82451   47
> >>> >>> >
> >>> >>> >    86042 93502   98
> >>> >>> >
> >>> >>> >    85508 95082  219
> >>> >>> >
> >>> >>> >         [[alternative HTML version deleted]]
> >>> >>> >
> >>> >>> > ______________________________________________
> >>> >>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
> see
> >>> >>> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> >>> > PLEASE do read the posting guide
> >>> >>> http://www.R-project.org/posting-guide.html
> >>> >>> > and provide commented, minimal, self-contained, reproducible
> code.
> >>> >>>
> >>> >>
> >>> >>
> >>> >
> >>> >         [[alternative HTML version deleted]]
> >>> >
> >>> > ______________________________________________
> >>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> > PLEASE do read the posting guide
> >>> > http://www.R-project.org/posting-guide.html
> >>> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
>

	[[alternative HTML version deleted]]