[R] two difficult loop
greg holly
mak.hholly at gmail.com
Tue Jun 14 14:04:58 CEST 2016
Thanks a lot Jim. I am struggling to solve the problem.I do appreciate for
your helps and advice.
Greg
On Tue, Jun 14, 2016 at 3:39 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Greg,
> This is obviously a problem with the data. The first error indicates
> that the sequence of integers regrange[1]:regrange[2] is too long to
> be allocated. Most likely this is because one or both of the endpoints
> are infinite. Maybe if you can find where the NAs are you can fix it.
>
> Jim
>
> On Tue, Jun 14, 2016 at 12:29 AM, greg holly <mak.hholly at gmail.com> wrote:
> > Hi Jim;
> >
> > I do apologize if bothering. I have run on the real data and here is the
> > error message I got:
> >
> > Thanks,
> >
> > Greg
> >
> > start end
> > Error in regrange[1]:regrange[2] : result would be too long a vector
> > In addition: Warning messages:
> > 1: In min(x) : no non-missing arguments to min; returning Inf
> > 2: In max(x) : no non-missing arguments to max; returning -Inf
> >
> > On Mon, Jun 13, 2016 at 10:28 AM, greg holly <mak.hholly at gmail.com>
> wrote:
> >>
> >> Hi Jim;
> >>
> >> I do apologize if bothering. I have run on the real data and here is the
> >> error message I got:
> >>
> >> Thanks,
> >>
> >> Greg
> >>
> >> start end
> >> Error in regrange[1]:regrange[2] : result would be too long a vector
> >> In addition: Warning messages:
> >> 1: In min(x) : no non-missing arguments to min; returning Inf
> >> 2: In max(x) : no non-missing arguments to max; returning -Inf
> >>
> >>
> >> On Mon, Jun 13, 2016 at 3:19 AM, Jim Lemon <drjimlemon at gmail.com>
> wrote:
> >>>
> >>> Hi Greg,
> >>> Okay, I have a better idea now of what you want. The problem of
> >>> multiple matches is still there, but here is a start:
> >>>
> >>> # this data frame actually contains all the values in ref in the "reg"
> >>> field
> >>> map<-read.table(text="reg p rate
> >>> 10276 0.700 3.867e-18
> >>> 71608 0.830 4.542e-16
> >>> 29220 0.430 1.948e-15
> >>> 99542 0.220 1.084e-15
> >>> 26441 0.880 9.675e-14
> >>> 95082 0.090 7.349e-13
> >>> 36169 0.480 9.715e-13
> >>> 55572 0.500 9.071e-12
> >>> 65255 0.300 1.688e-11
> >>> 51960 0.970 1.163e-10
> >>> 55652 0.388 3.750e-10
> >>> 63933 0.250 9.128e-10
> >>> 35170 0.720 7.355e-09
> >>> 06491 0.370 1.634e-08
> >>> 85508 0.470 1.057e-07
> >>> 86666 0.580 7.862e-07
> >>> 04758 0.810 9.501e-07
> >>> 06169 0.440 1.104e-06
> >>> 63933 0.750 2.624e-06
> >>> 41838 0.960 8.119e-06
> >>> 74806 0.810 9.501e-07
> >>> 92643 0.470 1.057e-07
> >>> 73732 0.090 7.349e-13
> >>> 82451 0.960 8.119e-06
> >>> 86042 0.480 9.715e-13
> >>> 93502 0.500 9.071e-12
> >>> 85508 0.370 1.634e-08
> >>> 95082 0.830 4.542e-16",
> >>> header=TRUE)
> >>> # same as in your example
> >>> ref<-read.table(text="reg1 reg2
> >>> 29220 63933
> >>> 26441 41838
> >>> 06169 10276
> >>> 74806 92643
> >>> 73732 82451
> >>> 86042 93502
> >>> 85508 95082",
> >>> header=TRUE)
> >>> # sort the "map" data frame
> >>> map2<-map[order(map$reg),]
> >>> # get a field for the counts
> >>> ref$n<-NA
> >>> # and a field for the minimum p values
> >>> ref$min_p<-NA
> >>> # get the number of rows in "ref"
> >>> nref<-dim(ref)[1]
> >>> for(i in 1:nref) {
> >>> start<-which(map2$reg==ref$reg1[i])
> >>> end<-which(map2$reg==ref$reg2[i])
> >>> cat("start",start,"end",end,"\n")
> >>> # get the range of matches
> >>> regrange<-range(c(start,end))
> >>> # convert this to a sequence spanning all matches
> >>> allreg<-regrange[1]:regrange[2]
> >>> ref$n[i]<-sum(map2$p[allreg] > 0.85)
> >>> ref$min_p[i]<-min(map2$p[allreg])
> >>> }
> >>>
> >>> This example uses the span from the first match of "reg1" to the last
> >>> match of "reg2". This may not be what you want, so let me know if
> >>> there are further constraints.
> >>>
> >>> Jim
> >>>
> >>> On Mon, Jun 13, 2016 at 12:35 PM, greg holly <mak.hholly at gmail.com>
> >>> wrote:
> >>> > Hi Bert;
> >>> >
> >>> > I do appreciate for this. I need check your codes on task2 tomorrow
> at
> >>> > my
> >>> > office on the real data as I have difficulty (because a technical
> >>> > issue) to
> >>> > remote connection. I am sure it will work well.
> >>> >
> >>> > I am sorry that I was not able to explain my first question.
> Basically
> >>> >
> >>> > Values in ref data represent the region of chromosome. I need choose
> >>> > these
> >>> > regions in map (all regions values in ref data are exist in map data
> in
> >>> > the
> >>> > first column -column map$reg). And then summing up the column
> "map$rate
> >>> > and
> >>> > count the numbers that gives >0.85. For example, consider the first
> >>> > row in
> >>> > data ref. They are 29220 and 63933. After sorting the first column
> >>> > in
> >>> > map then summing column "map$rate" only between 29220 to 63933 in
> >>> > sorted
> >>> > map and cut off at >0.85. Then count how many rows in sorted map
> gives
> >>> >>0.85. For example consider there are 38 rows between 29220 in
> 63933
> >>> >> in sorted
> >>> > map$reg and only summing first 12 of them gives>0.85. Then my answer
> >>> > is
> >>> > going to be 12 for 29220 - 63933 in ref.
> >>> >
> >>> > Thanks I lot for your patience.
> >>> >
> >>> > Cheers,
> >>> > Greg
> >>> >
> >>> > On Sun, Jun 12, 2016 at 10:35 PM, greg holly <mak.hholly at gmail.com>
> >>> > wrote:
> >>> >
> >>> >> Hi Bert;
> >>> >>
> >>> >> I do appreciate for this. I need check your codes on task2 tomorrow
> at
> >>> >> my
> >>> >> office on the real data as I have difficulty (because a technical
> >>> >> issue) to
> >>> >> remote connection. I am sure it will work well.
> >>> >>
> >>> >> I am sorry that I was not able to explain my first question.
> Basically
> >>> >>
> >>> >> Values in ref data represent the region of chromosome. I need choose
> >>> >> these
> >>> >> regions in map (all regions values in ref data are exist in map data
> >>> >> in the
> >>> >> first column -column map$reg). And then summing up the column
> >>> >> "map$rate and
> >>> >> count the numbers that gives >0.85. For example, consider the first
> >>> >> row in
> >>> >> data ref. They are 29220 and 63933. After sorting the first
> column
> >>> >> in
> >>> >> map then summing column "map$rate" only between 29220 to 63933 in
> >>> >> sorted map and cut off at >0.85. Then count how many rows in sorted
> >>> >> map
> >>> >> gives >0.85. For example consider there are 38 rows between 29220
> in
> >>> >> 63933 in sorted map$reg and only summing first 12 of them
> >>> >> gives>0.85.
> >>> >> Then my answer is going to be 12 for 29220 - 63933 in ref.
> >>> >>
> >>> >> Thanks I lot for your patience.
> >>> >>
> >>> >> Cheers,
> >>> >> Greg
> >>> >>
> >>> >> On Sun, Jun 12, 2016 at 6:36 PM, Bert Gunter <
> bgunter.4567 at gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >>> Greg:
> >>> >>>
> >>> >>> I was not able to understand your task 1. Perhaps others can.
> >>> >>>
> >>> >>> My understanding of your task 2 is that for each row of ref, you
> wish
> >>> >>> to find all rows,of map such that the reg values in those rows fall
> >>> >>> between the reg1 and reg2 values in ref (inclusive change <= to <
> if
> >>> >>> you don't want the endpoints), and then you want the minimum map$p
> >>> >>> values of all those rows. If that is correct, I believe this will
> do
> >>> >>> it (but caution, untested, as you failed to provide data in a
> >>> >>> convenient form, e.g. using dput() )
> >>> >>>
> >>> >>> task2 <- with(map,vapply(seq_len(nrow(ref)),function(i)
> >>> >>> min(p[ref[i,1]<=reg & reg <= ref[i,2] ]),0))
> >>> >>>
> >>> >>>
> >>> >>> If my understanding is incorrect, please ignore both the above and
> >>> >>> the
> >>> >>> following:
> >>> >>>
> >>> >>>
> >>> >>> The "solution" I have given above seems inefficient, so others may
> be
> >>> >>> able to significantly improve it if you find that it takes too
> long.
> >>> >>> OTOH, my understanding of your specification is that you need to
> >>> >>> search for all rows in map data frame that meet the criterion for
> >>> >>> each
> >>> >>> row of ref, and without further information, I don't know how to do
> >>> >>> this without just repeating the search 560 times.
> >>> >>>
> >>> >>>
> >>> >>> Cheers,
> >>> >>> Bert
> >>> >>>
> >>> >>>
> >>> >>> Bert Gunter
> >>> >>>
> >>> >>> "The trouble with having an open mind is that people keep coming
> >>> >>> along
> >>> >>> and sticking things into it."
> >>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>> >>>
> >>> >>>
> >>> >>> On Sun, Jun 12, 2016 at 1:14 PM, greg holly <mak.hholly at gmail.com>
> >>> >>> wrote:
> >>> >>> > Dear all;
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > I have two data sets, data=map and data=ref). A small part of
> each
> >>> >>> > data
> >>> >>> set
> >>> >>> > are given below. Data map has more than 27 million and data ref
> has
> >>> >>> about
> >>> >>> > 560 rows. Basically I need run two different task. My R codes for
> >>> >>> > these
> >>> >>> > task are given below but they do not work properly.
> >>> >>> >
> >>> >>> > I sincerely do appreciate your helps.
> >>> >>> >
> >>> >>> >
> >>> >>> > Regards,
> >>> >>> >
> >>> >>> > Greg
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > Task 1)
> >>> >>> >
> >>> >>> > For example, the first and second columns for row 1 in data ref
> are
> >>> >>> 29220
> >>> >>> > 63933. So I need write an R code normally first look the first
> row
> >>> >>> > in
> >>> >>> ref
> >>> >>> > (which they are 29220 and 63933) than summing the column of
> >>> >>> > "map$rate"
> >>> >>> and
> >>> >>> > give the number of rows that >0.85. Then do the same for the
> >>> >>> > second,
> >>> >>> > third....in ref. At the end I would like a table gave below (the
> >>> >>> results I
> >>> >>> > need). Please notice the all value specified in ref data file are
> >>> >>> > exist
> >>> >>> in
> >>> >>> > map$reg column.
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > Task2)
> >>> >>> >
> >>> >>> > Again example, the first and second columns for row 1 in data ref
> >>> >>> > are
> >>> >>> 29220
> >>> >>> > 63933. So I need write an R code give the minimum map$p for the
> >>> >>> > 29220
> >>> >>> > -63933 intervals in map file. Than
> >>> >>> >
> >>> >>> > do the same for the second, third....in ref.
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > #my attempt for the first question
> >>> >>> >
> >>> >>> > temp<-map[order(map$reg, map$p),]
> >>> >>> >
> >>> >>> > count<-1
> >>> >>> >
> >>> >>> > temp<-unique(temp$reg
> >>> >>> >
> >>> >>> > for(i in 1:length(ref) {
> >>> >>> >
> >>> >>> > for(j in 1:length(ref)
> >>> >>> >
> >>> >>> > {
> >>> >>> >
> >>> >>> > temp1<-if (temp[pos[i]==ref[ref$reg1,] &
> >>> >>> > (temp[pos[j]==ref[ref$reg2,]
> >>> >>> > & temp[cumsum(temp$rate)
> >>> >>> >>0.70,])
> >>> >>> >
> >>> >>> > count=count+1
> >>> >>> >
> >>> >>> > }
> >>> >>> >
> >>> >>> > }
> >>> >>> >
> >>> >>> > #my attempt for the second question
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > temp<-map[order(map$reg, map$p),]
> >>> >>> >
> >>> >>> > count<-1
> >>> >>> >
> >>> >>> > temp<-unique(temp$reg
> >>> >>> >
> >>> >>> > for(i in 1:length(ref) {
> >>> >>> >
> >>> >>> > for(j in 1:length(ref)
> >>> >>> >
> >>> >>> > {
> >>> >>> >
> >>> >>> > temp2<-if (temp[pos[i]==ref[ref$reg1,] &
> >>> >>> > (temp[pos[j]==ref[ref$reg2,])
> >>> >>> >
> >>> >>> > output<-temp2[temp2$p==min(temp2$p),]
> >>> >>> >
> >>> >>> > }
> >>> >>> >
> >>> >>> > }
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > Data sets
> >>> >>> >
> >>> >>> >
> >>> >>> > Data= map
> >>> >>> >
> >>> >>> > reg p rate
> >>> >>> >
> >>> >>> > 10276 0.700 3.867e-18
> >>> >>> >
> >>> >>> > 71608 0.830 4.542e-16
> >>> >>> >
> >>> >>> > 29220 0.430 1.948e-15
> >>> >>> >
> >>> >>> > 99542 0.220 1.084e-15
> >>> >>> >
> >>> >>> > 26441 0.880 9.675e-14
> >>> >>> >
> >>> >>> > 95082 0.090 7.349e-13
> >>> >>> >
> >>> >>> > 36169 0.480 9.715e-13
> >>> >>> >
> >>> >>> > 55572 0.500 9.071e-12
> >>> >>> >
> >>> >>> > 65255 0.300 1.688e-11
> >>> >>> >
> >>> >>> > 51960 0.970 1.163e-10
> >>> >>> >
> >>> >>> > 55652 0.388 3.750e-10
> >>> >>> >
> >>> >>> > 63933 0.250 9.128e-10
> >>> >>> >
> >>> >>> > 35170 0.720 7.355e-09
> >>> >>> >
> >>> >>> > 06491 0.370 1.634e-08
> >>> >>> >
> >>> >>> > 85508 0.470 1.057e-07
> >>> >>> >
> >>> >>> > 86666 0.580 7.862e-07
> >>> >>> >
> >>> >>> > 04758 0.810 9.501e-07
> >>> >>> >
> >>> >>> > 06169 0.440 1.104e-06
> >>> >>> >
> >>> >>> > 63933 0.750 2.624e-06
> >>> >>> >
> >>> >>> > 41838 0.960 8.119e-06
> >>> >>> >
> >>> >>> >
> >>> >>> > data=ref
> >>> >>> >
> >>> >>> > reg1 reg2
> >>> >>> >
> >>> >>> > 29220 63933
> >>> >>> >
> >>> >>> > 26441 41838
> >>> >>> >
> >>> >>> > 06169 10276
> >>> >>> >
> >>> >>> > 74806 92643
> >>> >>> >
> >>> >>> > 73732 82451
> >>> >>> >
> >>> >>> > 86042 93502
> >>> >>> >
> >>> >>> > 85508 95082
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > the results I need
> >>> >>> >
> >>> >>> > reg1 reg2 n
> >>> >>> >
> >>> >>> > 29220 63933 12
> >>> >>> >
> >>> >>> > 26441 41838 78
> >>> >>> >
> >>> >>> > 06169 10276 125
> >>> >>> >
> >>> >>> > 74806 92643 11
> >>> >>> >
> >>> >>> > 73732 82451 47
> >>> >>> >
> >>> >>> > 86042 93502 98
> >>> >>> >
> >>> >>> > 85508 95082 219
> >>> >>> >
> >>> >>> > [[alternative HTML version deleted]]
> >>> >>> >
> >>> >>> > ______________________________________________
> >>> >>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
> see
> >>> >>> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> >>> > PLEASE do read the posting guide
> >>> >>> http://www.R-project.org/posting-guide.html
> >>> >>> > and provide commented, minimal, self-contained, reproducible
> code.
> >>> >>>
> >>> >>
> >>> >>
> >>> >
> >>> > [[alternative HTML version deleted]]
> >>> >
> >>> > ______________________________________________
> >>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> > PLEASE do read the posting guide
> >>> > http://www.R-project.org/posting-guide.html
> >>> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list