[R] R-help Digest, Vol 30, Issue 22

Mon Aug 22 15:54:31 CEST 2005

 Re: A. Mani : Avoiding loops (Petr Pikal)
> Message: 9
> Date: Mon, 22 Aug 2005 06:40:45 +0200
> From: "Petr Pikal" <petr.pikal at precheza.cz>
> Subject: Re: [R] A. Mani : Avoiding loops
> To: "A. Mani" <a_mani_sc_gs at vsnl.net>, r-help
>  <r-help at stat.math.ethz.ch>
> 
> On 20 Aug 2005 at 3:26, A. Mani wrote:
> 
> > On Friday 19 August 2005 11:54, Sean O'Riordain wrote:
> > > Hi,
> > > I'm not sure what you actually want from your email (following the
> > > posting guide is a good way of helping you explain things to the
> > > rest of us in a way we understand - it might even answer your
> > > question!)
> > >
> > > I'm only a beginner at R so no doubt one of our expert colleagues
> > > will help me...
> > >
> > > > fred <- data.frame()
> > > > fred <- edit(fred)
> > > > fred
> > >
> > >   A B C D E
> > > 1 1 2 X Y 1
> > > 2 2 3 G L 1
> > > 3 3 1 G L 5
> > >
> > > > fred[,3]
> > >
> > > [1] X G G
> > > Levels: G X
> > >
> > > > fred[fred[,3]=="G",]
> > >
> > >   A B C D E
> > > 2 2 3 G L 1
> > > 3 3 1 G L 5
> > >
> > > so at this point I can create a new dataframe with column 3 (C) ==
> > > "G"; either explicitly or implicitly...
> > >
> > > and if I want to calculate the sum() of column E, then I just say
> > > something like...
> > >
> > > > sum(fred[fred[,3]=="G",][,5])
> > >
> > > [1] 6
> > >
> > >
> > > now naturally being a bit clueless at manipulating stuff in R, I
> > > didn't know how to do this before I started... and you guys only get
> > > to see the lines that I typed in and got a "successful" result...
> > >
> > > according to section 6 of the "Introduction to R" manual which comes
> > > with R, I could also have said
> > >
> > > > sum(fred[fred$C=="G",]$E)
> > >
> > > [1] 6
> > >
> > > Hmmm.... I wonder would it be reasonable to put an example of this
> > > type into section 2.7 of the "Introduction to R"?
> > >
> > >
> > > cheers!
> > > Sean
> > >
> > > On 18/08/05, A. Mani <a_mani_sc_gs at vsnl.net> wrote:
> > > > Hello,
> > > > >         I want to avoid loops in the following situation. There is a
> > > > > 5-col dataframe with col headers alone. Two of the columns are
> > > > non-numeric. The problem is to calculate statistics(scores) for
> > > > each element of one column. The functions depend on matching in
> > > > the other non-numeric column.
> > > >
> > > > A  B  C  E  F
> > > > 1  2  X  Y  1
> > > > 2  3  G  L  1
> > > > 3  1  G  L  5
> > > > and so on ...30000+ entries.
> > > >
> > > > I need scores for col E entries which depend on conditional
> > > > implications.
> > > >
> > > >
> > > > Thanks,
> > > >
> > Hello,
> >       Sorry about the incomplete problem. Here is a better version of
> > the problem (the measure is not simple):
> > The data frame is like
> >   col1    col2      col3      col4    col5
> >   <num>   <nonum>   <nonum>   <num>   <num>
> >   A       B         C         E       F
> > There are repeated strings in col3 and col2. Problem: calculate
> > Measure(Ci) = [No. of repeats of Ci * 100] + [if (Bi, Ci) is the same as
> > (Bj, Cj) and 6 >= Ej - Ei >= 3 then add 100, else add 10].
> 
> Hi
> 
> I am not sure what exactly you would like to compute; a
> **working** example would help. But if you want to do some
> computation for row "i" which depends on row "j", I suppose that
> you cannot avoid loops.
> 
> Generally you can use one of aggregate, tapply, by or ave for some 
> computation split by factor. See help pages.
> 
> tapply(vector or data frame, list(factors), function)
> 
> is the standard form.
> 
> HTH
> Petr
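
The split-by-factor functions Petr mentions can be tried on a small data
frame. The sketch below is only an illustration: it rebuilds the `fred`
example quoted earlier in the thread, and the column names E_sum and C_count
are made up for this example.

```r
# Toy data modelled on the `fred` example quoted earlier in the thread
fred <- data.frame(A = c(1, 2, 3),
                   B = c(2, 3, 1),
                   C = c("X", "G", "G"),
                   D = c("Y", "L", "L"),
                   E = c(1, 1, 5))

# tapply: one result per level of the grouping factor
sums_by_C <- tapply(fred$E, fred$C, sum)
sums_by_C

# ave: the group statistic repeated for every row, in the original row order
fred$E_sum   <- ave(fred$E, fred$C, FUN = sum)     # hypothetical column name
fred$C_count <- ave(fred$E, fred$C, FUN = length)  # hypothetical column name
fred

# aggregate: the same per-group sums, returned as a data frame
agg <- aggregate(E ~ C, data = fred, FUN = sum)
agg
```

tapply and aggregate give one value per group, while ave gives one value per
row, which is usually the more convenient form when a per-row score is wanted.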
> 
> 
> > 
> > 
> > Actually it is to be stretched further by adding similar blocks.
> > 
> >  How do we use *apply or
> > something else in this situation?
> > 
> > 
> > In prolog it is extremely easy, but here it is not quite...
> > 
> > 
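
One possible loop-free reading of the Measure described above uses ave() for
the repeat count and outer() for the row-against-row comparison. This is a
sketch under assumptions, not a definitive answer: the toy data are invented,
and the measure is read here as "a row earns 100 if at least one other row
with the same (B, C) has 3 <= Ej - Ei <= 6, otherwise 10".

```r
# Hypothetical toy data; column names follow the description in the thread
d <- data.frame(B = c("p", "p", "q", "q", "q"),
                C = c("X", "X", "G", "G", "G"),
                E = c(1, 5, 2, 3, 7))

# Number of rows carrying the same C value, repeated for every row
repeats <- ave(seq_along(d$C), d$C, FUN = length)

# Pairwise comparisons as n x n matrices; entry [i, j] compares row i with row j
key      <- paste(d$B, d$C, sep = "|")                  # one key per (B, C) pair
same_key <- outer(key, key, "==")
diff_E   <- outer(d$E, d$E, function(ei, ej) ej - ei)   # Ej - Ei
hit      <- same_key & diff_E >= 3 & diff_E <= 6

# ASSUMPTION: 100 if at least one partner row qualifies, else 10
bonus   <- ifelse(rowSums(hit) > 0, 100, 10)
measure <- repeats * 100 + bonus
measure
```

outer() builds full n x n matrices, so for 30000+ rows it would need on the
order of 9e8 cells per matrix. Splitting the rows by key first and comparing
only within each group keeps the matrices small.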
Here is some code and a little data:

dat <- read.table("/home/project5R/datasplf.csv", header=TRUE,
sep=",", na.strings="NA", dec=".", strip.white=TRUE)
attach(dat)
# showData() is not base R (it seems to come with Rcmdr); it only displays the data
# showData(dat, placement='-20+200', font=.logFont, maxwidth=80, maxheight=30)
n <- nrow(dat)
n
# Take the columns straight from the data frame: as.matrix(dat) would turn
# every column, including the numeric TEST1, into character strings
x1 <- as.character(dat[[1]])   # DOB
x2 <- as.character(dat[[2]])   # ID
x3 <- as.character(dat[[3]])   # DOCTOR
x4 <- as.character(dat[[4]])   # DATE of TEST
x5 <- dat[[5]]                 # TEST1, kept numeric
x5[is.na(x5)] <- 0
xd1 <- as.Date(x1, format= "%m-%d-%Y")
xd1
xd4 <- as.Date(x4, format= "%m-%d-%Y")
xd4
p6 <- (1-(abs(x5 - 6)/6))*100
p6
x23 <- cbind(x2,x3)
xp <- paste(x2,x3)
xp
# data.frame() keeps the dates as Dates; cbind() would reduce them to day counts
y <- data.frame(x2, x3, xd4, xd1, xp)

_____________________________________________________________
# The score to be computed is for the doctors. It is no. of patients * 100 + rate
# of decrease of diabetic score * 1000 + no. of tests at approx. 3 months * ....(see
# below)

_____________________________________________________________  
# To be debugged (loops)

x5 <- as.numeric(x5)   # no-op if x5 is already numeric

# sc: 100 for every row j with the same doctor (x3) and ID (x2) as row i
sc <- numeric(n)
for (i in 1:n) {
  for (j in 1:n) {
    if (identical(x3[[i]], x3[[j]]) && identical(x2[[i]], x2[[j]]))
      sc[[i]] <- sc[[i]] + 100
  }
}
sc

# scf: 100 for every matching row whose test date meets the 90-day condition
# scr: rate of change of the test result between matching rows, times 1000
scf <- numeric(n)
scr <- numeric(n)
for (i in 1:n) {
  for (j in 1:n) {
    if (identical(x3[[i]], x3[[j]]) && identical(x2[[i]], x2[[j]])) {
      dd <- abs(as.numeric(xd4[i] - xd4[j]))   # days between the two tests
      if (isTRUE(abs(1 - dd/90) <= 1.25))
        scf[[i]] <- scf[[i]] + 100
      # skip i == j and same-day pairs, which would divide by zero
      if (isTRUE(dd > 0))
        scr[[i]] <- scr[[i]] + (abs(x5[[i]] - x5[[j]])/dd) * 1000
    }
  }
}

# sce: closeness of the test result to 6 (vectorised, no loop needed)
sce <- (1 - abs(x5 - 6)/6)*100

se <- scf + sce + scr + sc

# data.frame() keeps se numeric; cbind(x3, se) would make everything character
score <- data.frame(DOCTOR = x3, score = se)
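
The double loops above compare every row with every other row, although only
rows with the same doctor and ID can contribute. One way to cut the work down
is to split the row numbers by that key and do the pairwise arithmetic inside
each group with outer(). The following is a rough sketch, not a tested
replacement: the toy vectors stand in for x2, x3, xd4 and x5, the helper
score_group is a made-up name, and same-day pairs are assumed to add nothing
to the rate term.

```r
# Hypothetical stand-ins for the vectors built in the post
x2  <- c("id1", "id1", "id1", "id2", "id2")          # patient ID
x3  <- c("docA", "docA", "docA", "docB", "docB")     # doctor
xd4 <- as.Date(c("01-10-2003", "04-12-2003", "12-01-2003",
                 "02-01-2003", "02-01-2003"), format = "%m-%d-%Y")
x5  <- c(8, 7, 6.5, 7, 7)                            # test result

# Score one (doctor, ID) group; idx holds the row numbers of that group
score_group <- function(idx) {
  days <- as.numeric(xd4[idx])
  dd   <- abs(outer(days, days, "-"))        # days between each pair of tests
  dx   <- abs(outer(x5[idx], x5[idx], "-"))  # gap between each pair of results
  near <- abs(1 - dd / 90) <= 1.25           # date condition from the post
  rate <- ifelse(dd > 0, dx / dd, 0)         # ASSUMPTION: same-day pairs add 0
  cbind(sc  = rep(100 * length(idx), length(idx)),
        scf = 100 * rowSums(near),
        scr = 1000 * rowSums(rate))
}

groups <- split(seq_along(x2), paste(x2, x3))
parts  <- lapply(groups, score_group)

# Stack the per-group results and restore the original row order
res <- do.call(rbind, parts)[order(unlist(groups)), , drop = FALSE]

sce   <- (1 - abs(x5 - 6) / 6) * 100
se    <- rowSums(res) + sce
score <- data.frame(DOCTOR = x3, score = se)
score
```

The loop over groups is still a loop (hidden in lapply), but each group is
small, so the total work grows with the sum of squared group sizes rather than
with the square of the number of rows.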

____________________________
DATA
"DOB","ID","DOCTOR","DATE of TEST","TEST1"
12-23-1921,2177532.174,NA,01-20-2003,NA
NA,2358368.261,"152N7R",01-26-2003,NA
NA,2358368.261,"152N7R",01-27-2003,NA
07-24-1938,2174903.913,NA,01-31-2003,6.7
12-25-1924,2185493.043,NA,01-31-2003,NA
07-21-1943,2181658.696,"K9PL9N,L",01-28-2003,7
05-24-1938,2306571.304,"SH7RM9N",01-13-2003,NA
07-29-1949,2296516.522,"H3001FR9",01-20-2003,NA

Thanks,

 A. Mani
 Member, Cal. Math. Soc