[R] Trying to make code more efficient
Brian Diggs
diggsb at ohsu.edu
Mon Jun 13 20:42:27 CEST 2011
On 6/9/2011 12:27 PM, Abraham Mathew wrote:
> I have a repetitive task in R and I'm trying to find a more efficient way to
> perform the following task.
>
>
> lst<- list(roots = c("car insurance", "auto insurance"),
> roots2 = c("insurance"), prefix = c("cheap", "budget"),
> prefix2 = c("low cost"), suffix = c("quote", "quotes"),
> suffix2 = c("rate", "rates"), suffix3 = c("comparison"),
> state = c(state), inscompany = c(inscompany), city=c(city),
> cityst = c(cityst), agency=c(agency))
This is not reproducible since we don't have state, inscompany, etc.
> myone<- function(x, y) {
> m1<- do.call(paste, expand.grid(lst[[x]], lst[[y]]))
> mydf<- data.frame(keyword=c(m1))
> }
Your indentation threw me off for a while before I realized that mytwo is
not nested inside myone.
> mytwo<- function(x, y, z){
> m2<- do.call(paste, expand.grid(lst[[x]], lst[[y]], lst[[z]]))
> mydf2<- data.frame(keyword=c(m2))
> }
Anytime you have many sequentially numbered somethings, that is an
indication you probably should be using a list (or possibly vector).
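
As a minimal illustration of that advice (toy values only, not the original data; the names suffixes and results are made up for this sketch), the results can live in one named list instead of separate numbered variables:

```r
# Toy inputs standing in for the real keyword vectors (assumption).
suffixes <- list(suffix  = c("quote", "quotes"),
                 suffix2 = c("rate", "rates"),
                 suffix3 = "comparison")

# One list of results instead of separate r1, r2, r3 variables.
results <- lapply(suffixes, function(s) paste("insurance", s))

results[["suffix2"]]   # pick one result by name
length(results)        # how many results were generated
```

Everything generated this way can then be looped over or combined without having to type each variable name.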
> d1 = mytwo("prefix", "roots", "suffix")
> d2 = mytwo("prefix", "roots", "suffix2")
> d3 = mytwo("prefix", "roots", "suffix3")
> d4 = mytwo("prefix2", "roots", "suffix")
> d5 = mytwo("prefix2", "roots", "suffix2")
> d6 = mytwo("prefix2", "roots", "suffix3")
> d7 = mytwo("prefix", "roots2", "suffix")
> d8 = mytwo("prefix", "roots2", "suffix2")
> d9 = mytwo("prefix", "roots2", "suffix3")
> d10 = mytwo("prefix2", "roots2", "suffix")
> d11 = mytwo("prefix2", "roots2", "suffix2")
> d12 = mytwo("prefix2", "roots2", "suffix3")
Well, these first 12 can be generated using mlply (from the plyr package) and
another expand.grid to get all the combinations.

library(plyr)

# One row per (x, y, z) combination; mlply calls mytwo once per row.
d <- mlply(.data = expand.grid(x = c("prefix", "prefix2"),
                               y = c("roots", "roots2"),
                               z = c("suffix", "suffix2", "suffix3"),
                               stringsAsFactors = FALSE),
           .fun = mytwo,
           .expand = FALSE)
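
For readers without plyr, here is a rough base-R sketch of the same idea: call mytwo once per row of the grid. The trimmed lst and the name d_base are stand-ins introduced here, and mytwo is rewritten to return its data frame explicitly. Note that expand.grid varies its first column fastest, so the results are not in the same order as d1 to d12.

```r
# Stand-in for the original lst (only the pieces needed here; assumption).
lst <- list(roots   = c("car insurance", "auto insurance"),
            roots2  = "insurance",
            prefix  = c("cheap", "budget"),
            prefix2 = "low cost",
            suffix  = c("quote", "quotes"),
            suffix2 = c("rate", "rates"),
            suffix3 = "comparison")

mytwo <- function(x, y, z) {
  m2 <- do.call(paste, expand.grid(lst[[x]], lst[[y]], lst[[z]]))
  data.frame(keyword = m2)
}

grid <- expand.grid(x = c("prefix", "prefix2"),
                    y = c("roots", "roots2"),
                    z = c("suffix", "suffix2", "suffix3"),
                    stringsAsFactors = FALSE)

# One call to mytwo per row of grid, with the row's values as x, y, z.
d_base <- lapply(seq_len(nrow(grid)),
                 function(i) do.call(mytwo, as.list(grid[i, ])))

length(d_base)   # one data frame per row of grid
d_base[[1]]      # the prefix/roots/suffix combination (d1 in the original post)
```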
> d13 = myone("prefix", "roots")
> d14 = myone("prefix2", "roots")
> d15 = myone("prefix", "roots2")
> d16 = myone("prefix2", "roots2")
Another pattern of a full cross of two sets which are fed as arguments
to a function, so something similar to before.
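
A sketch of what "something similar" could look like for d13 to d16, using base R's Map rather than mlply so it runs without plyr. The trimmed lst, grid2, and d13_16 are names introduced for this example only.

```r
# Stand-in for the original lst (only the pieces needed here; assumption).
lst <- list(roots   = c("car insurance", "auto insurance"),
            roots2  = "insurance",
            prefix  = c("cheap", "budget"),
            prefix2 = "low cost")

myone <- function(x, y) {
  m1 <- do.call(paste, expand.grid(lst[[x]], lst[[y]]))
  data.frame(keyword = m1)
}

# Full cross of the two sets of list names.
grid2 <- expand.grid(x = c("prefix", "prefix2"),
                     y = c("roots", "roots2"),
                     stringsAsFactors = FALSE)

# Map walks the two columns in parallel: myone(x[1], y[1]), myone(x[2], y[2]), ...
d13_16 <- Map(myone, grid2$x, grid2$y)

length(d13_16)   # four data frames, in the same order as d13, d14, d15, d16
```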
> d17 = myone("roots", "suffix")
> d18 = myone("roots", "suffix2")
> d19 = myone("roots", "suffix3")
> d20 = myone("roots2", "suffix")
> d21 = myone("roots2", "suffix2")
> d22 = myone("roots2", "suffix3")
Trying to see bigger patterns: there is the set prefix/prefix2 (call it
set P), roots/roots2 (call it set R), and suffix/suffix2/suffix3 (call it
set S). Pick two or three of these sets and, keeping them in order,
send all the crosses of the sets as arguments to a function (one that takes
the appropriate number of arguments).
In fact, myone and mytwo could (probably) be replaced with

my <- function(...) {
  data.frame(keyword = c(do.call(paste, do.call(expand.grid, lst[c(...)]))))
}
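
A quick check, on a trimmed stand-in for lst (an assumption, since the full list is not reproducible), that the single variadic function covers both the two-argument and three-argument cases:

```r
# Stand-in for the original lst (only the pieces needed here; assumption).
lst <- list(roots   = c("car insurance", "auto insurance"),
            roots2  = "insurance",
            prefix  = c("cheap", "budget"),
            prefix2 = "low cost",
            suffix  = c("quote", "quotes"),
            suffix3 = "comparison")

my <- function(...) {
  data.frame(keyword = c(do.call(paste, do.call(expand.grid, lst[c(...)]))))
}

my("prefix2", "roots2")             # plays the role of myone
my("prefix2", "roots2", "suffix3")  # plays the role of mytwo
nrow(my("prefix", "roots", "suffix"))
```

The names passed in select elements of lst, expand.grid crosses them, and paste joins each row into one keyword, so the number of arguments no longer matters.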
> d23 = myone("state", "roots")
> d24 = myone("city", "roots")
> d25 = myone("cityst", "roots")
> d26 = myone("inscompany", "roots")
> d27 = myone("state", "roots2")
> d28 = myone("city", "roots2")
> d29 = myone("cityst", "roots2")
> d30 = myone("inscompany", "roots2")
OK, need to broaden the pattern. Another set is
state/city/cityst/inscompany (call it set I). If thinking in an order,
it is before roots/roots2 (set R).
> d31 = mytwo("state", "roots", "suffix")
> d32 = mytwo("city", "roots", "suffix")
> d33 = mytwo("cityst", "roots", "suffix")
> d34 = mytwo("inscompany", "roots", "suffix")
> d35 = mytwo("state", "roots", "suffix2")
> d36 = mytwo("city", "roots", "suffix2")
> d37 = mytwo("cityst", "roots", "suffix2")
> d38 = mytwo("inscompany", "roots", "suffix2")
> d39 = mytwo("state", "roots", "suffix3")
> d40 = mytwo("city", "roots", "suffix3")
> d41 = mytwo("cityst", "roots", "suffix3")
> d42 = mytwo("inscompany", "roots", "suffix3")
> d43 = mytwo("state", "roots2", "suffix")
> d44 = mytwo("city", "roots2", "suffix")
> d45 = mytwo("cityst", "roots2", "suffix")
> d46 = mytwo("inscompany", "roots2", "suffix")
> d47 = mytwo("state", "roots2", "suffix2")
> d48 = mytwo("city", "roots2", "suffix2")
> d49 = mytwo("cityst", "roots2", "suffix2")
> d50 = mytwo("inscompany", "roots2", "suffix2")
> d51 = mytwo("state", "roots2", "suffix3")
> d52 = mytwo("city", "roots2", "suffix3")
> d53 = mytwo("cityst", "roots2", "suffix3")
> d54 = mytwo("inscompany", "roots2", "suffix3")
Three-way cross between I/R/S.
> d55 = mytwo("prefix", "state", "roots")
> d56 = mytwo("prefix", "city", "roots")
> d57 = mytwo("prefix", "cityst", "roots")
> d58 = mytwo("prefix", "inscompany", "roots")
> d59 = mytwo("prefix2", "state", "roots")
> d60 = mytwo("prefix2", "city", "roots")
> d61 = mytwo("prefix2", "cityst", "roots")
> d62 = mytwo("prefix2", "inscompany", "roots")
> d63 = mytwo("prefix", "state", "roots2")
> d64 = mytwo("prefix", "city", "roots2")
> d65 = mytwo("prefix", "cityst", "roots2")
> d66 = mytwo("prefix", "inscompany", "roots2")
> d67 = mytwo("prefix2", "state", "roots2")
> d68 = mytwo("prefix2", "city", "roots2")
> d69 = mytwo("prefix2", "cityst", "roots2")
> d70 = mytwo("prefix2", "inscompany", "roots2")
Three-way cross between P/I/R.
> d71 = mytwo("prefix", "inscompany", "suffix")
> d72 = mytwo("prefix", "inscompany", "suffix2")
> d73 = mytwo("prefix", "inscompany", "suffix3")
> d74 = mytwo("prefix2", "inscompany", "suffix")
> d75 = mytwo("prefix2", "inscompany", "suffix2")
> d76 = mytwo("prefix2", "inscompany", "suffix3")
This doesn't follow the pattern; it is just inscompany rather than all
of I (crossed with P and S). Is it just incomplete and should be all of
P/I/S?
How about:

library(plyr)

# Placeholder values stand in for state, inscompany, etc., which were not provided.
lst <- list(roots = c("car insurance", "auto insurance"),
            roots2 = c("insurance"), prefix = c("cheap", "budget"),
            prefix2 = c("low cost"), suffix = c("quote", "quotes"),
            suffix2 = c("rate", "rates"), suffix3 = c("comparison"),
            state = c("state"), inscompany = c("inscompany"),
            city = c("city"),
            cityst = c("cityst"), agency = c("agency"))

my <- function(...) {
  data.frame(keyword = c(do.call(paste, do.call(expand.grid, lst[c(...)]))))
}

setP <- c("prefix", "prefix2")
setI <- c("state", "city", "cityst", "inscompany")
setR <- c("roots", "roots2")
setS <- c("suffix", "suffix2", "suffix3")

d <- c(
  mlply(expand.grid(setP, setR, setS, stringsAsFactors = FALSE), my,
        .expand = FALSE),
  mlply(expand.grid(setP, setR, stringsAsFactors = FALSE), my,
        .expand = FALSE),
  mlply(expand.grid(setR, setS, stringsAsFactors = FALSE), my,
        .expand = FALSE),
  mlply(expand.grid(setI, setR, stringsAsFactors = FALSE), my,
        .expand = FALSE),
  mlply(expand.grid(setI, setR, setS, stringsAsFactors = FALSE), my,
        .expand = FALSE),
  mlply(expand.grid(setP, setI, setR, stringsAsFactors = FALSE), my,
        .expand = FALSE),
  mlply(expand.grid(setP, "inscompany", setS, stringsAsFactors = FALSE),
        my, .expand = FALSE)
)
Of course, you could keep going with the abstraction:

mymlply <- function(...) {
  mlply(expand.grid(..., stringsAsFactors = FALSE), my, .expand = FALSE)
}

sets <- list(setP, setI, setR, setS)

d <- c(
  mlply(t(combn(4, 2)), function(...) {mymlply(sets[c(...)])}, .expand = FALSE),
  mlply(t(combn(4, 3)), function(...) {mymlply(sets[c(...)])}, .expand = FALSE)
)
d <- unlist(d, recursive = FALSE)
This gives every selection of 2 or 3 of the 4 sets, expands each selection
into all crosses of its element names, looks each of those up in lst, and
builds the data frame of pasted keywords: 120 data frames in all.
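
The figure of 120 can be checked with a little arithmetic in base R. The vector set_sizes is introduced here just for the check; it holds the number of list names in each of the sets P, I, R and S.

```r
# Number of list names in each set: P has 2, I has 4, R has 2, S has 3.
set_sizes <- c(P = 2, I = 4, R = 2, S = 3)

# Each selection of sets yields prod(sizes) data frames (one per cross).
pairs   <- combn(set_sizes, 2, FUN = prod)
triples <- combn(set_sizes, 3, FUN = prod)

sum(pairs)                  # 44 data frames from the two-set selections
sum(triples)                # 76 data frames from the three-set selections
sum(pairs) + sum(triples)   # 120 in total
```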
Now, I don't know why you would want it this way, necessarily. I am
guessing the keyword column of the innermost data frames should be
character, not factor. But it is (close to) what you asked for.
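
If character columns are wanted, one possible variant (a sketch; my_chr is a name made up here, and the trimmed lst is a stand-in) passes stringsAsFactors = FALSE explicitly. From R 4.0.0 onward data.frame already defaults to this, so the argument mainly matters on older versions such as the one used in this thread.

```r
# Stand-in for the original lst (only the pieces needed here; assumption).
lst <- list(roots2 = "insurance",
            prefix = c("cheap", "budget"))

my_chr <- function(...) {
  # Keep the grid as character too, although paste would convert factors anyway.
  grid <- do.call(expand.grid, c(lst[c(...)], stringsAsFactors = FALSE))
  data.frame(keyword = do.call(paste, grid), stringsAsFactors = FALSE)
}

res <- my_chr("prefix", "roots2")
str(res)   # keyword is a character column, not a factor
```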
>
> Obviously, this code gets rather repetitive, even with the function, and I
> was wondering if there's a shortcut that I should consider to simplify the
> process.
>
> Thanks,
>
> I'm running R 2.13 on Ubuntu 10.10
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University