[R] Grep with wildcards across multiple columns
arun
smartpink111 at yahoo.com
Thu Mar 14 23:19:24 CET 2013
Hi,
Not sure whether this helps.
If you take out the grep('", par.obj, "', obj) term, it runs without any warning.
eval(parse(text=paste(
  "dt2 <- dt[", "grep('", par.fund, "', fund) & ",
  "grep('", par.func, "', func)",
  ", sum(amount), by=c('code', 'year')]" , sep="")))
dt[grep('^1.E$',fund) & grep('^1.....$',func),sum(amount),by=c('code','year')]
#   code year     V1
#1: 1001 2011 185482
#2: 1001 2012 189367
#3: 1002 2011 238098
#4: 1002 2012 211499
aggregate(amount~code+year,data=df,sum)
#  code year amount
#1 1001 2011 185482
#2 1002 2011 238098
#3 1001 2012 189367
#4 1002 2012 211499
In the df you provided, there is only one value of obj.
levels(df$obj)
#[1] "100"
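Note that the data.table sums above equal the unfiltered aggregate() totals, which suggests that the grep() & grep() condition did not actually drop any rows. A possible alternative, sketched on the assumption that the goal is to keep only rows matching all three patterns, is to use grepl(), which returns one TRUE/FALSE per row so the conditions can be combined with & and passed to data.table directly, without eval(parse()). The helper name sum_matching and the seed are made up for illustration; the data are the dummy data from the original post.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the fake amounts are repeatable
dt <- data.table(code = c(rep(1001, 8), rep(1002, 8)),
                 year = rep(c(rep(2011, 4), rep(2012, 4)), 2),
                 fund = rep(c("10E", "10E", "10E", "27E"), 4),
                 func = rep(c("110000", "122000", "214000", "158000"), 4),
                 obj = rep("100", 16),
                 amount = round(rnorm(16, 50000, 10000)))

# Hypothetical helper: keep rows matching all three regular expressions,
# then sum amount by code and year.
sum_matching <- function(dt, fund.rx, func.rx, obj.rx) {
  dt[grepl(fund.rx, fund) & grepl(func.rx, func) & grepl(obj.rx, obj),
     list(amount = sum(amount)), by = c("code", "year")]
}

dt2 <- sum_matching(dt, "^1.E$", "^1.....$", "^...$")
dt2
```

If the 159 parameter sets are stored one per row in a table, the helper could be called once per row in a loop or with Map(), but that part is left out here.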
A.K.
----- Original Message -----
From: "Bush, Daniel P. DPI" <Daniel.Bush at dpi.wi.gov>
To: "'r-help at r-project.org'" <r-help at r-project.org>
Cc:
Sent: Thursday, March 14, 2013 5:43 PM
Subject: [R] Grep with wildcards across multiple columns
I have a fairly large data set with six variables set up like the following dummy:
# Create fake data
df <- data.frame(code = c(rep(1001, 8), rep(1002, 8)),
                 year = rep(c(rep(2011, 4), rep(2012, 4)), 2),
                 fund = rep(c("10E", "10E", "10E", "27E"), 4),
                 func = rep(c("110000", "122000", "214000", "158000"), 4),
                 obj = rep("100", 16),
                 amount = round(rnorm(16, 50000, 10000)))
What I would like to do is sum the amount variable by code and year, filtering rows using different wildcard searches in each of three columns: "1?E" in fund, "1?????" in func, and "???" in obj. I'm OK turning these into regular expressions:
# Set parameters
par.fund <- "10E"; par.func <- "100000"; par.obj <- "000"
par.fund <- glob2rx(gsub("0", "?", par.fund))
par.func <- glob2rx(gsub("0", "?", par.func))
par.obj <- glob2rx(gsub("0", "?", par.obj))
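As a quick check of what the conversion above produces, the wildcard strings and the resulting regular expressions can be printed step by step. The names wild.fund, wild.func and wild.obj are illustrative only and do not appear in the original code.

```r
# glob2rx() lives in the utils package, which is attached by default.
wild.fund <- gsub("0", "?", "10E")     # "1?E"
wild.func <- gsub("0", "?", "100000")  # "1?????"
wild.obj  <- gsub("0", "?", "000")     # "???"

rx.fund <- glob2rx(wild.fund)  # "^1.E$"
rx.func <- glob2rx(wild.func)  # "^1.....$"
rx.obj  <- glob2rx(wild.obj)   # "^...$"

print(c(rx.fund, rx.func, rx.obj))
```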
The problem occurs when I try to apply multiple greps across columns. I'd prefer to use data.table since it's so much faster than plyr and I have 159 different sets of parameters to run through, but I get the same error setting it up either way:
# Doesn't work
library(data.table)
dt <- data.table(df)
eval(parse(text=paste(
  "dt2 <- dt[", "grep('", par.fund, "', fund) & ",
  "grep('", par.func, "', func) & grep('", par.obj, "', obj)",
  ", sum(amount), by=c('code', 'year')]" , sep="")))
# Warning message:
# In grep("^1.E$", fund) & grep("^1.....$", func) :
#   longer object length is not a multiple of shorter object length
# Also doesn't work
library(plyr)
eval(parse(text=paste(
  "df2 <- ddply(df[grep('", par.fund, "', df$fund) & ",
  "grep('", par.func, "', df$func) & grep('", par.obj, "', df$obj), ]",
  ", .(code, year), summarize, amount = sum(amount))" , sep="")))
# Warning message:
# In grep("^1.E$", df$fund) & grep("^1.....$", df$func) :
#   longer object length is not a multiple of shorter object length
Clearly, the problem is how I'm trying to combine greps in subsetting rows, but I haven't been able to find a solution that works. Any thoughts, preferably something that works with data.table?
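A small base-R sketch of where the warning likely comes from, using only the fund, func and obj columns of the dummy data: grep() returns the integer positions of the matching elements rather than one value per row, so the three results can have different lengths. The names i.fund, i.func, i.obj and combined are illustrative.

```r
fund <- rep(c("10E", "10E", "10E", "27E"), 4)
func <- rep(c("110000", "122000", "214000", "158000"), 4)
obj  <- rep("100", 16)

# grep() gives the positions of the matches, not a logical mask.
i.fund <- grep("^1.E$", fund)
i.func <- grep("^1.....$", func)
i.obj  <- grep("^...$", obj)
lengths(list(i.fund, i.func, i.obj))  # 12 12 16

# `&` treats every non-zero position as TRUE, so the combined result is
# all TRUE; recycling length 12 against length 16 is what triggers the
# "longer object length" warning (suppressed here to keep the output clean).
combined <- suppressWarnings(i.fund & i.func & i.obj)
length(combined)  # 16
all(combined)     # TRUE
```

Because the first two results both happen to have length 12, combining only those two raises no warning, but the result is still not a per-row filter.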
DB
Daniel Bush
School Finance Consultant
School Financial Services
Wisconsin Department of Public Instruction
PO Box 7841 | Madison, WI 53707-7841
daniel.bush -at- dpi.wi.gov | sfs.dpi.wi.gov
Ph: 608-267-9212 | Fax: 608-266-2840