[R] using "eval(parse(text)) " , gsub(pattern, replacement, x) , to process "code" within a loop/custom function
Emmanuel Charpentier
charpent at bacbuc.dyndns.org
Fri Dec 7 00:00:21 CET 2007
Thomas Pujol wrote:
> R-help users,
> Thanks in advance for any assistance ... I truly appreciate your expertise. I searched help and could not figure this out, and think you can probably offer some helpful tips. I apologize if I missed something, which I'm sure I probably did.
>
> I have data for many "samples". (e.g. 1950, 1951, 1952, etc.)
>
> For each "sample", I have many data-frames. (e.g. temp.1952, births.1952, gdp.1952, etc.)
>
> (Because the data is rather "large" (and for other reasons), I have chosen to store the data as individual files, as opposed to a list of data frames.)
>
> I wish to write a function that enables me to "run" any of many custom "functions/processes" on each sample of data.
>
> I currently accomplish this by using a custom function that uses:
> "eval(parse(t=text.i2)) ", and "gsub(pat, rep, x)" (this changes the "sample number" for each line of text I submit to "eval(parse(t=text.i2))" ).
>
> Is there a better/preferred/more flexible way to do this?
Beware: what follows is the advice of someone used to working with data
through an RDBMS and SQL; as everyone knows, everything looks like a nail
to a man with a hammer. Caveat emptor...
Unless I misunderstand you, you are trying to process, piece by piece, a
large dataset made of many reasonably sized independent chunks.
What you are doing looks to me a bit like reinventing the SAS macro
language. What's the point?
IMNSHO, "large" datasets that are used only piecewise are much better
handled in a real database (RDBMS), queried at runtime via, for example,
Brian Ripley's RODBC.
In your example, I'd create a single table, Births, holding all your data
plus the relevant year. Off the top of my head:
# Do that ONCE in the lifetime of your data: an RDBMS is probably better
# suited than R data frames to this kind of management
library(RODBC)
channel <- odbcConnect(WhateverYouHaveToUseForYourFavoriteDBMS)
sqlSave(channel, tablename="Births",
        dat=rbind(cbind(data.frame(Year=rep(1952, nrow(births.1952))),
                        births.1952),
                  cbind(data.frame(Year=rep(1953, nrow(births.1953))),
                        births.1953)
                  # ... ^W^Y ad nauseam: one cbind() per remaining year ...
                  ))
rm(births.1952, births.1953) # and so on: get back breathing space
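The stacking step above need not be typed out year by year. A minimal
sketch, assuming the per-year data frames already sit in the workspace
under names like births.1952 and all share the same columns; the helper
name stack_years and the toy data are hypothetical:

```r
# Hypothetical toy data standing in for the real per-year data frames.
births.1952 <- data.frame(region = c("A", "B"), n = c(10, 20))
births.1953 <- data.frame(region = c("A", "B"), n = c(11, 19))

# Stack the data frames named <prefix>.<year> into one data frame,
# adding a Year column. Assumes every piece has the same columns.
stack_years <- function(prefix, years, envir = parent.frame()) {
  force(envir)  # resolve the caller's environment before looping
  pieces <- lapply(years, function(y) {
    d <- get(paste(prefix, y, sep = "."), envir = envir)
    cbind(Year = rep(y, nrow(d)), d)
  })
  do.call(rbind, pieces)
}

births <- stack_years("births", 1952:1953)
```

The result can then be handed to sqlSave() as its dat argument in place
of the hand-written rbind().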
Beware: certain data types may be tricky to save! I got bitten by
Dates recently... See the RODBC documentation, your DBMS documentation
and the "R Data Import/Export" manual...
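One conservative workaround for the Date problem is to store dates as
ISO 8601 strings and restore the class after reading them back. This is
only a sketch of that idea, not something RODBC or your DBMS requires;
the data frame visits and its columns are hypothetical:

```r
# Hypothetical data frame with a Date column.
visits <- data.frame(id = 1:2,
                     day = as.Date(c("2007-12-06", "2007-12-07")))

# Before saving: turn every Date column into a "YYYY-MM-DD" string.
is_date <- vapply(visits, inherits, logical(1), what = "Date")
to_save <- visits
to_save[is_date] <- lapply(to_save[is_date], format, format = "%Y-%m-%d")

# After reading back: restore the Date class on the same columns.
restored <- to_save
restored[is_date] <- lapply(restored[is_date], as.Date)
```

Here to_save is what would go to the database, and restored mimics what
you would rebuild from the query result.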
At analysis time, you may use the result of the relevant query exactly
as you would one of your data frames. Instead of:
foo(... data=births.1952, ...)
type:
foo(... data=sqlQuery(channel,
                      "select * from \"Births\" where \"Year\"=1952;"),
    ...) # Syntax illustrating talking to a "picky" DBMS...
Furthermore, the "Year" column now carries your "d" (sample number)
information. Problem (dis)solved.
You may loop (or even sapply()...) over d at will:
for (year in 1952:1978) {
    query <- sprintf("select * from \"Births\" where \"Year\"=%d;", year)
    foo(... data=sqlQuery(channel, query), ...)
    ...
}
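The same loop can be written with lapply() so that the results are kept
in a list named by year. A minimal sketch: births_query, run_by_year and
fake_fetch are hypothetical names, and fetch stands for whatever turns a
query string into a data frame (with RODBC, something like
function(q) sqlQuery(channel, q)):

```r
# Build the query for one year; as.integer() keeps %d happy with 1952.0.
births_query <- function(year) {
  sprintf("select * from \"Births\" where \"Year\"=%d;", as.integer(year))
}

# Apply analyse() to the data fetched for each year.
# fetch: function(query string) -> data frame.
run_by_year <- function(years, fetch, analyse) {
  results <- lapply(years, function(y) analyse(fetch(births_query(y))))
  names(results) <- years
  results
}

# Stand-in for a real database call, so the sketch runs without a DBMS:
# it just returns a one-row data frame holding the query text.
fake_fetch <- function(q) data.frame(query = q, stringsAsFactors = FALSE)

out <- run_by_year(1952:1954, fake_fetch, function(d) nrow(d))
```

Swapping fake_fetch for a real query function is the only change needed
to run this against the database.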
If you already use a DBMS with some connection to R (via RODBC or
otherwise), use that. If not, SQLite is a very lightweight library that
lets you use a (very considerable) subset of SQL-92 to manipulate
your data.
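For the SQLite route, a minimal sketch, assuming the DBI and RSQLite
packages are installed (neither is named above, so treat them as one
possible choice); the table contents are hypothetical toy data:

```r
library(DBI)  # assumes the DBI and RSQLite packages are installed

# Throwaway in-memory database; use a file name to keep the data.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy table with a Year column.
births <- data.frame(Year = c(1952L, 1952L, 1953L), n = c(10, 20, 11))
dbWriteTable(con, "Births", births)

# Fetch one year's rows; the result is an ordinary data frame.
births_1952 <- dbGetQuery(con,
                          "select * from \"Births\" where \"Year\" = ?",
                          params = list(1952L))

dbDisconnect(con)
```

Passing the year through params rather than pasting it into the string
leaves the quoting to the driver.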
I understand that some people on this list have undertaken the creation
of an SQLite-based package dedicated to this kind of large-data management.
HTH,
Emmanuel Charpentier