[R] working with environments to ensure code quality for long R scripts

Alexander juschitz_alexander at yahoo.de
Fri Apr 20 07:11:22 CEST 2012


In fact, I took over the code from someone else, but I agree with you. It is
really a mess, and the code is very hard to understand if you didn't write
it yourself. So you would suggest putting everything into functions, so that
there are only very few return values to work with. This should also limit the
problem of managing a mess of variables...
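
To check that I understand the suggestion, here is a minimal sketch of how
two of the modules could look once they are wrapped in functions. All names
(read_input, clean_input, run_pipeline, the columns id and value) and the
data are made up for illustration; they are not taken from the real project.

```r
# Hypothetical module 1: build the input data. Temporary variables live
# only inside the function and disappear when it returns.
read_input <- function() {
  raw <- data.frame(id = 1:5, value = c(2.5, NA, 4.0, 1.5, NA))
  raw
}

# Hypothetical module 2: depends only on its argument, not on .GlobalEnv.
clean_input <- function(input) {
  keep <- !is.na(input$value)   # temporary helper, never leaks out
  input[keep, , drop = FALSE]
}

# Hypothetical driver that replaces the master script calling source()
# on every file: each step passes its result on explicitly.
run_pipeline <- function() {
  input   <- read_input()
  cleaned <- clean_input(input)
  list(n_raw      = nrow(input),
       n_clean    = nrow(cleaned),
       mean_value = mean(cleaned$value))
}

result <- run_pipeline()
```

Apart from the three function definitions, only result ends up in the
workspace. The intermediate objects (raw, keep, input, cleaned) never appear
in .GlobalEnv, so a later module cannot pick them up or overwrite them by
accident.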

I think I didn't really want to see the simple solution, but it is
inevitable ...

Thanks again!!

Alexander


Bert Gunter wrote
> 
> Comment (caveat emptor):
> 
> If I understand correctly, your difficulties all stem from your use of
> the word "script," which betrays a fundamental misunderstanding of the
> nature of R as a programming language.
> 
> R is based (mostly) on the concepts of functional programming. So
> instead of doing as you have done -- spreading a mishmash of code
> around in a bunch of files --  all that code should be made into
> functions. These then can be organized formally into a package (with
> documentation, namespaces, etc.) or informally into a saved .Rdata
> file. In either case, the whole thing then functions as an organic
> whole.
> 
> By doing as you have done (are you a former SAS or maybe JAVA
> programmer?) you are contravening the programming strategy around
> which R is built, leaving you with a clumsy mess. This is not to say
> that you can't get it to work -- you probably can. Only that it's a
> mess.
> 
> Moral: Use R as it is intended to be used, not as you would like it to be
> used.
> 
> Cheers,
> Bert
> 
> On Thu, Apr 19, 2012 at 12:57 PM, Alexander <juschitz_alexander@> wrote:
>> Hi
>>
>> thank you for your suggestions, but I am not sure if I explained my problem
>> well enough. Let's assume that I have 30 different script files and 1 script
>> which calls these 30 scripts one after the other via "source". Some of the
>> 30 scripts only contain definitions of functions which are called in other
>> scripts; some only execute code, load, save, and interact with the user via
>> Tcl/Tk. Together they represent a big program which asks the user for a
>> lot of input, then processes and manipulates that input. At the end, the
>> user obtains some result files.
>> As many scripts are sourced during execution, a lot of different variables
>> are initialized along the way. Some are only needed temporarily, some
>> are necessary for later steps in other scripts.
>> Now the question: Is there any methodology to ensure that the whole thing
>> runs reliably? For example, after every script I could save the whole
>> workspace into script1.Rdata, delete all variables which are not needed in
>> a later script (perhaps a little bit difficult to manage, but possible),
>> and continue with the execution.
>>
>> I don't know if I was able to describe my problem more precisely.
>>
>> Alexander
>>
>> cberry wrote
>>>
>>> Alexander,
>>>
>>> If Tal's suggestion to use caching in Sweave doesn't appeal to you, you
>>> might look at  'R.cache' and other packages mentioned in
>>>
>>> http://cran.r-project.org/web/views/ReproducibleResearch.html
>>>
>>> under 'Caching of R Objects'.
>>>
>>> However, an advantage of the Sweave-like approaches is that you can
>>> generate a brief report that includes the versions of scripts used,
>>> summarizes the data processing, and gives intermediate results for later
>>> inspection and sanity checks.
>>>
>>> HTH,
>>>
>>> Chuck
>>>
>>> Tal Galili <tal.galili@> writes:
>>>
>>>> Hi Alexander,
>>>> Saving full environments is possible, but it is very easy to start losing
>>>> track of where each variable came from.
>>>> You might want to use this process:
>>>> http://www.r-bloggers.com/a-better-way-of-saving-and-loading-objects-in-r/
>>>> It depends on how many variables you work with, but it might help.
>>>>
>>>> Another way is to do all of the work through Sweave, and combine it with
>>>> caching:  http://cran.r-project.org/web/packages/cacheSweave/index.html
>>>> This will ensure that every code chunk keeps the variables you created,
>>>> without the need to re-run the code from scratch.
>>>>
>>>> For extracting data from outside sources, I would often use the first
>>>> method, and for analysis I would use the latter option.
>>>>
>>>> Good luck,
>>>> Tal
>>>>
>>>> On Thu, Apr 19, 2012 at 11:15 AM, Alexander <juschitz_alexander@> wrote:
>>>>
>>>>> Hello, I am working with R 2.11 on Windows and currently I work on a big
>>>>> R project which executes different R scripts in a row. Every R script
>>>>> represents a module. As every module depends on the variables created in
>>>>> the modules previously executed, I want to be sure that I don't create or
>>>>> change a variable in a script without being aware that this affects the
>>>>> results in a script executed later. Therefore, I was thinking of saving
>>>>> all important variables in a separate "backup" environment. Every time a
>>>>> script starts, it loads the variables of the "backup" environment into
>>>>> .GlobalEnv. At the end of the script, I want to add all newly obtained
>>>>> variables to the "backup" environment (and check automatically whether
>>>>> any variable of the "backup" environment is going to be overwritten) and
>>>>> clean the workspace .GlobalEnv to start the next script neat and tidy.
>>>>> What do you think of this solution? Does anyone have better ideas or
>>>>> experience to share?
>>>>>
>>>>> Thank you in advance
>>>>>
>>>>> Alexander
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://r.789695.n4.nabble.com/working-with-environments-to-ensure-code-quality-for-long-R-scripts-tp4570195p4570195.html
>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>
>>>>> ______________________________________________
>>>>> R-help@ mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>
>>> --
>>> Charles C. Berry                            Dept of Family/Preventive Medicine
>>> cberry at ucsd edu                           UC San Diego
>>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>>
>>>
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/working-with-environments-to-ensure-code-quality-for-long-R-scripts-tp4570195p4571989.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
> 
> 
> 
> -- 
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
> 
> 


--
View this message in context: http://r.789695.n4.nabble.com/working-with-environments-to-ensure-code-quality-for-long-R-scripts-tp4570195p4572881.html
Sent from the R help mailing list archive at Nabble.com.


