[Bioc-devel] What is Bioconductor's position on allowing users to create files in the working directory without an explicit path definition in the filename

Pages, Herve hpages using fredhutch.org
Fri Mar 22 18:20:36 CET 2019


Hi Koustav,

From a Build System point of view, we certainly don't want packages to 
write files to the user's home (or to the current directory, which might 
be different from the user's home) during 'R CMD build' and 'R CMD 
check'. (Note that there is actually no guarantee that the current 
directory is writable.) We also don't want these files to be persistent. 
Using tempfile() (or some other place under tempdir()) gives you 
everything you need: a place that is guaranteed to be writable, and 
where the files are automatically removed at the end of the session for 
you. Plus, tempfile() gives you a unique path at each call, so you are 
guaranteed not to clash with an existing file.
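
A minimal sketch of the properties described above, using only base R. The variable names are illustrative and not taken from any package:

```r
# Two calls to tempfile() return two distinct paths under tempdir().
path1 <- tempfile(fileext = ".txt")
path2 <- tempfile(fileext = ".txt")

# Neither path exists yet, so writing cannot clobber an existing file.
already_there <- file.exists(path1) || file.exists(path2)

writeLines("some output", path1)

# The file sits in the per-session temporary directory, which R
# removes when the session ends.
in_tempdir <- identical(normalizePath(dirname(path1)),
                        normalizePath(tempdir()))
```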

So regardless of where your 'Output.Filename' argument is pointing to by 
default (some place under the user's home, the current dir, etc...), you 
need to make sure that it is set to a place under tempdir() in your 
examples and vignettes.
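
As a rough illustration of that last point, here is a sketch with a hypothetical function write_matrix() and a hypothetical argument output_filename standing in for a real package function and its 'Output.Filename' argument. The default below is an assumption chosen only to mirror the situation discussed in this thread:

```r
# Hypothetical stand-in for a package function that writes a persistent file.
# The default deliberately points at the current directory.
write_matrix <- function(x,
                         output_filename = file.path(getwd(), "matrix.tsv")) {
    # Refuse to overwrite anything the user already has at that path.
    if (file.exists(output_filename))
        stop("refusing to overwrite existing file: ", output_filename)
    write.table(x, file = output_filename, sep = "\t", quote = FALSE)
    invisible(output_filename)
}

# In examples and vignettes, override the default with a path under
# tempdir() so that nothing is written to the user's directories.
out <- tempfile(pattern = "matrix_", fileext = ".tsv")
write_matrix(matrix(1:4, nrow = 2), output_filename = out)
```

In real use the user would pass a permanent location of their own choosing; only the documentation and tests need to stay under tempdir().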

Hope this helps,

H.

On 3/22/19 09:33, Koustav Pal wrote:
> In response to Vince’s comment:
>
> My package already has an implementation for checking if the file exists and then explicitly requiring the user’s permission before removing the file.
>
> In response to Lori’s comment:
>
> In my implementation of BiocFileCache, if a user provides a filename without an explicit path definition this particular file is created within the Cache directory and the user is notified regarding this, i.e. a warning is issued. Otherwise, if an explicit definition is provided, the file is created on location and a tracking symlink is placed in the Cache directory. The filenames and paths are hashed to create unique identifiers and the full path itself is stored as an additional metadata column.
>
> To locate the file, users are able to query the cache using another wrapper function I built. They can also list all files that are being tracked throughout the package.
>
> Even then, a few users have expressed distress regarding this implementation because it is non-standard. Standard in this context would typically mean how we handle files in the shell.
>
> Can you provide additional suggestions regarding procedures that may alleviate user confusion?
>
>
> Koustav Pal,
> Post-Doctoral Fellow in Genome Architecture,
> Computational Genomics Group,
> IFOM - The FIRC Institute of Molecular Oncology,
> Via Adamello 16,
> 20139 Milano, Italy.
> Phone: +393441130157
> E-mail: koustav.pal using ifom.eu
>
>
>
>> On 22 Mar 2019, at 17:02, Shepherd, Lori <Lori.Shepherd using RoswellPark.org> wrote:
>>
>> To chime in and expand a bit on Vince's comments:
>>
>> I feel Bioconductor's position when accepting packages, with few exceptions, is that nothing should be written or saved to a user's directory without the express permission of the user, for fear of overwriting the user's own directories or files that predate the package's intended use, as Vince explained.
>>
>> For this reason we recommend that the defaults of all functions, and their usage in man pages, vignettes, and tests, write to tempdir()/tempfile() locations. If the package documentation is clear, then it should be known that in practical use the user should specify a more permanent location for file creation rather than a temporary one.
>>
>> If a file is supposed to persist, BiocFileCache is an option for monitoring and storing files and is becoming a more standard way of organizing files. There is the idea of saving objects to the cache with a given "rname" that would be a unique identifier. Using that identifier, your package or the users should be able to use bfcquery to query the cache and retrieve the file path. As Vince said, this should then be documented in your package. Without thoroughly understanding the implementation of your package, I think this might be of use to you.
>>
>> Less likely: Depending on its implementation in your package, you may also find it useful that the bfcadd function has an option action = c("copy", "move", "asis"), which controls whether the file is copied into the BiocFileCache default directory, moved into it, or left in its original location.
>>
>> Cheers,
>>
>> Lori Shepherd
>> Bioconductor Core Team
>> Roswell Park Cancer Institute
>> Department of Biostatistics & Bioinformatics
>> Elm & Carlton Streets
>> Buffalo, New York 14263
>> From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Vincent Carey <stvjc using channing.harvard.edu>
>> Sent: Friday, March 22, 2019 11:55:05 AM
>> To: Koustav Pal
>> Cc: bioc-devel; Ferrari Francesco
>> Subject: Re: [Bioc-devel] What is Bioconductor's position on allowing users to create files in the working directory without an explicit path definition in the filename
>>   
>> Guidelines on this topic do not seem to be present in our web
>> site; there is a link to Wickham's guide but I don't see that it
>> confronts the topic.  I will make some unofficial and possibly
>> wrong remarks.
>>
>> Suppose my function has to create a file "foo.txt".  If I do it
>> in the working folder, I might destroy a user's cherished file.
>> So I should check to see if the filename I need is already in use.
>> If it is, I need to do something graceful.
>>
>> That's a lot of complexity that may never actually be used.  Can
>> we avoid it completely?  Here are a few ways to avoid it:
>>
>> 1) Don't create files, just create objects and leave the serialization
>> task to the user.  You can provide helper functions and documentation but
>> the details of target location of the serialization are left to the user.
>>
>> 2) If you create a file, use R's tempfile/tempdir discipline to avoid
>> the need for checking for clobber.  If the content needs to persist the
>> user should direct this, again with helpers as needed.
>>
>> 3) If you create a file that should persist, use BiocFileCache as that
>> addresses the location problem and has an added benefit of obligatory
>> metadata binding.  This is an underused strategy and more pedagogy
>> is surely in order.  If the user "cannot find" what has been made, there
>> is a systematic approach available that involves querying the cache.  Your
>> documentation will supply all relevant details.
>>
>> On Fri, Mar 22, 2019 at 11:38 AM Koustav Pal <koustav.pal using ifom.eu> wrote:
>>
>>> Hello,
>>>
>>> My package HiCBricks was submitted and accepted under the previous 3.8
>>> release of Bioconductor.
>>>
>>> At the time, during package review, my reviewer had expressed reservations
>>> towards my package creating
>>> files in the current working directory.
>>>
>>>
>>> [REQUIRED] CreateLego() creates HDF5 files in the current directory if no
>>> path is given in the Output.Filename argument. This may clutter the working
>>> directory and it would be better to have the files saved to a temporary
>>> file
>>> (or directory) using tempfile() (or tempdir()).
>>>
>>>
>>> This was with regards to the main output files that were being created by
>>> my package.
>>> I clarified the specific point in question with my reviewer.
>>>
>>>
>>> The idea behind this package is to create an HDF file for storing
>>> high-resolution Hi-C (can be as large as a user wants) data and keep it as
>>> a persistent copy which the user can access later without having to reload
>>> the file. Therefore, I am a bit averse towards creating a tempfile or
>>> tempdir. Using a temporary file would go against this idea and would
>>> probably result in the user not having access to the file later. I have
>>> incorporated a control statement which will issue a warning regarding file
>>> creation inside the current working directory. Is that ok?
>>>
>>>
>>> Finally, my reviewer suggested that I make use of the BiocFileCache
>>> package to create files.
>>>
>>>
>>> The changes so far look good. I understand that tempfile() isn't a great
>>> solution for your package, so may I recommend that you store your data
>>> using the BiocFileCache package
>>> https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html
>>> as opposed to automatically saving the file in a local directory. Once this
>>> change is made, I should be able to accept the package.
>>>
>>> I interpreted this as the reviewer expressing reservation towards files
>>> being created in the
>>> current working directory without the user's explicit requirement.
>>> Therefore, I made a working
>>> implementation of BiocFileCache within my package, which works perfectly
>>> fine.
>>>
>>> Yet, users are now facing troubles when having to locate files that they
>>> may have created in the current
>>> working directory using the traditional method of var = “something.txt”,
>>> because these files were created in
>>> the BiocFileCache cache during file creation. All the confusion and issue
>>> stems from this being a non-traditional
>>> method of keeping track of files and folders.
>>>
>>> What is Bioconductor’s position regarding this issue?
>>>
>>> Can users create files using Bioconductor packages in the current working
>>> directory without an explicit path definition in the filename?
>>>
>>> Or did I misinterpret the reviewer’s position and this is only a
>>> requirement when the package is being built by the builder?
>>>
>>>
>>> Koustav Pal,
>>> Post-Doctoral Fellow in Genome Architecture,
>>> Computational Genomics Group,
>>> IFOM - The FIRC Institute of Molecular Oncology,
>>> Via Adamello 16,
>>> 20139 Milano, Italy.
>>> Phone: +393441130157
>>> E-mail: koustav.pal using ifom.eu
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioc-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


