[R-pkg-devel] New test in R-devel causes existing packages to fail: "Error: connections left open"

Uwe Ligges <ligges using statistik.tu-dortmund.de>
Mon Aug 27 16:47:46 CEST 2018


I still do not understand why you cannot stop scala and related 
connections at the end of each example. You can insert a comment that 
this is not needed if you have follow-up tasks for scala.

Best,
Uwe

On 27.08.2018 07:14, David B. Dahl wrote:
> Henrik,
> 
> Thanks for the suggestion.  Yes, definitely, I think your more nuanced
> test would be a big improvement.  The only wrinkle is that the
> connection is established *not* when the package is *loaded* but
> rather when the connection is *first needed* (using delayedAssign when
> the package is loaded).  That way, loading the package doesn't block
> the REPL for ~5 seconds while Scala and the JVM first start.
> 
> -- David
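
The lazy start-up described above can be sketched in base R as follows.
This is only an illustration, not the rscala implementation: the names
pkg_env, state and "bridge" are made up here, and a temporary file
connection stands in for the socket to Scala.

```r
# Hypothetical stand-ins: pkg_env plays the role of the package namespace,
# state records whether the expensive start-up has happened yet.
pkg_env <- new.env()
state <- new.env()
state$started <- FALSE

# Register a promise: the braces are NOT evaluated here, so "loading" is cheap.
delayedAssign(
  "bridge",
  {
    state$started <- TRUE            # a real bridge would start the JVM here
    file(tempfile(), open = "w")     # stand-in for the socket connection
  },
  eval.env = environment(),
  assign.env = pkg_env
)

started_before <- state$started          # nothing has been opened yet
con <- get("bridge", envir = pkg_env)    # first use forces the promise
started_after <- state$started           # the connection now exists

close(con)                               # tidy up the stand-in connection
```

The connection only comes into existence at the first get(), which is
why a check taken right after attaching the package would not see it.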
> 
> On Thu, Aug 23, 2018 at 11:19 PM Henrik Bengtsson
> <henrik.bengtsson using gmail.com> wrote:
>>
>> Does R CMD check --as-cran test for newly opened connections or any
>> open connections?  Could the check for stray connections in
>> examples/vignettes be:
>>
>> 1. Record what connections are open
>> 2. Attach the package
>> 3. Record what connections are open
>> 4. Run the example
>> 5. Assert that no *new* connections in addition to what's recorded in
>> Step 3 are open
>> 6. Unload the package
>> 7. Assert that no *new* connections in addition to what's recorded in
>> Step 1 are open
>>
>> Step 5 asserts that the code in the example does not leave stray
>> connections behind, and Step 7 that the package itself does not leave
>> stray connections behind.
>>
>> /Henrik
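
The seven steps above could be sketched roughly like this.  It is not
the code used by R CMD check: open_ids() and check_example() are names
invented for this sketch, and the attach/run/unload steps are passed in
as plain functions so that it runs without any particular package.

```r
# Row names of showConnections() are the connection numbers of the
# currently open, non-standard connections.
open_ids <- function() as.character(rownames(showConnections(all = FALSE)))

check_example <- function(attach_pkg, run_example, unload_pkg) {
  at_start <- open_ids()                                 # step 1
  attach_pkg()                                           # step 2
  after_attach <- open_ids()                             # step 3
  run_example()                                          # step 4
  left_by_example <- setdiff(open_ids(), after_attach)   # step 5
  unload_pkg()                                           # step 6
  left_by_package <- setdiff(open_ids(), at_start)       # step 7
  list(example = left_by_example, package = left_by_package)
}

# Toy "package" that opens its connection lazily, inside the example,
# and closes it on unload (a file connection stands in for a socket).
toy <- new.env()
res <- check_example(
  attach_pkg  = function() NULL,
  run_example = function() toy$con <- file(tempfile(), open = "w"),
  unload_pkg  = function() { close(toy$con); toy$con <- NULL }
)
```

With a lazily opened connection as in the toy case, step 5 still reports
the connection while step 7 does not.  Note also that comparing
connection numbers is only approximate, since R can reuse the number of
a closed connection.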
>> On Thu, Aug 23, 2018 at 1:25 PM David B. Dahl <dahl using stat.byu.edu> wrote:
>>>
>>> Oops, I accidentally did not "reply-all".... Here is my message:
>>>
>>> Thanks Uwe, Duncan, and Gabor for the response, advice, and flexibility.
>>>
>>> Regarding Uwe's suggestion:  "... there should be a function that
>>> creates the connection and one that closes the connection," I should
>>> clarify.  The rscala package does just that.  There is a function
>>> (named "scala") that creates the connection (using delayedAssign) and
>>> another that closes the connection (namely an S3 close method).  The
>>> examples for the rscala package follow these full open/close semantics,
>>> but...
>>>
>>> The problem comes when authors of another package, let's call it the
>>> "FooBar" package, want to implement an algorithm in Scala based on
>>> functionality provided by the rscala package.  Let's say they write a
>>> function called "neatAlgorithm" based on Scala.  Yes, the FooBar
>>> package author could require that, before the user calls the
>>> "neatAlgorithm" function, they first call a function to set up the
>>> connection (which itself would call the "rscala::scala" function) and
>>> then, after calling the "neatAlgorithm" function, they call a function
>>> to close the connection.
>>>
>>> But that is not very user friendly and exposes the user to
>>> implementation details of the algorithm.  Users of the FooBar
>>> package don't really care whether the "neatAlgorithm" is implemented
>>> in pure R, C++, Scala, or whatever, much like the users of the 'lm'
>>> function don't need to know the implementation details or do any setup
>>> before and after calling the function.
>>>
>>> The current approach is that the connection to Scala is transparent to
>>> the end user of a package.  Behind the scenes, the package author
>>> establishes the connection once it is needed and the rscala package
>>> manages the connection and explicitly closes it when 1. the package is
>>> unloaded or 2. the R session ends.  This approach does not leave
>>> dangling connections --- which I believe is the point of the new test
>>> --- yet my package is caught up in the test.
>>>
>>> I hope that this approach is still valid.  Perhaps the test could
>>> result in a warning (instead of an error) and CRAN could accept
>>> packages with such a warning.
>>>
>>> If not, a work-around is to have a \dontshow section in the examples
>>> that will close the connection (but leave the Scala process running)
>>> and then automatically reestablish the connection as needed.  This
>>> would not be very efficient but, as Duncan mentioned, it mostly
>>> affects only the package examples themselves.  Plus, it would not be
>>> too burdensome for package developers.
>>>
>>> Again, thanks for considering my situation.
>>>
>>> Best regards,
>>>
>>> -- David
>>>
>>> On Mon, Aug 20, 2018 at 11:11 PM Uwe Ligges
>>> <ligges using statistik.tu-dortmund.de> wrote:
>>>>
>>>> My advice:
>>>>
>>>> Apparently you want to have communication via sockets to scala.
>>>>
>>>> So there should be a function that creates the connection and one that
>>>> closes the connection.
>>>> Comparable to starting some parallel cluster and stopping it again.
>>>>
>>>> In the meantime, you can allow for all sorts of communication.
>>>>
>>>> So that's fine.
>>>>
>>>> Then in your examples, simply design them to be standalone, i.e. in
>>>> *your* examples always start the connection and stop it again at the end
>>>> of one examples block, i.e. the examples defined in one Rd file.
>>>>
>>>> Best,
>>>> Uwe Ligges
>>>>
>>>>
>>>>
>>>> On 20.08.2018 02:11, Duncan Murdoch wrote:
>>>>> On 19/08/2018 12:34 PM, Gábor Csárdi wrote:
>>>>>> Sorry, missed that these were examples, so, yeah, that's harder.  G.
>>>>>
>>>>> How about a function that checks if the connection is open before doing
>>>>> anything, and then at the end you close it if it wasn't already open?
>>>>> This will make all examples run slower on CRAN, but won't affect most
>>>>> users who are doing their own stuff as well as running examples.
>>>>>
>>>>> Or, how about the startup code for the package opens the connection?
>>>>>
>>>>> Or perhaps CRAN will respond to this thread with another suggestion.
>>>>>
>>>>> Duncan Murdoch
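
The first suggestion above (check whether the connection is already
open, and close it at the end only if it was not) might look roughly
like this.  All names here (bridge, bridge_open, bridge_close,
with_bridge) are hypothetical and not part of rscala, and a temporary
file connection stands in for the socket to Scala.

```r
# Hypothetical cache for the one shared connection.
bridge <- new.env()

bridge_is_open <- function() !is.null(bridge$con) && isOpen(bridge$con)

bridge_open <- function() {
  # Stand-in for opening the socket; reuses the connection if already open.
  if (!bridge_is_open()) bridge$con <- file(tempfile(), open = "w")
  invisible(bridge$con)
}

bridge_close <- function() {
  if (bridge_is_open()) close(bridge$con)
  bridge$con <- NULL   # drop the handle so it is never queried after close()
}

with_bridge <- function(code) {
  was_open <- bridge_is_open()
  # Close afterwards only if this call was the one that opened it.
  if (!was_open) on.exit(bridge_close(), add = TRUE)
  bridge_open()
  force(code)          # evaluate the caller's code with the bridge open
}

# An example run on its own opens and closes the connection itself.
result <- with_bridge({
  writeLines("hello", bridge$con)
  "done"
})
```

A user who has already called bridge_open() keeps the connection after
with_bridge() returns, so only cold-start examples pay the reconnect
cost.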
>>>>>
>>>>>
>>>>>> On Sun, Aug 19, 2018 at 6:32 PM Gábor Csárdi <csardi.gabor using gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> You could just create a function to close the connection and then
>>>>>>> people could call it at the end of their test suites.
>>>>>>>
>>>>>>> Gabor
>>>>>>> On Sun, Aug 19, 2018 at 6:22 PM David B. Dahl <dahl using stat.byu.edu> wrote:
>>>>>>>>
>>>>>>>> In preparing to submit an update of my package to CRAN, I found that
>>>>>>>> R-devel has a new test regarding "connections left open" that my
>>>>>>>> packages fail.  The new test appears to have been committed by Uwe Ligges in
>>>>>>>> revisions 74959 and 74964 on 2018-07-14 and 2018-07-15, respectively.
>>>>>>>> The commit message says, "check after each example whether open
>>>>>>>> connections exist, indicating e.g. file connections were left open or
>>>>>>>> parallel clusters still running."
>>>>>>>>
>>>>>>>> I am hoping for advice on how to pass "R CMD check --as-cran".  Or,
>>>>>>>> perhaps my situation will prompt a change to the test or, at least,
>>>>>>>> having it result in a warning instead of an error.
>>>>>>>>
>>>>>>>> Below I describe the situation.  My rscala package allows developers
>>>>>>>> to write R packages based on Scala (much like rJava and Rcpp for Java
>>>>>>>> and C++, respectively).  Scala runs as a separate process and
>>>>>>>> interprocess communication is implemented using socket connections.
>>>>>>>>
>>>>>>>> Suppose a package using rscala has functions that call Scala code.
>>>>>>>> (Such packages are 'bamboo', 'sdols', and 'shallot' on CRAN.)  The
>>>>>>>> first time a user executes an R function calling down into Scala, a
>>>>>>>> socket connection between Scala and R is established.  For the sake of
>>>>>>>> low latency, after the call to the function ends, the connection stays
>>>>>>>> open until the package is unloaded or the R session ends.  But, this
>>>>>>>> approach runs afoul of the new test mentioned above that appears to be
>>>>>>>> designed to catch connections that are *accidentally* left open.
>>>>>>>>
>>>>>>>> I definitely do not want users of my packages 'bamboo', 'sdols',
>>>>>>>> and 'shallot' to have to think about managing the connection between
>>>>>>>> Scala and R.  That's an implementation detail and using the package
>>>>>>>> should be transparent for the user (who doesn't care about the
>>>>>>>> implementation details).
>>>>>>>>
>>>>>>>> On my end, I see two solutions:  1. I could try to reengineer my
>>>>>>>> approach --- establishing a new connection for every single call into
>>>>>>>> Scala --- although I am loath to do anything to increase the latency,
>>>>>>>> or 2. I could wrap all the examples in \donttest so that CRAN checks
>>>>>>>> are passed.
>>>>>>>>
>>>>>>>> Or, again, perhaps my situation will prompt a reevaluation of the
>>>>>>>> test.  Perhaps it could result in a warning (instead of an error) and
>>>>>>>> the CRAN maintainers would accept packages with such a warning.
>>>>>>>>
>>>>>>>> Any advice?  Thanks a lot!
>>>>>>>>
>>>>>>>> -- David
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-package-devel using r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
