[Rd] stats::fft produces inconsistent results
Ben Bolker
bbo|ker @end|ng |rom gm@||@com
Thu Oct 21 22:46:29 CEST 2021
Nice!
On 10/21/21 4:26 PM, Dipterix Wang wrote:
> Thank you for such detailed and plain explanation. It is much clearer to
> me now w.r.t. the R internal memory management and how PROTECT should be
> used.
>
> Also after diving into the documentation of FFTW3 library, I think I
> found why the data was centered.
>
> https://www.fftw.org/fftw3_doc/Planner-Flags.html
> <https://www.fftw.org/fftw3_doc/Planner-Flags.html>
>
> Basically
> 1. FFTW3 modifies the input data by default
> 2. one has to initialize the data after planning fft (except for some
> special situations). This “subtle” detail is buried in their
> documentation and is very hard to debug once a mistake is made.
>
> The second one actually causes CRAN package fftwtools to produce
> inconsistent results on osx
> (https://github.com/krahim/fftwtools/issues/15
> <https://github.com/krahim/fftwtools/issues/15>)
>
> Best,
> Dipterix
>
>> On Oct 21, 2021, at 6:32 AM, GILLIBERT, Andre
>> <Andre.Gillibert using chu-rouen.fr <mailto:Andre.Gillibert using chu-rouen.fr>>
>> wrote:
>>
>> > Haha, thanks : ) I guess I will probably be grouchy too if seeing so many people making the same mistakes again and again. It just happened to be me.
>>
>> Fortunately, you did not get offensed. :)
>>
>> This is nice to have a large community of developers for R packages,
>> even if, sometimes, buggy packages are annoying R developers because
>> any small change in R may "break" them even though they were actually
>> broken from the begining.
>>
>> >Indeed, I found myself often confused about when to PROTECT and when not.
>>
>> A (relatively) quick explanation.
>> There are several “pools” of data objects that have different rules.
>> The most common “pool” is the pool of garbage collectable R objects,
>> that can be allocated with allocVector and is passed from R to C code
>> and vice versa. Another pool is the malloc/free pool, that works with
>> explicit allocation/deallocation. R does not modify the malloc/free
>> implementation in any way, and memory leaks may happen. Operating
>> systems may have other pools of memory (e.g. mmap'ed memory) that are
>> not handled by R either. There is also a transient storage
>> (R_alloc/vmaxset/vmaxget) that is automatically freed when returning
>> from C to R, and should be used for temporary storage but not for
>> objects returned to R code.
>>
>> The PROTECT system is needed for garbage collectable objects.
>> The garbage collector may trigger whenever a R internal function is
>> called. Typically, when some memory is internally allocated.
>> The garbage collector frees objects that are neither referenced
>> directly nor indirectly from R code and from the PROTECT stack.
>> The PROTECT stack is used by C code to make sure objects that are not
>> yet (or will never be) referenced by R code, are not destroyed when
>> the garbage collector runs.
>>
>> The functions allocating new R objects, such as allocVector(), but
>> also coerceVector(), duplicate(),return unprotected objects, that may
>> be destroyed the next time an internal R function is called, unless it
>> is explicitly PROTECT'ed before. Indeed, such objects would have no
>> reference from R code and so, would be deleted.
>>
>> The PROTECT stack must be balanced on a call from R to a C function.
>> There must be as many UNPROTECT'ions than PROTECT'ions.
>>
>> The typical C code PROTECTs any object allocated as soon as it is
>> allocated (e.g. call to allocVector or coerceVector). It UNPROTECTs
>> temporary objects to "free" them (the actual memory release may be
>> delayed to the next garbage collection). It UNPROTECTs the object it
>> returns to R code. Indeed, in pure C code, there will be no garbage
>> collection between the time the object is UNPROTECTed and the time R
>> grabs the object. You must be very careful if you are using C++,
>> because destructors must not call any R internal function that may
>> trigger a garbage collection.
>> The arguments to the C code, do not have to be PROTECT'ed, unless they
>> are re-allocated. For instance, it is frequent to call coerceVector or
>> arguments and re-assign them to the C variable that represents the
>> argument. The new object must be PROTECT'ed.
>>
>> Actually, you do not need to *directly* PROTECT all objects that are
>> allocated in the C function, but you must make sure that all objects
>> are *indirectly* PROTECT'ed. For instance, you may allocate a VECSXP
>> (a "list" in R) and fill the slots with newly allocated objects. You
>> only need to PROTECT the VECSXP, since its slots are indirectly protected.
>>
>> If you have any doubt, it is not a bug to over-PROTECT objects. It may
>> slightly slow down garbage collection and use space on the PROTECTion
>> stack, but that is rarely a big deal. You should only avoid that when
>> that would lead to thousands or millions of protections.
>>
>> As I said, the PROTECT stack must be balanced between the entry and
>> exit of the C code. This is not a problem for 99% of functions that
>> free all the memory they use internally except the object that is
>> returned. Sometimes, some "background" memory, hidden to R code, may
>> have to be allocated for more time. A call to R_PreserveObject
>> protects the object, even after the C code returns to R, until
>> R_ReleaseObject is called. Without an explicit call to
>> R_ReleaseObject, memory is leaked!
>>
>> There is another mechanism in R that must be known. If you call any R
>> function from C code, or any internal R function that may fail with an
>> error, or any internal R function that can be stopped by the user (see
>> R_CheckUserInterrupt), then, R may call a longjmp to exit all the C
>> code. This is very much incompatible with C++ exceptions or
>> constructors/destructors. Rcpp can avoid, to some extent, that problem.
>>
>> With C code, this means that some malloc'ed memory or allocated
>> resources (file descriptors, sockets, etc.) may be leaked unless
>> something is done to prevent that. All PROTECT'ed objects are
>> automatically unprotected, so there is no problem with memory leak of
>> garbage collectable objects. There is a R_UnwindProtect() mechanism to
>> free temporary resources (e.g. a socket you allocated) when a longjmp
>> is triggered. Non-memory resources (e.g. a socket) returned to R
>> should use theR_MakeExternalPtr() mechanism to make sure that, when
>> the memory is freed by the garbage collector, the resource is also freed.
>>
>> "Writing R extensions" contains more extensive documentation, but I
>> hope that my quick description of the system will make it easier to
>> understand the extensive documentation.
>>
>> --
>> Sincèrement
>> André GILLIBERT
>>
>> *De :*Dipterix Wang <dipterix.wang using gmail.com
>> <mailto:dipterix.wang using gmail.com>>
>> *Envoyé :*mercredi 20 octobre 2021 22:01
>> *À :*Martin Maechler <maechler using stat.math.ethz.ch
>> <mailto:maechler using stat.math.ethz.ch>>; GILLIBERT, Andre
>> <Andre.Gillibert using chu-rouen.fr
>> <mailto:Andre.Gillibert using chu-rouen.fr>>;bbolker using gmail.com
>> <mailto:bbolker using gmail.com>
>> *Cc :*r-devel using r-project.org <mailto:r-devel using r-project.org>
>> *Objet :*Re: [Rd] stats::fft produces inconsistent results
>>
>> ATTENTION:Cet e-mail provient d’une adresse mail extérieure au CHU de
>> Rouen. Ne cliquez pas sur les liens ou n'ouvrez pas les pièces jointes
>> à moins de connaître l'expéditeur et de savoir que le contenu est sûr.
>> En cas de doute, transférer le mail à « DSI, Sécurité » pour analyse.
>> Merci de votre vigilance
>>
>> Wow, you guys are amazing!
>>
>> as part of its pipeline, ravetools::mvfftw computes the
>> mean of the
>> input vector **and then centers it to a mean of zero**
>> (intentionally or
>> accidentally?)
>>
>>
>>
>> because variables are passed to compiled code by reference
>> (someone
>> can feel free to correct my terminology), this means that
>> the original
>> vector in R now has a mean of zero
>>
>> I didn’t center the input vector in my code. The data was fed “as-is”
>> into FFTW3. My guess is FFTW3 internally center the data. It could be
>> that FFTW3 library behave differently on different platforms, which
>> could explain the reproducibility issue.
>>
>> /"Indeed, R vectors are passed "by reference" to C code,
>> but the semantic must be "by value", i.e. the C function
>> must NOT change the contents of the vector, except in very
>> specific cases.”/
>>
>> CRAN has already had fftw and fftwtools, the issue is the data I’m
>> targeting at are at GB-level, copying the vectors can be memory
>> inefficient or even use up memories. The strategy of ravetools is to
>> import signals from local files, fft, then directly write to disk. So
>> only one reference will be used and modifying in-place is on purpose.
>> In fact, and the fft functions I created are not intended to be used
>> directly by users.
>>
>> However, I should've been very cautious when using these functions.
>> This is my fault. I’ll check the whole package to make sure only one
>> reference is used or otherwise the vectors will be copied.
>>
>> /This can be tested by the MAYBE_REFERENCED() macro in
>> Rinternals.h./
>>
>> Nice to learn! I’ll add it to my code.
>>
>> Properly using R vectors in C code is tricky. You have to
>> understand.
>>
>> 1) When you are allowed or not to modify vectors
>>
>> 2) When to PROTECT()vectors
>>
>> 3) How the garbage collector works and when it can trigger
>> (answer : basically, when you call any internal R function)
>>
>> Chapter 5 of "Writing R Extensions" documentation is quite
>> extensive:
>>
>> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C<https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C>
>>
>> Indeed, I found myself often confused about when to PROTECT and when not.
>>
>> ... but then ravetools is not even a CRAN package, so
>> why should you dare to use it for anything serious ?
>>
>> Haha, thanks : ) I guess I will probably be grouchy too if seeing so
>> many people making the same mistakes again and again. It just happened
>> to be me.
>> But it’s good to be rigorous. Sooner or later I'll have to face these
>> problems. It’s better to make mistakes before having made many.
>>
>> Thanks y’all!
>>
>> Best,
>> - Dipterix Wang
>>
>>
>>
>> On Oct 20, 2021, at 5:32 AM, Martin Maechler
>> <maechler using stat.math.ethz.ch<mailto:maechler using stat.math.ethz.ch>> wrote:
>>
>> Martin Maechler
>> on Wed, 20 Oct 2021 11:26:21 +0200 writes:
>>
>>
>> [............]
>>
>>
>> Thank you, André , that's very good.
>>
>>
>>
>> Just to state the obvious conclusion:
>>
>>
>>
>> If Ben's suggestion is correct (and André has explained *how*
>> that could happen) this would mean a
>> SEVERE BUG in package ravetools's mvfftw() function.
>>
>>
>>
>> and it would have been (yet another) case of gaining speed by
>> killing correctness...
>>
>>
>>
>> ... but then ravetools is not even a CRAN package, so why
>> should you dare to use it for anything serious ?
>>
>>
>>
>> ... yes, being grouchy ..
>>
>>
>> which I should rather not be.
>>
>> Dipterix Wang *did* say initially that he is currently
>> developing ravetools so it's very reasonabl this is not yet a
>> CRAN package..
>>
>> Best,
>> Martin
>
--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics
More information about the R-devel
mailing list