[R-sig-hpc] NVidia bug in OpenCL ?

Sat Sep 5 03:07:56 CEST 2015

Hello there,

this is not strictly an R bug but I found it while using OpenCL package in R.

library(OpenCL)
help(oclRun)

provides an OpenCL example to run into R. The kernel proposed is a simple one computing dnorm, here it is verbatim :

__kernel void dnorm(
                    __global float* output,
                    const unsigned int count,
                    __global float* input,
                    const float mu, const float sigma)
{
  int i = get_global_id(0);
  if(i < count)
    output[i] = exp(-0.5f * ((input[i] - mu) / sigma) * ((input[i] - mu) / sigma))
      / (sigma * sqrt( 2 * 3.14159265358979323846264338327950288 ) );
}

On most machine/devices this runs ok. But on some Mac with NVidia GPU (GT 755 and GT 750M) this causes huge difference between the standard dnorm and the OpenCL computed one (up to 1e+18). This is not the case on the intel-graphics of these same machine, nor on AMD graphics or even on Tesla under linux.

Here is the result of the example :
> f(1:10/2) - dnorm(1:10/2)
 [1] 4.600014e+18 4.597886e+18 4.593834e+18 4.587942e+18 4.580709e+18 4.571760e+18 4.561188e+18 4.549069e+18 4.535339e+18
[10] 4.519628e+18

Still on the faulty machines, if I run the fp64 version of the example all goes well.

After some investigation, I though of trying to get the full computation explicitly in fp32 (avoiding implicit conversion) : turning all constant to fp32 literals.

__kernel void dnorm(
                    __global float* output,
                    const unsigned int count,
                    __global float* input,
                    const float mu, const float sigma)
{
  int i = get_global_id(0);
  if(i < count)
    output[i] = exp(-0.5f * ((input[i] - mu) / sigma) * ((input[i] - mu) / sigma))
      / (sigma * sqrt( 2.0f * 3.14159265358979323846264338327950288f ) );
}

This now works ok on the «faulty» machines.
To be sure the problem comes from some conversion at the later stage, I tried to perform and explicit conversion to either fp32 or fp64 before assignment :

__kernel void dnorm(
                    __global float* output,
                    const unsigned int count,
                    __global float* input,
                    const float mu, const float sigma)
{
  int i = get_global_id(0);
  if(i < count)
    output[i] = convert_float(exp(-0.5f * ((input[i] - mu) / sigma) * ((input[i] - mu) / sigma))
                               / (sigma * sqrt( 2 * 3.14159265358979323846264338327950288 ) ));
}

this works ok

__kernel void dnorm(
                    __global float* output,
                    const unsigned int count,
                    __global float* input,
                    const float mu, const float sigma)
{
  int i = get_global_id(0);
  if(i < count)
    output[i] = convert_double(exp(-0.5f * ((input[i] - mu) / sigma) * ((input[i] - mu) / sigma))
                               / (sigma * sqrt( 2 * 3.14159265358979323846264338327950288 ) ));
}
is NOT working (same level as error).

Further more if one performs an explicit fp32 conversion on the denominator of this one, it now works :
__kernel void dnorm(
                    __global float* output,
                    const unsigned int count,
                    __global float* input,
                    const float mu, const float sigma)
{
  int i = get_global_id(0);
  if(i < count)
    output[i] = convert_double(exp(-0.5f * ((input[i] - mu) / sigma) * ((input[i] - mu) / sigma))
                               / convert_float(sigma * sqrt( 2 * 3.14159265358979323846264338327950288 ) ));
}
while
__kernel void dnorm(
                    __global float* output,
                    const unsigned int count,
                    __global float* input,
                    const float mu, const float sigma)
{
  int i = get_global_id(0);
  if(i < count)
    output[i] = convert_double(exp(-0.5f * ((input[i] - mu) / sigma) * ((input[i] - mu) / sigma))
                               / convert_double(sigma * sqrt( 2 * 3.14159265358979323846264338327950288 ) ));
}
is dead …

I fail to understand the precise effect of the various conversions in the way the computation is done.

Still all of those would work ok on the other settings.

I guess to avoid someone else spend so much time again on this issue it would be helpful to :

* Use a version of the exemple not involving any type conversion (be-it implicit or explicit)
* Warn the user that type conversion on some architecture+driver might generate totally wrong results.

Hope this is not too much work and will help other that would face the same issue I had for a day or two.

Cheers,

Serge.

++++++++++++++++++++++++++++++++++++++++
Serge Cohen
GPG Key ID: 0B5CDAEC

IPANEMA USR 3461 CNRS/MCC

site Synchrotron SOLEIL
L'Orme des Merisiers
Saint-Aubin - BP48
91192 Gif-sur-Yvette cedex
FRANCE
++++++++++++++++++++++++++++++++++++++++

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 841 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://stat.ethz.ch/pipermail/r-sig-hpc/attachments/20150905/27b83aa4/attachment.bin>