[cython-users] Cython syntax for CUDA programming?
Hai Nguyen
2014-12-10 16:50:56 UTC
Hi there,

As far as I know, GPU support in Python is still limited, and the
most popular libraries are pycuda and pyopencl. To use them we still
need to write extensions in C/C++ syntax.

I am interested in writing CUDA code in clean Cython/Python syntax. Is
there any quick way to make the code below work?
(I know that numbapro allows very clean syntax, but
numbapro is not really free.)


(This is a very simple hello world example for running on the GPU, from this website.)

# pseudocode
from libcuda cimport *

DEF N = 7
DEF blocksize = 7

# dummy cuda decorator to let Cython know that it will be used with CUDA
cdef cuda(func, message):
    pass  # body left unspecified

cdef void hello(char* a, int* b, cudablockcode=args):
    # C language: __global__ void hello(...)
    # cudablockcode will be translated to something like <<<dimGrid, dimBlock>>>
    # hello<<<dimGrid, dimBlock>>>(ad, bd)
    a[threadIdx.x] += b[threadIdx.x]

cdef main():
    cdef char[N] a = "Hello "
    cdef int[N] b = [15, 10, 6, 0, -11, 1, 0]
    cdef char* ad
    cdef int* bd
    cdef int csize = N * sizeof(char)
    cdef int isize = N * sizeof(int)
    cdef int i = 1
    dim3 dimBlock(blocksize, i)
    dim3 dimGrid(i, i)

    print a
    # alloc and copy to device
    cudaMalloc(<void**>&ad, csize)
    cudaMalloc(<void**>&bd, isize)
    cudaMemcpy(ad, a, csize, cudaMemcpyHostToDevice)
    cudaMemcpy(bd, b, isize, cudaMemcpyHostToDevice)

    # hello<<<dimGrid, dimBlock>>>(ad, bd)
    # call function in device
    hello(ad, bd, cudablockcode=(dimGrid, dimBlock))

    # copy to host
    cudaMemcpy(a, ad, csize, cudaMemcpyDeviceToHost)

    # free mem

# declarations for CUDA
cdef extern from "driver_types.h":
    ctypedef enum cudaError:
        pass
    ctypedef enum cudaMemcpyKind:
        cudaMemcpyHostToDevice
        cudaMemcpyDeviceToHost

cdef extern from "vector_types.h":
    ctypedef struct uint3:
        unsigned int x, y, z
    ctypedef struct dim3:
        unsigned int x, y, z

cdef extern from "cuda.h":
    # threadIdx is a single uint3; its .x, .y, .z members cover all three
    # dimensions, so there is no separate threadIdy or threadIdz
    uint3 threadIdx

cdef extern from "device_functions.h":
    void __syncthreads()

ctypedef cudaError cudaError_t

cdef extern from "cuda_device_runtime_api.h":
    cudaError_t cudaMalloc(void** p, size_t s)

cdef extern from "cuda_runtime_api.h":
    cudaError_t cudaMemcpy(void* dst, const void* src, size_t count,
                           cudaMemcpyKind kind)
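
For reference, the result the kernel is expected to produce can be checked on the CPU. The following is a minimal pure-Python sketch, not GPU code: it emulates the launch by looping over the thread indices and applies a[threadIdx.x] += b[threadIdx.x] once per index. The name emulate_hello and the use of bytes/bytearray for the char array are assumptions made for illustration only.

```python
# CPU-only emulation of the hello kernel; nothing here runs on a GPU.
N = 7
BLOCKSIZE = 7


def emulate_hello(a, b, blocksize=BLOCKSIZE):
    """Apply a[tid] += b[tid] for each emulated thread index tid."""
    out = bytearray(a)
    for tid in range(blocksize):  # on a GPU these iterations run concurrently
        out[tid] = (out[tid] + b[tid]) % 256  # wrap around like an 8-bit char
    return bytes(out)


a = b"Hello \x00"  # 6 characters plus the terminating NUL = N bytes
b = [15, 10, 6, 0, -11, 1, 0]
result = emulate_hello(a, b)
print(result.rstrip(b"\x00").decode("ascii"))  # World!
```

Each thread touches only its own element, so the order in which the indices are processed does not matter; that independence is what lets the real kernel run all seven additions at once.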

You received this message because you are subscribed to the Google Groups "cython-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cython-users+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Sturla Molden
2014-12-10 19:07:31 UTC
Post by Hai Nguyen
I am interested in writing CUDA code in clean Cython/Python syntax. Is
there any quick way to get this code below workable?
I believe there is support for CUDA in Numba (or at least in Anaconda

Hai Nguyen
2014-12-10 19:11:00 UTC
Yes, as I wrote in my first message, I know that numbapro allows very clean
syntax, but numbapro is not really free.


Post by Sturla Molden
Post by Hai Nguyen
I am interested in writing CUDA code in clean Cython/Python syntax. Is
there any quick way to get this code below workable?
I believe there is support for CUDA in Numba (or at least in Anaconda
Jérôme Kieffer
2014-12-10 19:29:22 UTC
Hello Hai,

I am a regular PyOpenCL programmer (and also a Cython user), and I don't see the point of what you are suggesting:

CUDA/OpenCL imply many threads running simultaneously within each block/workgroup.
Redesigning the algorithm for that model is much more work than the language issue.
By the way, the effort pays off: OpenCL is slightly but consistently faster than Cython+gcc (thanks to JIT compilation and advanced SIMD optimization).
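
To illustrate why the redesign is more than a syntax change, consider summing an array. A serial loop accumulates into one variable, whereas inside a workgroup the same result is commonly obtained with a tree reduction, where many threads each add one pair per step. The pure-Python sketch below only emulates that pattern on the CPU; the function names are hypothetical, and the input length is assumed to be a power of two.

```python
def serial_sum(values):
    """Single-threaded version: one accumulator, one element per step."""
    total = 0
    for v in values:
        total += v
    return total


def workgroup_sum(values):
    """Emulate a tree reduction performed by one block/workgroup."""
    buf = list(values)  # stands in for shared/local memory
    stride = len(buf) // 2
    while stride > 0:
        # On a device, every thread with tid < stride runs this line at once.
        for tid in range(stride):
            buf[tid] += buf[tid + stride]
        # A real kernel needs a barrier here before the next step.
        stride //= 2
    return buf[0]


data = list(range(8))
print(serial_sum(data), workgroup_sum(data))
```

Both functions return the same total, but the second needs only log2(n) steps when the threads really do run in parallel, at the cost of a different data layout and explicit synchronization.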

Also, keeping the OpenCL code in C in a separate file is, in my opinion, much cleaner than mixing host and device code à la CUDA.

Jérôme Kieffer <***@terre-adelie.org>