Hai Nguyen
2014-12-10 16:50:56 UTC
Hi there,
As far as I know, GPU support for Python is still limited, and the most
popular libraries are PyCUDA/PyOpenCL. To use them, we still need to
write the kernel in C/C++ syntax
(http://documen.tician.de/pycuda/).
I am interested in writing CUDA code in clean Cython/Python syntax. Is
there a quick way to make the code below workable?
(I know we can write very clean syntax with NumbaPro, but NumbaPro is
not free.)
Thanks.
(This is a very simple hello-world example for running on the GPU, from this page:
https://www.pdc.kth.se/resources/computers/zorn/how-to/how-to-compile-and-run-a-simple-cuda-hello-world)
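For reference, the CUDA C version on that page looks roughly like the sketch below (reconstructed from memory, not verbatim; compiling it requires nvcc and a CUDA-capable GPU). It is the program the Cython pseudocode further down tries to mimic:

```cuda
#include <stdio.h>

const int N = 7;
const int blocksize = 7;

// Kernel: each thread adds one int offset to one char of the string.
__global__ void hello(char *a, int *b)
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello ";
    int b[N] = {15, 10, 6, 0, -11, 1, 0};

    char *ad;
    int *bd;
    const int csize = N * sizeof(char);
    const int isize = N * sizeof(int);

    printf("%s", a);

    // Allocate device memory and copy the host arrays over.
    cudaMalloc((void**)&ad, csize);
    cudaMalloc((void**)&bd, isize);
    cudaMemcpy(ad, a, csize, cudaMemcpyHostToDevice);
    cudaMemcpy(bd, b, isize, cudaMemcpyHostToDevice);

    dim3 dimBlock(blocksize, 1);
    dim3 dimGrid(1, 1);
    hello<<<dimGrid, dimBlock>>>(ad, bd);  // the launch syntax Cython cannot parse

    // Copy the result back and release device memory.
    cudaMemcpy(a, ad, csize, cudaMemcpyDeviceToHost);
    cudaFree(ad);
    cudaFree(bd);

    printf("%s\n", a);  // "Hello " has become "World!"
    return 0;
}
```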
hello.pyx
======
# pseudocode
from libcuda cimport *

DEF N = 7
DEF blocksize = 7

# dummy cuda decorator to let Cython know the function is a CUDA kernel
cdef cuda(func, message):
    pass

@cuda('global')
cdef void hello(char* a, int* b, cudablockcode=args):
    # C equivalent: __global__ void hello(...)
    # cudablockcode would be translated to the <<<dimGrid, dimBlock>>> launch
    # configuration, i.e. hello<<<dimGrid, dimBlock>>>(ad, bd)
    a[threadIdx.x] += b[threadIdx.x]

cdef main():
    cdef char[N] a = "Hello "
    cdef int[N] b = [15, 10, 6, 0, -11, 1, 0]
    cdef:
        char* ad
        int* bd
        int csize = N * sizeof(char)
        int isize = N * sizeof(int)
        int i = 1
        dim3 dimBlock(blocksize, i)
        dim3 dimGrid(i, i)
    print a
    # allocate device memory and copy the host arrays to the device
    cudaMalloc(<void**>&ad, csize)
    cudaMalloc(<void**>&bd, isize)
    cudaMemcpy(ad, a, csize, cudaMemcpyHostToDevice)
    cudaMemcpy(bd, b, isize, cudaMemcpyHostToDevice)
    # launch the kernel on the device:
    # hello<<<dimGrid, dimBlock>>>(ad, bd)
    hello(ad, bd, cudablockcode=(dimGrid, dimBlock))
    # copy the result back to the host
    cudaMemcpy(a, ad, csize, cudaMemcpyDeviceToHost)
    print a
    # free device memory
    cudaFree(ad)
    cudaFree(bd)
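As a sanity check on the data above (no GPU needed): the kernel's per-element addition can be reproduced in plain Python, showing that adding the int offsets to the char codes of "Hello " yields "World!".

```python
# Reproduce the kernel's per-thread work, a[i] += b[i], on the host,
# over the 6 visible characters (the 7th slot is the NUL terminator).
a = list(b"Hello ")          # char codes: [72, 101, 108, 108, 111, 32]
offsets = [15, 10, 6, 0, -11, 1]  # b from the example, trailing 0 omitted
result = "".join(chr(x + y) for x, y in zip(a, offsets))
print(result)  # -> World!
```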
libcuda.pxd
========
# declarations for CUDA
cdef extern from "driver_types.h":
    ctypedef enum cudaError:
        pass
    ctypedef cudaError cudaError_t
    ctypedef enum cudaMemcpyKind:
        cudaMemcpyHostToHost
        cudaMemcpyHostToDevice
        cudaMemcpyDeviceToHost
        cudaMemcpyDeviceToDevice
        cudaMemcpyDefault

cdef extern from "vector_types.h":
    ctypedef struct uint3:
        unsigned int x, y, z
    ctypedef struct dim3:
        pass

cdef extern from "cuda.h":
    # built-in kernel variables (only valid inside device code)
    uint3 threadIdx
    uint3 blockIdx
    dim3 blockDim
    dim3 gridDim

cdef extern from "device_functions.h":
    void __syncthreads()

cdef extern from "cuda_runtime_api.h":
    cudaError_t cudaMalloc(void** p, size_t s)
    cudaError_t cudaFree(void* p)
    cudaError_t cudaMemcpy(void* dst, const void* src, size_t count,
                           cudaMemcpyKind kind)
Cheers,
Hai
--
---
You received this message because you are subscribed to the Google Groups "cython-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cython-users+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.