- GPU-accelerate NumPy ufuncs with a few lines of code (see the first sketch after this list).
- Configure code parallelization using the CUDA thread hierarchy.
- Write custom CUDA device kernels for maximum performance and flexibility (see the second sketch after this list).
- Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.
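To give a flavour of the first objective, here is a minimal sketch of GPU-accelerating a NumPy ufunc. It assumes the Numba library (its `@vectorize` decorator with `target='cuda'`), a common choice for CUDA Python; the function name `add_gpu` and the array sizes are illustrative, not part of the course materials.

```python
# Minimal sketch (assumes Numba is installed with CUDA support; names illustrative)
import numpy as np
from numba import vectorize

@vectorize(['float32(float32, float32)'], target='cuda')
def add_gpu(a, b):
    # Written as a scalar function; Numba compiles it into a GPU-accelerated ufunc
    return a + b

x = np.arange(1_000_000, dtype=np.float32)
y = 2.0 * x
out = add_gpu(x, y)  # executes element-wise on the GPU, like np.add
```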
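The remaining objectives concern explicit kernels. The second sketch below, again assuming Numba's `cuda.jit` interface, shows a launch configuration expressed in the CUDA thread hierarchy (a grid of blocks of threads) and a block-level use of on-device shared memory so that global reads and writes stay coalesced; the kernel, names, and sizes are illustrative only.

```python
# Minimal sketch (assumes Numba with CUDA support; kernel and sizes illustrative)
import numpy as np
from numba import cuda, float32

TPB = 128  # threads per block

@cuda.jit
def reverse_within_blocks(d_in, d_out):
    # Stage one block-sized tile in fast on-chip shared memory
    tile = cuda.shared.array(TPB, dtype=float32)
    i = cuda.grid(1)  # global index: blockIdx.x * blockDim.x + threadIdx.x
    if i < d_in.size:
        tile[cuda.threadIdx.x] = d_in[i]  # coalesced global read
    cuda.syncthreads()                    # wait until the whole tile is loaded
    if i < d_out.size:
        # Coalesced global write; the reordering happens in shared memory
        d_out[i] = tile[cuda.blockDim.x - 1 - cuda.threadIdx.x]

n = 1 << 20  # assumed to be a multiple of TPB for simplicity
d_in = cuda.to_device(np.arange(n, dtype=np.float32))
d_out = cuda.device_array_like(d_in)
blocks = (n + TPB - 1) // TPB
reverse_within_blocks[blocks, TPB](d_in, d_out)  # grid/block launch configuration
result = d_out.copy_to_host()
```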
Prerequisites:
- Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
- NumPy competency, including the use of ndarrays and ufuncs
- No previous knowledge of CUDA programming is required
Certificate: Upon successful completion of the assessment, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.
Hardware Requirements: Desktop or laptop computer capable of running the latest version of Chrome or Firefox. Each participant will be provided with dedicated access to a fully configured, GPU-accelerated server in the cloud.
Attendees will receive food coupons for lunch, redeemable at all Taste Imperial outlets.
The course is free of charge; however, if you register but do not attend, or cancel with insufficient notice, you will be charged as outlined in the T&C on the POD website. Please ensure you have a valid cost code before registering.