Compute Unified Device Architecture (CUDA)

Nvidia introduced its general purpose parallel computing architecture, CUDA, in 2006. CUDA provides a new parallel programming model and instruction set architecture for Nvidia GPUs, together with a software environment that lets programmers use C as a high-level programming language to solve computationally demanding problems more efficiently.

CUDA provides three key abstractions: a hierarchy of thread groups, shared memories, and barrier synchronization. They are exposed to the developer as a minimal set of language extensions and provide fine-grained data parallelism and thread parallelism, nested within coarse-grained data parallelism and task parallelism. These abstractions guide the developer to partition a problem into coarse subtasks that can be solved independently in parallel, and then into finer pieces that can be solved cooperatively in parallel.
With CUDA the programming model scales transparently to large numbers of processor cores: a compiled CUDA program can run on any number of processors, and only the runtime system needs to know the physical processor count.
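As a rough sketch of why this works (not taken from the referenced guide), a kernel written with a so-called grid-stride loop makes the independence from the processor count explicit: the loop covers the whole array no matter how many blocks the runtime actually schedules onto the hardware. The kernel name and launch configuration below are illustrative assumptions.

#include <cuda_runtime.h>

// Grid-stride loop: the same kernel is correct for any grid size, so the
// runtime is free to spread its blocks over however many multiprocessors
// the device actually has.
__global__ void scaleArray(float *data, int n, float factor)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += blockDim.x * gridDim.x)
    {
        data[i] *= factor;
    }
}

int main()
{
    const int n = 1 << 20;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    // The launch configuration is illustrative; correctness does not depend
    // on the physical processor count of the device.
    scaleArray<<<128, 256>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}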

A CUDA program consists of serial host code and one or more functions called kernels. A kernel describes the operations executed by a single thread and is executed in parallel by a set of such threads. Each thread is given a unique thread ID, accessible within the kernel through the built-in variable threadIdx. CUDA arranges these threads into a hierarchy of blocks and grids: a block contains a set of threads that can cooperate, and a grid contains a set of independent thread blocks. Like the thread ID, a block inside a grid can be identified through the built-in variable blockIdx.
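A minimal kernel illustrating both built-in variables is the classic element-wise vector addition. The kernel name and the device pointers in the commented launch (d_a, d_b, d_c) are hypothetical; this is only a sketch of how a thread derives its position in the grid.

#include <cuda_runtime.h>

// Each thread handles one element; its global index is derived from the
// built-in blockIdx, blockDim and threadIdx variables.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard threads past the end of the arrays
        c[i] = a[i] + b[i];
}

// Host-side launch of a grid of blocks, each block containing 256 threads:
//   int threadsPerBlock = 256;
//   int blocksPerGrid   = (n + threadsPerBlock - 1) / threadsPerBlock;
//   vecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_a, d_b, d_c, n);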
CUDA threads may access several memory spaces during their execution. Each thread has its own private local memory, each thread block has a shared memory space visible to all threads of the block, and all threads have access to a global memory. There are also two read-only memory spaces: constant memory and texture memory. Constant memory is very limited in size and is cached; texture memory is larger and also cached, and reading from it generally takes less time than reading from local or global memory. For a given application, the global, constant and texture memory spaces are persistent across its kernel launches.
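The sketch below shows how the per-thread, per-block and device-wide memory spaces appear in kernel code (texture memory is bound through a separate API and omitted here). The kernel name blockSum, the constant symbol scale and the launch shown in the trailing comments are assumptions for illustration only.

#include <cuda_runtime.h>

#define BLOCK_SIZE 256

// Constant memory: small, read-only from kernels, cached.
__constant__ float scale;

// Sums each block's portion of a global-memory array.
__global__ void blockSum(const float *in, float *out, int n)
{
    // Shared memory: one copy per block, visible to all its threads.
    __shared__ float tile[BLOCK_SIZE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Registers / local memory: private to each thread.
    float v = (i < n) ? in[i] * scale : 0.0f;   // in[] resides in global memory
    tile[threadIdx.x] = v;

    // Barrier synchronization before threads read each other's values.
    __syncthreads();

    if (threadIdx.x == 0) {
        float sum = 0.0f;
        for (int t = 0; t < blockDim.x; ++t)
            sum += tile[t];
        out[blockIdx.x] = sum;   // result written back to global memory
    }
}

// Launch with BLOCK_SIZE threads per block, e.g.
//   blockSum<<<numBlocks, BLOCK_SIZE>>>(d_in, d_out, n);
// after setting 'scale' with cudaMemcpyToSymbol(scale, &h_scale, sizeof(float)).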

At the hardware level the Nvidia GeForce 8800 GTX processor can be viewed as a collection of 16 multiprocessors, each containing 8 scalar processor (SP) cores. Figure 1 shows a general view of the CUDA hardware interface. Each multiprocessor has its own shared memory, which is visible to all 8 processors inside it, as well as a set of 32-bit registers and caches for texture and constant memory.

[Figure 1: General view of the CUDA hardware interface]

When managing hundreds of threads, a multiprocessor maps each thread to one scalar processor core using an architecture called single-instruction multiple-thread (SIMT), which makes each scalar processor core a SIMT processor. Device memory is available to all the processors and allows the multiprocessors to communicate.
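These per-device quantities (number of multiprocessors, registers, shared and constant memory) can be queried at run time through the standard CUDA runtime API, as sketched below; the exact figures printed depend on the installed GPU, and on a GeForce 8800 GTX one would expect the 16 multiprocessors mentioned above.

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0

    printf("Device:                     %s\n",       prop.name);
    printf("Multiprocessors:            %d\n",       prop.multiProcessorCount);
    printf("Shared memory per block:    %zu bytes\n", prop.sharedMemPerBlock);
    printf("32-bit registers per block: %d\n",       prop.regsPerBlock);
    printf("Constant memory:            %zu bytes\n", prop.totalConstMem);
    return 0;
}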

Reference:
  1. NVIDIA CUDA Programming Guide Version 2.3.1, 2009.
