Compute Unified Device Architecture (CUDA)

NVIDIA introduced CUDA, a general-purpose parallel computing architecture, in 2006. It provides a new parallel programming model and instruction set architecture for NVIDIA GPUs. It also comes with a software environment that lets programmers use C as a high-level programming language and solve computationally demanding problems more efficiently.

A hierarchy of thread groups, shared memories, and barrier synchronization are the three key abstractions provided by CUDA, and they are exposed to the developer as a minimal set of language extensions. They provide fine-grained data parallelism and thread parallelism, nested within coarse-grained data parallelism and task parallelism. These abstractions also help the developer partition a task into coarse subtasks that can be solved independently in parallel, and then into finer pieces that can be solved cooperatively in parallel.
This decomposition lets the programming model scale transparently across large numbers of processor cores. A compiled CUDA program can run on any number of processors, and only the runtime system needs to know the physical processor count.

A CUDA program contains functions called kernels. A kernel is written as a serial program that describes the operations of a single thread, and when it is launched it is executed in parallel by a set of threads. Each thread is given a thread ID, unique within its block, which is accessible within the kernel through the built-in variable threadIdx. CUDA arranges these threads into a hierarchy of blocks and grids. A block contains a set of threads that can cooperate with each other, and a grid contains a set of independent thread blocks. As with the thread ID, blocks inside a grid can be identified using a built-in block ID variable called blockIdx.
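
As a minimal sketch of how threadIdx and blockIdx are typically combined, the example below adds two vectors with one thread per element. The names vecAdd and runVecAdd, the block size of 256, and the test data are assumptions made for illustration; they do not come from the programming guide. The built-in blockDim variable, which holds the number of threads per block, is used to turn the block ID and thread ID into a global element index.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: the body describes the work of ONE thread.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    // Global index = offset of this block + thread ID inside the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // the last block may contain threads past the end of the data
        c[i] = a[i] + b[i];
}

// Hypothetical host helper (n >= 1): runs the kernel and checks the result.
bool runVecAdd(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da = NULL, *db = NULL, *dc = NULL;
    bool ok = cudaMalloc((void **)&da, bytes) == cudaSuccess
           && cudaMalloc((void **)&db, bytes) == cudaSuccess
           && cudaMalloc((void **)&dc, bytes) == cudaSuccess
           && cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threadsPerBlock = 256;  // assumed block size
        int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
        vecAdd<<<blocksPerGrid, threadsPerBlock>>>(da, db, dc, n);
        ok = cudaGetLastError() == cudaSuccess
          // This copy waits for the kernel to finish before reading dc.
          && cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (int i = 0; ok && i < n; ++i)
        if (hc[i] != ha[i] + hb[i]) ok = false;

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return ok;
}

int main()
{
    printf("vecAdd %s\n", runVecAdd(1000) ? "passed" : "failed");
    return 0;
}
```

The grid size is rounded up so that every element gets a thread, which is why the kernel needs the bounds check on i.
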
During execution, CUDA threads may use several memory spaces. Each thread has its own private local memory, and each thread block has a shared memory space that is visible to all threads of the block. There is also a global memory that is accessible to all threads. Two additional read-only memory spaces, constant memory and texture memory, are also available. Constant memory is very limited in size and is cached. Texture memory is large and cached, and a texture read that hits the cache generally takes less time than a read from local or global memory. Within a given application, the global, constant, and texture memory spaces are persistent across kernel launches.
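
The sketch below illustrates how per-block shared memory and barrier synchronization usually work together: each block copies its slice of the input from global memory into shared memory, then its threads cooperatively reduce that slice to a single partial sum. The names blockSum and runBlockSum, the BLOCK_SIZE of 256, and the test data are hypothetical choices for this example, and the kernel assumes it is launched with exactly BLOCK_SIZE threads per block.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed block size; must be a power of two here

__global__ void blockSum(const int *in, int *partial, int n)
{
    // One copy of this array exists per block, visible to all its threads.
    __shared__ int tile[BLOCK_SIZE];

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[tid] = (i < n) ? in[i] : 0;  // global memory -> shared memory
    __syncthreads();                  // barrier: the whole tile is loaded

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();              // barrier: this step is complete
    }
    if (tid == 0)
        partial[blockIdx.x] = tile[0];  // shared memory -> global memory
}

// Hypothetical host helper (n >= 1): compares the GPU total with a CPU total.
bool runBlockSum(int n)
{
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    int *hIn = (int *)malloc(n * sizeof(int));
    int *hPartial = (int *)malloc(blocks * sizeof(int));
    long long expected = 0;
    for (int i = 0; i < n; ++i) { hIn[i] = i % 10; expected += hIn[i]; }

    int *dIn = NULL, *dPartial = NULL;
    bool ok = cudaMalloc((void **)&dIn, n * sizeof(int)) == cudaSuccess
           && cudaMalloc((void **)&dPartial, blocks * sizeof(int)) == cudaSuccess
           && cudaMemcpy(dIn, hIn, n * sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        blockSum<<<blocks, BLOCK_SIZE>>>(dIn, dPartial, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hPartial, dPartial, blocks * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    long long total = 0;
    for (int b = 0; ok && b < blocks; ++b)
        total += hPartial[b];  // finish the sum on the host

    cudaFree(dIn); cudaFree(dPartial);
    free(hIn); free(hPartial);
    return ok && total == expected;
}

int main()
{
    printf("blockSum %s\n", runBlockSum(1000) ? "passed" : "failed");
    return 0;
}
```

Blocks cannot see each other's shared memory, so each block writes its partial sum to global memory and the final addition is left to the host in this sketch.
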

At the hardware level, the NVIDIA GeForce 8800 GTX processor can be considered a collection of 16 multiprocessors, each with 8 scalar processor (SP) cores. Figure 1 shows a general view of the CUDA hardware interface. Each multiprocessor has its own shared memory, which is visible to all 8 of its processors. Each multiprocessor also has a set of 32-bit registers, along with texture and constant memory caches.

To manage hundreds of threads, each multiprocessor employs an architecture called single-instruction, multiple-thread (SIMT), in which the multiprocessor maps each thread to one scalar processor core. Device memory is available to all the processors, and it allows multiprocessors to communicate with one another.
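
The hardware figures described above can be inspected on a given machine through the CUDA runtime. The following is a small sketch, assuming a system with the CUDA toolkit and at least one CUDA-capable GPU; printDeviceInfo is a hypothetical helper name, while cudaGetDeviceCount, cudaGetDeviceProperties, and the cudaDeviceProp fields it reads are part of the runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a few hardware limits for each device.
// Returns the number of devices found, or -1 if a runtime call fails.
int printDeviceInfo()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        if (cudaGetDeviceProperties(&p, d) != cudaSuccess)
            return -1;
        printf("Device %d: %s\n", d, p.name);
        printf("  multiprocessors:            %d\n", p.multiProcessorCount);
        printf("  shared memory per block:    %zu bytes\n", p.sharedMemPerBlock);
        printf("  32-bit registers per block: %d\n", p.regsPerBlock);
        printf("  constant memory:            %zu bytes\n", p.totalConstMem);
        printf("  global (device) memory:     %zu bytes\n", p.totalGlobalMem);
    }
    return count;
}

int main()
{
    int found = printDeviceInfo();
    if (found < 0)
        printf("Could not query CUDA devices\n");
    return 0;
}
```
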

Reference:
  1. NVIDIA CUDA Programming Guide, Version 2.3.1, 2009.

