Over the last 20 to 30 years, processing power has grown tremendously, with processor speeds increasing year after year. Over the last few years, however, that trend has shifted: instead of processor speed going up, the number of cores on a single chip is increasing. We are moving quickly from dual-core to quad-core to eight-core processors, and so gradually towards a multicore processor architecture. Most present-day applications, however, are still sequential and cannot take advantage of the performance that multicore processors offer. This is because parallel or concurrent programming involves threads, locks, mutexes and similar primitives, which are difficult to program against and hard to maintain, in part because tools for profiling and debugging such applications are lacking. What was needed was a set of libraries and tools which:
- Make concurrent programming easier, so that developers need not deal with the details of threads and locks.
- Make profiling and debugging concurrent applications easier.
- Enable applications to scale without code changes as the number of cores increases.
- Allow existing applications to be parallelized easily.
This is where Parallel Extensions in .NET 4.0 comes in: it provides a set of libraries and tools to achieve the objectives above. It supports two paradigms of parallel computing:
- Data Parallelism – The data is divided across multiple processors, and the same operation runs on each portion in parallel. For example, when processing an array of 1000 elements, we can distribute the data between two processors, say 500 elements each. This is supported by Parallel LINQ (PLINQ) in .NET 4.0.
- Task Parallelism – The program is broken down into multiple tasks that can run in parallel on different processors. This is supported by the Task Parallel Library (TPL) in .NET 4.0.
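The two paradigms can be contrasted with a minimal C# sketch. The class and method names (`ParadigmSketch`, `SumOfSquares`, `MinAndMax`) are hypothetical and chosen only for illustration; the library calls used (`AsParallel`, `Task.Factory.StartNew`, `Task.WaitAll`) are part of .NET 4.0. How the runtime actually partitions the data across cores is its own decision, so the "500 each" split above is not something the code controls.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class ParadigmSketch
{
    // Data parallelism: the same operation (squaring) is applied to every
    // element, and PLINQ partitions the input across the available cores.
    public static long SumOfSquares(int[] data)
    {
        return data.AsParallel().Sum(x => (long)x * x);
    }

    // Task parallelism: two different computations run as separate tasks,
    // which the scheduler may place on different cores.
    public static int[] MinAndMax(int[] data)
    {
        Task<int> minTask = Task.Factory.StartNew(() => data.Min());
        Task<int> maxTask = Task.Factory.StartNew(() => data.Max());

        // Block until both tasks have finished before reading the results.
        Task.WaitAll(minTask, maxTask);
        return new[] { minTask.Result, maxTask.Result };
    }
}
```

Calling `ParadigmSketch.SumOfSquares(Enumerable.Range(1, 1000).ToArray())` would process the 1000-element array from the example above without any explicit thread or lock handling in user code.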
A high level view of this framework is shown below:
The key components are:
- Coordinated Data Structures (CDS) – A set of APIs added to the BCL that provides thread-safe collection classes, lightweight synchronization classes and lazy initializers. The concurrent collections are in the new System.Collections.Concurrent namespace, and most of the others are added to the System.Threading namespace. These are cross-cutting components used by the other three, as shown in the figure above.
- Scheduler – A new scheduler that can query the OS for the number of processors and their memory architecture (i.e. UMA/NUMA) and schedule tasks accordingly. The Task Parallel Library uses this component for task scheduling, and it is what lets applications scale as the number of cores in the machine grows.
- Task Parallel Library (TPL) – A set of APIs in the System.Threading and System.Threading.Tasks namespaces that provides facilities for task parallelism. The System.Threading.Tasks.Parallel class provides an Invoke method that accepts an array of delegates, enabling parallel execution of multiple methods. Its Parallel.For and Parallel.ForEach methods execute loop iterations in parallel, thus also supporting data parallelism. The TPL is used by PLINQ.
- Parallel LINQ (PLINQ) – PLINQ provides a parallel implementation of LINQ and supports all the standard query operators. This is achieved by the System.Linq.ParallelEnumerable class, which provides a set of extension methods: AsParallel extends IEnumerable, and the query operators work on the ParallelQuery it returns. Developers opt in to parallelism by calling AsParallel on the data source. With PLINQ we can combine sequential and parallel queries, and it supports both ordered and unordered execution.
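As a rough illustration of how these components are used from application code, the sketch below touches a concurrent collection, `Parallel.ForEach`, `Parallel.Invoke` and an ordered PLINQ query. The class and method names (`ComponentsSketch`, `CountByLength`, `SumAndMax`, `EvenSquaresInOrder`) are hypothetical; only the framework types and methods come from .NET 4.0. The scheduler is not referenced directly, since the TPL uses it behind the scenes.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class ComponentsSketch
{
    // CDS + TPL: Parallel.ForEach runs the loop body in parallel, and the
    // ConcurrentDictionary absorbs concurrent updates without explicit locks.
    public static ConcurrentDictionary<int, int> CountByLength(string[] words)
    {
        var counts = new ConcurrentDictionary<int, int>();
        Parallel.ForEach(words, word =>
        {
            // Insert 1 for a new length, otherwise increment the stored count.
            counts.AddOrUpdate(word.Length, 1, (key, current) => current + 1);
        });
        return counts;
    }

    // TPL: Parallel.Invoke takes several delegates, may run them in
    // parallel, and returns only when all of them have completed.
    public static int[] SumAndMax(int[] data)
    {
        int sum = 0, max = 0;
        Parallel.Invoke(
            () => sum = data.Sum(),
            () => max = data.Max());
        return new[] { sum, max };
    }

    // PLINQ: AsParallel opts in to parallel execution; AsOrdered asks for
    // results in source order, at some cost compared to unordered execution.
    public static int[] EvenSquaresInOrder(int[] data)
    {
        return data.AsParallel()
                   .AsOrdered()
                   .Where(x => x % 2 == 0)
                   .Select(x => x * x)
                   .ToArray();
    }
}
```

Dropping the `AsOrdered` call would give the unordered variant, where the elements of the result may come back in any order.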